# sample-diffusion
A Python library and CLI for generating audio samples using Harmonai Dance Diffusion models.
🚧 This project is early in development. Expect breaking changes! 🚧
## Features
- A CLI for generating audio samples from the command line using Dance Diffusion models. (`cli.py`)
- A script for reducing the file size of Dance Diffusion models by removing data that is only needed for training and not inference. (`scripts/trim_model.py`)
## Installation
### Requirements
`conda` can be installed through Anaconda or Miniconda. To run on an Apple Silicon device, you will need to use a conda installation that includes Apple Silicon support, such as Miniforge.
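You can check that `conda` is installed and available on your `PATH` with:

```sh
conda --version
```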
### Cloning the repo
Clone the repo and `cd` into it:

```sh
git clone https://github.com/sudosilico/sample-diffusion
cd sample-diffusion
```
### Setting up the conda environment
Create the `conda` environment:

```sh
# If you're not running on an Apple Silicon machine:
conda env create -f environment.yml

# For Apple Silicon machines:
conda env create -f environment-mac.yml
```
This may take a few minutes, as it installs all of the Python dependencies needed by the CLI script.
Note: You must activate the `dd` conda environment after creating it. You can do this by running `conda activate dd` in the terminal. You will need to do this every time you open a new terminal window. Learn more about conda environments.

```sh
conda activate dd
```
## Using the `cli.py` CLI
### Generating samples
Make a `models` folder and place your model at `models/DD/model.ckpt`, then run the generator:
```sh
python cli.py
```
Alternatively, you can pass a custom model path as an argument instead of using the default `models/DD/model.ckpt` path:
```sh
python cli.py --model models/DD/some-other-model.ckpt
```
Your audio samples will then be placed in a folder matching the following pattern:

```
audio/Output/DD/{mode}/{seed}_{steps}
```
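For instance, a default Generation run with seed 1234567 and 50 steps would land in a folder like `audio/Output/DD/Generation/1234567_50/` (the seed and step count here are only illustrative).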
### `cli.py` Command Line Arguments
| argument | type | default | description |
|---|---|---|---|
| `--argsfile` | str | None | Path to a JSON file containing CLI args. If used, all other passed CLI args are ignored. |
| `--use_autocast` | bool | True | Use autocast. |
| `--crop_offset` | int | 0 | The starting sample offset to crop input audio to. Use -1 for random cropping. |
| `--device_accelerator` | str | None | Device of execution. |
| `--device_offload` | str | cpu | Device on which to store models when not in use. |
| `--model` | str | models/dd/model.ckpt | Path to the model checkpoint file to be used. |
| `--sample_rate` | int | 48000 | The sample rate the model was trained on. |
| `--chunk_size` | int | 65536 | The native chunk size of the model. |
| `--mode` | RequestType | Generation | The mode of operation (Generation, Variation, Interpolation, Inpainting, or Extension). |
| `--seed` | int | -1 (random) | The seed used for reproducible outputs. Leave empty for a random seed. |
| `--batch_size` | int | 1 | The maximum number of samples to be produced per batch. |
| `--audio_source` | str | None | Path to the source audio. |
| `--audio_target` | str | None | Path to the target audio (used for interpolations). |
| `--mask` | str | None | Path to the mask tensor (used for inpainting). |
| `--noise_level` | float | 0.7 | The noise level used for variations and interpolations. |
| `--interpolations_linear` | int | 1 | The number of interpolations, evenly spaced. |
| `--interpolations` | float or float[] | None | The interpolation positions. |
| `--keep_start` | bool | True | Keep the beginning of the provided audio (only applies to the Extension mode). |
| `--tame` | bool | True | Decrease output by 3 dB, then clip. |
| `--steps` | int | 50 | The number of steps for the sampler. |
| `--sampler` | SamplerType | IPLMS | The sampler used for the diffusion model. |
| `--sampler_args` | JSON string | {} | Additional arguments for the DD sampler. |
| `--schedule` | SchedulerType | CrashSchedule | The schedule used for the diffusion model. |
| `--schedule_args` | JSON string | {} | Additional arguments for the DD schedule. |
| `--inpainting_args` | JSON string | {} | Additional arguments for inpainting (currently unsupported). |
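As an illustration, a Variation run over an existing sample might look like the following; `input.wav` is a placeholder for your own source file, and the other values simply echo the defaults listed above:

```sh
python cli.py --mode Variation --audio_source input.wav --noise_level 0.7 --steps 50
```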
### Using `args.json`
Instead of specifying all the necessary arguments each time, we encourage you to try the `args.json` file provided with this library:

```sh
python cli.py --argsfile 'args.json'
```

To change any settings, edit the `args.json` file.
## Using the model trimming script
`scripts/trim_model.py` can be used to reduce the file size of Dance Diffusion models by removing data that is only needed for training and not inference. For our first models, this reduced the model size by about 75% (from 3.46 GB to 0.87 GB).
To use it, simply pass the path to the model you want to trim as an argument:
```sh
python scripts/trim_model.py models/model.ckpt
```
This will create a new model file at `models/model_trim.ckpt`.