


A Python library and CLI for generating audio samples using Harmonai Dance Diffusion models.

🚧 This project is early in development. Expect breaking changes! 🚧




conda can be installed through Anaconda or Miniconda. To run on an Apple Silicon device, you will need to use a conda installation that includes Apple Silicon support, such as Miniforge.

Cloning the repo

Clone the repo and cd into it:

git clone https://github.com/sudosilico/sample-diffusion
cd sample-diffusion

Setting up the conda environment

Create the conda environment:

# If you're not running on an Apple Silicon machine:
conda env create -f environment.yml

# For Apple Silicon machines:
conda env create -f environment-mac.yml

This may take a few minutes, as it installs all the Python dependencies that the CLI script needs.

Note: You must activate the dd conda environment after creating it by running conda activate dd in the terminal. You will need to do this every time you open a new terminal window.

conda activate dd
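
If you drive the CLI from your own scripts, it can help to check that the environment is active first. The sketch below relies on the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active named environment. The helper name is hypothetical and not part of this repo.

```python
import os


def is_conda_env_active(name, environ=None):
    """Return True when conda reports `name` as the active environment."""
    # Allow a custom mapping to be passed in so the check is easy to test.
    environ = os.environ if environ is None else environ
    return environ.get("CONDA_DEFAULT_ENV") == name


if not is_conda_env_active("dd"):
    print("The dd environment is not active; run: conda activate dd")
```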

Using the cli.py CLI

Generating samples

Make a models folder and place your model in models/dd/model.ckpt, then run the generator:

python cli.py

Alternatively, you can pass a custom model path as an argument instead of using the models/dd/model.ckpt default path:

python cli.py --model models/DD/some-other-model.ckpt

Your audio samples will then be in one of the following folders:

cli.py Command Line Arguments

| argument | type | default | description |
| --- | --- | --- | --- |
| --argsfile | str | None | Path to JSON file containing cli args. If used, other passed cli args are ignored. |
| --use_autocast | bool | True | Use autocast. |
| --crop_offset | int | 0 | The starting sample offset to crop input audio to. Use -1 for random cropping. |
| --device_accelerator | str | None | Device of execution. |
| --device_offload | str | cpu | Device to store models when not in use. |
| --model | str | models/dd/model.ckpt | Path to the model checkpoint file to be used. |
| --sample_rate | int | 48000 | The sample rate the model was trained on. |
| --chunk_size | int | 65536 | The native chunk size of the model. |
| --mode | RequestType | Generation | The mode of operation (Generation, Variation, Interpolation, Inpainting or Extension). |
| --seed | int | -1 (Random) | The seed used for reproducible outputs. Leave empty for random seed. |
| --batch_size | int | 1 | The maximum number of samples to be produced per batch. |
| --audio_source | str | None | Path to the audio source. |
| --audio_target | str | None | Path to the audio target (used for interpolations). |
| --mask | str | None | Path to the mask tensor (used for inpainting). |
| --noise_level | float | 0.7 | The noise level used for variations and interpolations. |
| --interpolations_linear | int | 1 | The number of interpolations, evenly spaced. |
| --interpolations | float or float[] | None | The interpolation positions. |
| --keep_start | bool | True | Keep the beginning of the provided audio (only applies to Extension mode). |
| --tame | bool | True | Decrease output by 3 dB, then clip. |
| --steps | int | 50 | The number of steps for the sampler. |
| --sampler | SamplerType | IPLMS | The sampler used for the diffusion model. |
| --sampler_args | JSON string | {} | Additional arguments of the DD sampler. |
| --schedule | SchedulerType | CrashSchedule | The schedule used for the diffusion model. |
| --schedule_args | JSON string | {} | Additional arguments of the DD schedule. |
| --inpainting_args | JSON string | {} | Additional arguments for inpainting (currently unsupported). |
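
When scripting several runs, the arguments above can be assembled programmatically. The following is a minimal sketch, not part of this repo: build_cli_command is a hypothetical helper, input.wav is a placeholder path, and it assumes each option is passed in the usual --name value form. Boolean options are left out because their accepted value format is not documented here.

```python
import shlex


def build_cli_command(mode, **options):
    """Return the argument list for a single cli.py run."""
    command = ["python", "cli.py", "--mode", mode]
    for name, value in options.items():
        # Each keyword becomes a "--name value" pair, in the order given.
        command += [f"--{name}", str(value)]
    return command


command = build_cli_command(
    "Variation",
    audio_source="input.wav",  # placeholder path, replace with your own file
    noise_level=0.7,
    steps=50,
    seed=1234,
)

# Print the command for inspection; pass the list to subprocess.run to execute it.
print(shlex.join(command))
```

Building the command as a list rather than a single string avoids quoting problems when file paths contain spaces.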

Using args.json

Instead of specifying all the necessary arguments each time, we encourage you to use the args.json file provided with this library:

python cli.py --argsfile 'args.json'

To change any setting, edit the args.json file.
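
To keep several configurations side by side, you could generate separate argument files from a script. This is a rough sketch that assumes the JSON keys mirror the command line argument names without the leading dashes; check the args.json shipped with the repo for the actual layout. The write_args_file helper and the my-args.json file name are hypothetical.

```python
import json
import tempfile
from pathlib import Path

# Values copied from the documented defaults; the key names are an assumption.
settings = {
    "model": "models/dd/model.ckpt",
    "sample_rate": 48000,
    "chunk_size": 65536,
    "mode": "Generation",
    "seed": -1,
    "batch_size": 1,
    "steps": 50,
}


def write_args_file(path, settings):
    """Serialize the settings to a JSON file and return its path."""
    path = Path(path)
    path.write_text(json.dumps(settings, indent=2) + "\n", encoding="utf-8")
    return path


# Write to the temp directory so the args.json shipped with the repo is untouched.
args_path = write_args_file(Path(tempfile.gettempdir()) / "my-args.json", settings)
print(f"python cli.py --argsfile {args_path}")
```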

Using the model trimming script

scripts/trim_model.py can be used to reduce the file size of Dance Diffusion models by removing data that is only needed for training and not inference. For our first models, this reduced the model size by about 75% (from 3.46 GB to 0.87 GB).

To use it, simply pass the path to the model you want to trim as an argument:

python scripts/trim_model.py models/model.ckpt

This will create a new model file at models/model_trim.ckpt.
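
The naming rule and the quoted size reduction can be checked with a few lines of Python. This sketch infers the _trim suffix rule from the single example above, so the script's actual logic may differ. Both function names are hypothetical.

```python
from pathlib import Path


def trimmed_model_path(model_path):
    """Append '_trim' to the file stem, keeping the folder and extension."""
    path = Path(model_path)
    return path.with_name(f"{path.stem}_trim{path.suffix}")


def size_reduction_percent(original_gb, trimmed_gb):
    """Return how much smaller the trimmed file is, as a percentage."""
    return (1 - trimmed_gb / original_gb) * 100


print(trimmed_model_path("models/model.ckpt").as_posix())  # models/model_trim.ckpt
print(round(size_reduction_percent(3.46, 0.87), 1))        # 74.9
```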