SCHmUBERT
An implementation of the absorbing state diffusion model from https://github.com/samb-t/unleashing-transformers, applied to symbolic music (MIDI).
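At a high level, the absorbing state forward process replaces tokens with a dedicated mask token with increasing probability, and the model learns to reverse this corruption. Below is a minimal sketch of that forward corruption, assuming a linear schedule and reusing the mask id 90 mentioned later in this README; it is illustrative only, not the repository's code:

```python
import torch

MASK_ID = 90  # mask token id, as used in this project's visualizations

def corrupt(x0: torch.Tensor, t: int, T: int) -> torch.Tensor:
    """Absorbing forward process sketch: mask each token independently
    with probability t / T (linear schedule assumed for illustration)."""
    masked = torch.rand(x0.shape) < t / T
    return torch.where(masked, torch.full_like(x0, MASK_ID), x0)
```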
Samples
Samples in MIDI format can be found in the samples folder. You can also explore them in your browser (open the link in a new tab if the page is not found).
Installation
I run my experiments in Python 3.10, with all dependencies managed by Conda.
```bash
conda env create -f env.yml
```
Note that for all experiments, a soundfont file called 'soundfont.sf2' (not included) must be located in the root directory of the project.
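The soundfont is presumably consumed when rendering MIDI to audio for listening. As a rough sketch of how such a file is typically used, assuming pretty_midi with fluidsynth bindings (the MIDI path is hypothetical, and this is not necessarily the code path this repo takes):

```python
import pretty_midi

# Render a MIDI file to a waveform using the soundfont from the project root.
# 'samples/example.mid' is a hypothetical path for illustration.
midi = pretty_midi.PrettyMIDI("samples/example.mid")
audio = midi.fluidsynth(fs=44100, sf2_path="soundfont.sf2")
```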
Prepare Dataset
I use the Lakh MIDI Dataset to train the models. For loading, preprocessing, and extracting melodies and trios from the MIDI files, I adapted the pipelines Magenta implemented for their MusicVAE. To prepare the dataset, run:
```bash
python prepare_data.py --root_dir=/path/to/lmd_full --target data/lakh_trio.npy --mode trio --bars 64
```
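The result is a single NumPy array on disk. A quick way to sanity-check it after preparation (the array layout stated in the comment is an assumption, not documented behavior):

```python
import numpy as np

data = np.load("data/lakh_trio.npy")
# Assumed layout: (num_sequences, time_steps, tracks), with 3 tracks for trio.
print(data.shape, data.dtype)
```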
Train
I use visdom to log the training progress and periodically show samples.
To train the model, start the visdom server and run, for example:
```bash
python train.py --dataset data/lakh_trio.npy --bars 64 --batch_size 64 --tracks trio --model conv_transformer
```
So far, I got the best results with the conv_transformer model using a single 1D convolutional layer with a kernel width of 4.
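To make that concrete, here is an illustrative PyTorch sketch of such a front end: token embeddings followed by a single Conv1d of kernel width 4. Class name, vocabulary size, and dimensions are assumptions, not the repository's actual code:

```python
import torch
import torch.nn as nn

class ConvEmbedding(nn.Module):
    """Illustrative sketch: embed tokens, then apply one width-4 Conv1d."""

    def __init__(self, vocab_size: int = 91, dim: int = 512):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        # Kernel width 4; 'same' padding preserves the sequence length.
        self.conv = nn.Conv1d(dim, dim, kernel_size=4, padding="same")

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:
        x = self.embed(tokens).transpose(1, 2)  # (batch, dim, seq) for Conv1d
        return self.conv(x).transpose(1, 2)     # back to (batch, seq, dim)
```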
Pay attention to the steps_per_eval parameter, which defaults to 10000. The evaluation step is more computationally expensive than training for 10000 steps, so you may want to increase this value if you do not need that many evaluations.
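In pseudocode, the cadence controlled by steps_per_eval looks roughly like this (placeholder functions, not the actual train.py loop):

```python
steps_per_eval = 10_000  # default; increase to evaluate less often
total_steps = 140_000    # illustrative

def train_step() -> None: ...        # placeholder: one optimizer update
def evaluate_and_log() -> None: ...  # placeholder: sampling + metrics, expensive

for step in range(1, total_steps + 1):
    train_step()
    if step % steps_per_eval == 0:
        evaluate_and_log()
```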
Evaluate
To evaluate the framewise self-similarity metric on the samples generated by a model, run:
```bash
python evaluate.py --mode unconditional|infilling|self
```
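As rough intuition for the metric, framewise self-similarity compares the statistics of adjacent frames of a piece. The simplified stand-in below uses pitch-histogram cosine similarity between consecutive frames; the actual statistics computed by evaluate.py may differ:

```python
import numpy as np

def framewise_self_similarity(tokens: np.ndarray, frame: int = 64) -> float:
    """Simplified stand-in: mean cosine similarity between pitch histograms
    of consecutive, non-overlapping frames of a token sequence.
    Assumes integer tokens in [0, 90]."""
    frames = [tokens[i:i + frame] for i in range(0, len(tokens) - frame + 1, frame)]
    hists = [np.bincount(f, minlength=91).astype(float) for f in frames]
    sims = []
    for a, b in zip(hists, hists[1:]):
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sims.append(a @ b / denom if denom else 0.0)
    return float(np.mean(sims)) if sims else 0.0
```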
Sample
For sampling, I hacked together a rudimentary GUI using nicegui:
```bash
python sample.py --load_step 140000 --bars 64 --tracks trio --model conv_transformer
```
The GUI supports:
- visualizing samples (melody = red, bass = blue, drums = black); the y position indicates pitch, with special pitch values: 0 = pause, 1 = note off, 90 = mask
- adjusting the number of sample steps (slider in the Upload expansion area)
- diffusing from left to right ('=>') or vice versa ('<=')
- copying from left to right ('>') or vice versa; only masked values are overwritten
- sampling unconditionally (select 'A' in the central toggle to diffuse All, i.e. a batch of 8, instead of the Selected sample)
- uploading MIDI or MusicXML pieces for conditioning
- masking whole tracks (LM = Left Melody, RD = Right Drums, ...)
- masking an area selected with the mouse (mask button at the bottom; see the infilling sketch after this list)
- playback with a cursor indicating the exact position in the left and right visualizations
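Masking and regeneration in the GUI map naturally onto conditional infilling with the absorbing model: masked positions are set to the mask token, and the reverse diffusion fills them in. A minimal sketch, where the `denoise` callable is a hypothetical stand-in for the model's reverse process:

```python
import torch

MASK_ID = 90  # mask token, as shown in the visualization

def infill(sample: torch.Tensor, region: slice, denoise) -> torch.Tensor:
    """Sketch of conditional infilling: mask a region of an existing sample,
    then let the reverse diffusion regenerate only the masked tokens."""
    x = sample.clone()
    x[region] = MASK_ID
    return denoise(x)  # hypothetical: runs reverse diffusion on masked positions
```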
Model Weights
Model weights for the conv_transformer EMA model trained on the Lakh MIDI Dataset can be obtained here.
Extract the 'logs' folder to the project root, and set load_step, model, ... accordingly (250000, conv_transformer, ...).
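Combining the flags from the Sample section with those values, sampling from the released weights would look like:

```bash
python sample.py --load_step 250000 --bars 64 --tracks trio --model conv_transformer
```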