Evolving artificial neural networks for cross-adaptive audio effects

The project will investigate methods of evaluating the musical applicability of cross-adaptive audio effects. Adaptive audio effects have been researched for the last 10-15 years: features extracted from an audio signal are used to adaptively control the parameters of the processing applied to that same signal. Cross-adaptivity has been used similarly in automatic mixing algorithms.

The relatively new field of signal interaction applies these techniques in a live performance setting, where features of one signal affect the processing of another. For example, the pitch tracking data of the vocals can control the reverberation time of the drums, or the noisiness measure of a guitar can control the filtering of the vocals. This also allows for complex signal interactions, where features from several signals affect the processing of another signal.

Because these kinds of signal interactions are relatively uncharted territory, methods for evaluating the various cross-couplings of features have not been formalized and are currently left to empirical testing. The project should investigate AI methods for finding potentially useful mappings and evaluating their fitness.
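To make the idea of cross-adaptivity concrete, here is a minimal sketch in plain Python. It is not code from this project: the function names, the frame size and the mapping from loudness to filter coefficient are all assumptions chosen for illustration. The loudness (RMS) of one signal controls how bright a simple one-pole low-pass filter makes another signal.

```python
import math


def rms_envelope(signal, frame_size):
    """Return one RMS value per non-overlapping frame of `signal`."""
    envelope = []
    for start in range(0, len(signal), frame_size):
        frame = signal[start:start + frame_size]
        envelope.append(math.sqrt(sum(x * x for x in frame) / float(len(frame))))
    return envelope


def cross_adaptive_lowpass(control, carrier, frame_size=64):
    """Filter `carrier` with a one-pole low-pass whose coefficient follows
    the RMS envelope of `control` (louder control -> brighter carrier).

    Hypothetical mapping: the coefficient ranges from 0.05 to 1.0.
    """
    envelope = rms_envelope(control, frame_size)
    peak = max(envelope) or 1.0  # avoid division by zero for silent control
    output = []
    state = 0.0
    for i, sample in enumerate(carrier):
        frame_index = min(i // frame_size, len(envelope) - 1)
        level = envelope[frame_index] / peak
        coeff = 0.05 + 0.95 * level
        state += coeff * (sample - state)
        output.append(state)
    return output
```

In the actual project, the mapping from features to effect parameters is not hand-written like this; it is what the evolved neural network provides.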

Install dependencies (Ubuntu)

Assuming you have a clean Ubuntu 14.04, here's what needs to be installed:

Install dependencies (Windows)

Install Python 2.7

https://www.python.org/downloads/

Note: You need the 32-bit version, as the 64-bit version might fail to interoperate with Csound.

Install Csound

http://csound.github.io/download.html

Install MultiNEAT, matplotlib and numpy

Building these dependencies from source can be difficult and time-consuming, so download prebuilt wheel binaries instead.

Install Aubio

Install Essentia Extractors (optional)

Install Sonic Annotator and the libXtract vamp plugin

Install NodeJS

https://nodejs.org/en/download/

Setup of this project (same for Windows and Ubuntu)

If you're on Windows, you might want to run the following commands in Git Bash

Usage

First, run nosetests to check if things are running smoothly.

Running an experiment

Input audio files that you use in experiments should reside in the input folder. When you run an experiment with two input files, the two audio clips should be of equal length. Furthermore, the format should be:

There are some example files in the test_audio folder. For example, copy drums.wav and noise.wav from the test_audio folder to the input folder.
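Before starting an experiment, it can be handy to verify that two input files really have the same length. The following is a small sketch using only the standard library; the helper names are hypothetical and not part of this project, and it only works for WAV files that the wave module can read.

```python
import contextlib
import wave


def wav_info(path):
    """Return (channels, sample_rate, sample_width_bytes, num_frames)."""
    with contextlib.closing(wave.open(path, "rb")) as wav_file:
        return (
            wav_file.getnchannels(),
            wav_file.getframerate(),
            wav_file.getsampwidth(),
            wav_file.getnframes(),
        )


def check_equal_length(path_a, path_b):
    """Raise ValueError if the files differ in sample rate or frame count."""
    _, rate_a, _, frames_a = wav_info(path_a)
    _, rate_b, _, frames_b = wav_info(path_b)
    if rate_a != rate_b:
        raise ValueError("Sample rates differ: %d vs %d" % (rate_a, rate_b))
    if frames_a != frames_b:
        raise ValueError("Lengths differ: %d vs %d frames" % (frames_a, frames_b))
```

For example, check_equal_length("input/drums.wav", "input/noise.wav") returns silently when the two clips match and raises ValueError otherwise.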

Our goal in the following example experiment is to make noise.wav sound like drums.wav by running noise.wav through the "dist_lpf" audio effect. The audio effect has a set of parameters that are controlled by the output of a neural network. The experiment evolves one or more neural networks that behave such that the processed version of noise.wav sounds like drums.wav.

Run the command python main.py -i drums.wav noise.wav -g 30 -p 20

This will run the evolutionary algorithm for 30 generations with a population of 20. While this is running, you might want to open another command line instance and run python serve.py. This will start a server for a web client that interactively visualizes the results of the experiment as they become available. Websockets are used to keep the web client synchronized with whatever main.py has finished doing. Just visit http://localhost:8080 in your favorite browser. The web client looks somewhat like this:

Screenshot of visualization

To get information about all the parameters that main.py understands, run python main.py --help

The most important parameters:

  -i INPUT_FILES [INPUT_FILES ...], --input INPUT_FILES [INPUT_FILES ...]
                        The filename of the target sound and the filename of
                        the input sound, respectively
  -g NUM_GENERATIONS, --num-generations NUM_GENERATIONS
  -p POPULATION_SIZE, --population_size POPULATION_SIZE
  --fitness {similarity,multi-objective,hybrid,novelty,mixed}
                        similarity: Average local similarity, calculated with
                        euclidean distance between feature vectors for each
                        frame. multi-objective optimizes for a diverse
                        population that consists of various non-dominated
                        trade-offs between similarity in different features.
                        Hybrid fitness is the sum of similarity and multi-
                        objective, and gives you the best of both worlds.
                        Novelty fitness ignores the objective and optimizes
                        for novelty. Mixed fitness chooses a random fitness
                        evaluator for each generation.
  --neural-mode {a,ab,b,s,targets}
                        Mode a: target sound is neural input. Mode ab: target
                        sound and input sound is neural input. Mode b: input
                        sound is neural input. Mode s: static input, i.e. only
                        bias. Mode targets: evolve targets separately for each
                        timestep, with only static input
  --effect EFFECT_NAMES [EFFECT_NAMES ...], --effects EFFECT_NAMES [EFFECT_NAMES ...]
                        The name(s) of the sound effect(s) to use. See the
                        effects folder for options. In composite effects, use
                        "new_layer" to separate layers of parallel effects.
  --fs-neat [FS_NEAT]   Use FS-NEAT (automatic feature selection)
  --experiment-settings EXPERIMENT_SETTINGS
                        Filename of json file in the experiment_settings
                        folder. This file specifies which features to use as
                        neural input and for similarity calculations.
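The "similarity" fitness described in the help text above (average local similarity, based on the euclidean distance between feature vectors for each frame) could look roughly like the sketch below. This is an illustration, not the project's implementation: in particular, turning a distance into a similarity with 1 / (1 + distance) is an assumption made here, and the function names are hypothetical.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def average_local_similarity(target_frames, output_frames):
    """Average per-frame similarity between two sequences of feature vectors.

    Each argument is a list with one feature vector per analysis frame.
    Assumed mapping: similarity = 1 / (1 + distance), so identical frames
    give 1.0 and the value approaches 0.0 as frames drift apart.
    """
    if len(target_frames) != len(output_frames):
        raise ValueError("Both sounds must have the same number of frames")
    if not target_frames:
        raise ValueError("At least one frame is required")
    similarities = [
        1.0 / (1.0 + euclidean_distance(target, output))
        for target, output in zip(target_frames, output_frames)
    ]
    return sum(similarities) / len(similarities)
```

Because the distance is computed frame by frame, a sound that matches the target on average but at the wrong moments still scores poorly.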

In the experiment_settings folder you can add your own json file where you specify which audio features to use for a) similarity calculations and b) neural input. Here's one possible configuration, as in mfcc_basic.json:

{
  "parameter_lpf_cutoff": 50,
  "similarity_channels": [
    {
      "name": "mfcc_0",
      "weight": 1.0
    },
    {
      "name": "mfcc_1",
      "weight": 0.2
    }
  ],
  "neural_input_channels": [
    "mfcc_0",
    "mfcc_0__derivative",
    "mfcc_1"
  ]
}

In this example we are using mfcc_0 and mfcc_1 for similarity calculations, and mfcc_1 is given less weight than mfcc_0. In other words, mfcc_1 errors matter less than mfcc_0 errors. Neural input is mfcc_0, mfcc_1 and the derivative (gradient) of mfcc_0, which is written as "mfcc_0__derivative" in the config file. You can add the derivative of any feature by writing "{feature_name}__derivative" (replace {feature_name} with the name of the feature).
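The "{feature_name}__derivative" naming convention can be illustrated with a short sketch. The helpers below are hypothetical and written for this explanation only; how the project actually computes the derivative is not stated here, so the central-difference gradient is an assumption.

```python
DERIVATIVE_SUFFIX = "__derivative"


def split_channel_name(name):
    """Return (base_feature_name, is_derivative) for a channel name."""
    if name.endswith(DERIVATIVE_SUFFIX):
        return name[:-len(DERIVATIVE_SUFFIX)], True
    return name, False


def gradient(values):
    """Per-frame gradient: central differences in the interior and
    one-sided differences at both ends (assumed method)."""
    n = len(values)
    if n < 2:
        return [0.0] * n
    result = [float(values[1] - values[0])]
    for i in range(1, n - 1):
        result.append((values[i + 1] - values[i - 1]) / 2.0)
    result.append(float(values[-1] - values[-2]))
    return result


def build_neural_input(channel_names, features):
    """Map each requested channel to its per-frame series.

    `features` is a dict from base feature name to a list of values,
    one value per analysis frame.
    """
    series = {}
    for name in channel_names:
        base, is_derivative = split_channel_name(name)
        values = features[base]
        series[name] = gradient(values) if is_derivative else list(values)
    return series
```

With the configuration shown above, build_neural_input(["mfcc_0", "mfcc_0__derivative", "mfcc_1"], features) would need only the mfcc_0 and mfcc_1 series as input and would derive the third channel from mfcc_0.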

To see all the available audio features you can use in an experiment settings json file, run python list_all_features.py

When you are done with your experiment(s), run python clean.py. This will delete all files written during the experiment(s).

Data augmentation

If you have a short sound and you'd like to create variations of it, in terms of gain and playback speed, you can use data_augmentation.py. If you train neural networks on the augmented sound, they will typically generalize better to unseen sounds. Example command, assuming you have drums_short.wav in the input folder:

python data_augmentation.py -i drums_short.wav --factor 8

This will write the augmented sound drums_short.wav.augmented.wav to the input folder.
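As a rough illustration of what varying gain and playback speed means, here is a plain-Python sketch. It is not the code in data_augmentation.py, and it makes no claim about how --factor is interpreted there; the function names and the linear-interpolation resampling are assumptions.

```python
def change_speed(samples, speed):
    """Resample `samples` by linear interpolation.

    speed > 1.0 plays faster (shorter output), speed < 1.0 plays slower.
    """
    if not samples:
        return []
    length = max(1, int(round(len(samples) / float(speed))))
    last = len(samples) - 1
    output = []
    for i in range(length):
        position = min(i * speed, last)
        left = int(position)
        right = min(left + 1, last)
        fraction = position - left
        output.append(samples[left] * (1.0 - fraction) + samples[right] * fraction)
    return output


def augment(samples, variations):
    """Concatenate one variation of `samples` per (gain, speed) pair."""
    output = []
    for gain, speed in variations:
        output.extend(gain * x for x in change_speed(samples, speed))
    return output
```

For instance, augment(samples, [(1.0, 1.0), (0.5, 1.2), (0.8, 0.9)]) would yield the original sound followed by a quieter, faster copy and a slightly quieter, slower copy.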

Live mode

If the only neural input comes from features computed by the Csound analyzer, then you can apply the evolved cross-adaptive audio effect (to unseen data/sound) in live performances. You can still use other features, such as bark bands, in the similarity measure. For example, you can use the experiment settings in csound_bark.json.

Example command for evolving the effect:

python main.py -i drums.wav noise.wav -g 50 -p 20 --experiment-settings csound_bark.json

Assuming you want to use the best individual in the last generation of the most recent experiment, run the following command:

python create_live_csd.py

This will write a file live.csd to the live_csd folder. This is a Csound code file with some inline Python code and a base64 data blob containing data about the evolved neural network, among other things. Note that this file is built to run only in the Python environment where it was created. The Csound file can be used in live mode like this:

csound live_csd/live.csd -iadc -odac

This assumes that you have a working sound card with stereo audio input. The target audio should be in the left channel and the input audio should be in the right channel.

You can also use the csd file offline, to speed up computation and write the output audio to disk:

csound live_csd/live.csd -iinput/drums_synth.wav -ooutput/synth_cross_adapted.wav

This assumes that you have a stereo audio file drums_synth.wav with drums in the left channel and synth in the right channel.
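If your target and input sounds are separate mono files, you need to combine them into one stereo file first. The sketch below does this with the standard library only. It is not part of this project; the function name is hypothetical, and it assumes both files are mono PCM WAV files with the same sample rate, sample width and length.

```python
import contextlib
import wave


def merge_to_stereo(left_path, right_path, output_path):
    """Write a stereo WAV with `left_path` in the left channel and
    `right_path` in the right channel."""
    with contextlib.closing(wave.open(left_path, "rb")) as left, \
            contextlib.closing(wave.open(right_path, "rb")) as right:
        if left.getnchannels() != 1 or right.getnchannels() != 1:
            raise ValueError("Both input files must be mono")
        left_format = (left.getframerate(), left.getsampwidth(), left.getnframes())
        right_format = (right.getframerate(), right.getsampwidth(), right.getnframes())
        if left_format != right_format:
            raise ValueError("Sample rate, sample width and length must match")
        rate, width, num_frames = left_format
        left_data = left.readframes(num_frames)
        right_data = right.readframes(num_frames)

    # Interleave the samples: left, right, left, right, ...
    stereo = bytearray()
    for offset in range(0, len(left_data), width):
        stereo += left_data[offset:offset + width]
        stereo += right_data[offset:offset + width]

    with contextlib.closing(wave.open(output_path, "wb")) as out:
        out.setnchannels(2)
        out.setsampwidth(width)
        out.setframerate(rate)
        out.writeframes(bytes(stereo))
```

Assuming hypothetical mono files drums.wav and synth.wav, merge_to_stereo("drums.wav", "synth.wav", "input/drums_synth.wav") would produce a file laid out as the command above expects.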

Use RAM disk

Experiments can run ~10% faster if you use a RAM disk to reduce I/O overhead. When you have a RAM disk running, set BASE_DIR in settings.py to the path of the RAM disk (e.g. '/mnt/ramdisk' for Ubuntu or 'R:\\' for Windows) and run python prepare_ramdisk.py. This command ensures that the required directories are present on the RAM disk and copies the audio input files and the web-based visualization system to it. If you want to experiment with new audio input files after you have run python prepare_ramdisk.py, you can put them directly in the input folder on the RAM disk.
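Conceptually, the preparation step amounts to something like the sketch below. This is not the content of prepare_ramdisk.py: the function name and the list of folders are assumptions, and copying the web-based visualization system is left out.

```python
import os
import shutil


def prepare_base_dir(project_dir, base_dir, folders=("input", "output")):
    """Create the working folders under `base_dir` and copy the audio
    input files from `project_dir`/input into `base_dir`/input.

    The folder names are assumed for this sketch. Returns the names of
    the copied files.
    """
    for folder in folders:
        path = os.path.join(base_dir, folder)
        if not os.path.isdir(path):
            os.makedirs(path)

    source_dir = os.path.join(project_dir, "input")
    copied = []
    for name in sorted(os.listdir(source_dir)):
        source_path = os.path.join(source_dir, name)
        if os.path.isfile(source_path):
            shutil.copy2(source_path, os.path.join(base_dir, "input", name))
            copied.append(name)
    return copied
```

With BASE_DIR set to '/mnt/ramdisk', a call such as prepare_base_dir('.', '/mnt/ramdisk') would mirror the input folder onto the RAM disk.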

RAM disk setup (Ubuntu)

Assuming you have no RAM disk set up already, and you want one with 3 GB of space:

If you want the RAM disk to persist across reboots, run sudo nano /etc/fstab and add the following line:

tmpfs /mnt/ramdisk tmpfs nodev,nosuid,noexec,nodiratime,size=3072M 0 0

The RAM disk should now be mounted on startup/reboot. You can confirm this by rebooting and running df -h /mnt/ramdisk

RAM disk setup (Windows)

I haven't been able to get a performance gain by using a RAM disk on a Windows machine with an SSD, but if you want to try, you can install a program like this: https://www.softperfect.com/products/ramdisk/

Use at your own risk

Known issues