Music-Representation-Comparison

This repo contains the code for a comparative analysis of different audio representation models.

Reproducibility

This repo uses the MagnaTagATune dataset to evaluate the performance of different music representation models on the downstream task of music tagging.

Dataset

The audio files for the MagnaTagATune dataset can be downloaded here. Extract the audio files to the audios directory in the MTT folder. The resulting directory structure should look like this:

.               
├── MTT
│   ├── audios
│   │   ├── 0
│   │   ├── 1
│   │   └── ...
│   ├── magnatagatune.json
├── evaluate_clap.py
├── evaluate_mert.py
└── ...
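
Before running the evaluation scripts, it can help to confirm that the layout above is in place. The following is a minimal sketch, not part of the repo: the function name `check_mtt_layout` is hypothetical, and it only checks that the `MTT/audios` directory has at least one subdirectory and that `MTT/magnatagatune.json` exists. It makes no assumption about the contents of the JSON file.

```python
from pathlib import Path


def check_mtt_layout(root="."):
    """Return a list of problems with the expected MTT layout (empty if none).

    Hypothetical helper: it only mirrors the directory tree shown in this README.
    """
    mtt = Path(root) / "MTT"
    problems = []

    # The extracted audio is expected under MTT/audios/<subdir>/...
    audios = mtt / "audios"
    if not audios.is_dir():
        problems.append(f"missing directory: {audios}")
    elif not any(p.is_dir() for p in audios.iterdir()):
        problems.append(f"no subdirectories (0, 1, ...) under {audios}")

    # The metadata file sits next to the audios directory.
    meta = mtt / "magnatagatune.json"
    if not meta.is_file():
        problems.append(f"missing file: {meta}")

    return problems


if __name__ == "__main__":
    issues = check_mtt_layout(".")
    print("layout OK" if not issues else "\n".join(issues))
```

Run it from the repo root; an empty result means the directories and the metadata file are where the tree above expects them.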

We use the same split as Jukebox.

Model Evaluation

We evaluate the following music representation models in this paper: ImageBind, Jukebox, OpenL3, CLAP, Wav2CLIP, and MERT.

Model Performance

The comparison of the models is shown below:

| Model | MTT<sub>AUC</sub> | MTT<sub>AP</sub> |
| --- | --- | --- |
| ImageBind | 88.55% | 40.19% |
| Jukebox | 91.50% | 41.40% |
| OpenL3 | 89.35% | 42.88% |
| CLAP | 70.04% | 27.95% |
| Wav2CLIP | 90.15% | 49.12% |
| MERT | 93.91% | 59.57% |
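
For readers unfamiliar with the two metrics, the sketch below shows one common way to compute ROC-AUC and average precision (AP) for multi-label tagging. It is an illustration only: it assumes the scores are macro-averaged over tags, which this README does not state, and the function names are hypothetical. The repo's evaluation scripts may rely on a library implementation instead.

```python
def roc_auc(labels, scores):
    """ROC-AUC for one tag via the rank-sum formulation; ties get average ranks."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        # Find the end of the current group of tied scores.
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1

    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative example")
    pos_rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (pos_rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def average_precision(labels, scores):
    """Mean of precision@k over the ranks k at which a positive example appears."""
    order = sorted(range(len(scores)), key=lambda i: -scores[i])
    hits, total = 0, 0.0
    for k, idx in enumerate(order, start=1):
        if labels[idx] == 1:
            hits += 1
            total += hits / k
    if hits == 0:
        raise ValueError("need at least one positive example")
    return total / hits


def macro_average(metric, label_matrix, score_matrix):
    """Apply a per-tag metric to each column (tag) and average the results."""
    n_tags = len(label_matrix[0])
    values = []
    for t in range(n_tags):
        tag_labels = [row[t] for row in label_matrix]
        tag_scores = [row[t] for row in score_matrix]
        values.append(metric(tag_labels, tag_scores))
    return sum(values) / n_tags


if __name__ == "__main__":
    # Toy data: 4 clips, 2 tags (rows are clips, columns are tags).
    y = [[0, 1], [0, 0], [1, 1], [1, 0]]
    s = [[0.1, 0.9], [0.4, 0.2], [0.35, 0.8], [0.8, 0.1]]
    print(macro_average(roc_auc, y, s))            # 0.875
    print(macro_average(average_precision, y, s))  # about 0.9167
```

AP rewards ranking the positive clips near the top for each tag, so it is typically much lower than AUC on datasets with rare tags, which is consistent with the gap between the two columns above.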