SERAB: Speech Emotion Recognition Adaptation Benchmark
This repo contains a "simplified" implementation of SERAB, which includes:
- BYOL-A training and utility functions (Original repo: https://github.com/nttcslab/byol-a)
- BYOL-A and transformer-inspired models
- Kudos to Phil Wang for his implementation of CvT (https://github.com/lucidrains/vit-pytorch)
- Benchmark tests for SERAB
- TFDS scripts to load SERAB data
Update: BYOL-S was one of the strongest submissions of the HEAR NeurIPS 2021 Challenge! Leaderboard results: https://neuralaudio.ai/hear2021-results.html
Demo
Environment setup
The libraries needed to reproduce the environment are listed in serab.yml.
To reproduce the environment, run:
conda env create -f serab.yml
To fetch the external BYOL-A source files and apply the patches, run the following after cloning the repo:
cd SERAB/
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/config.yaml
patch --ignore-whitespace < config.diff
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/train.py
patch < train.diff
cd byol_a/
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/augmentations.py
patch < augmentations.diff
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/common.py
patch < common.diff
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/dataset.py
patch < dataset.diff
curl -O https://raw.githubusercontent.com/nttcslab/byol-a/f2451c366d02be031a31967f494afdf3485a85ff/byol_a/models.py
mv models.py models/audio_ntt.py
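The download steps above all fetch files from one pinned commit of the BYOL-A repository. As a minimal sketch (not part of the repo), the same list can be written down in Python to print or script the curl commands. The names `UPSTREAM_FILES`, `raw_url` and `curl_commands` are hypothetical helpers introduced only for illustration; the commit hash, file paths and patch names are copied from the commands above.

```python
from typing import List, Optional, Tuple

# Commit of nttcslab/byol-a pinned by the curl commands above.
COMMIT = "f2451c366d02be031a31967f494afdf3485a85ff"
BASE_URL = f"https://raw.githubusercontent.com/nttcslab/byol-a/{COMMIT}"

# (path in the upstream repo, patch file applied afterwards or None).
UPSTREAM_FILES: List[Tuple[str, Optional[str]]] = [
    ("config.yaml", "config.diff"),
    ("train.py", "train.diff"),
    ("byol_a/augmentations.py", "augmentations.diff"),
    ("byol_a/common.py", "common.diff"),
    ("byol_a/dataset.py", "dataset.diff"),
    ("byol_a/models.py", None),  # not patched: moved to models/audio_ntt.py
]


def raw_url(upstream_path: str) -> str:
    """Return the raw GitHub URL of a file at the pinned commit."""
    return f"{BASE_URL}/{upstream_path}"


def curl_commands() -> List[str]:
    """Return one 'curl -O <url>' command per upstream file."""
    return [f"curl -O {raw_url(path)}" for path, _ in UPSTREAM_FILES]


if __name__ == "__main__":
    # Only prints the commands; run them from the directories shown above.
    for command in curl_commands():
        print(command)
```

Keeping the commit hash in a single constant makes it harder to mix files from different upstream revisions, which would likely make the patches fail to apply.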
Evaluate a (pre-trained) model using SERAB
In this simplified version, only PyTorch models can be used.
Before running the evaluation, make sure that the config file config.yaml is correctly set up for your model.
To run a pre-existing model, run:
python clf_benchmark.py --model_name {MODEL_NAME} --dataset_name {DATASET_NAME}
By default, grid-search-based classifier hyperparameter optimization is performed. To run a pre-existing model with the "default" classifiers, add the --model_selection none option:
python clf_benchmark.py --model_name {MODEL_NAME} --dataset_name {DATASET_NAME} --model_selection none
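When several model/dataset pairs need to be evaluated without DVC, the two invocations above can be assembled programmatically. The following is a rough sketch under stated assumptions: `build_command` is a hypothetical helper, and `my_model` / `my_dataset` are placeholder names, not identifiers known to the repo. Only the script name and the three flags shown above are used.

```python
import shlex
from typing import List, Optional


def build_command(
    model_name: str,
    dataset_name: str,
    model_selection: Optional[str] = None,
) -> List[str]:
    """Build the argv list for one clf_benchmark.py run.

    Passing model_selection="none" mirrors the second command above,
    which uses the "default" classifiers instead of the grid search.
    """
    cmd = [
        "python",
        "clf_benchmark.py",
        "--model_name",
        model_name,
        "--dataset_name",
        dataset_name,
    ]
    if model_selection is not None:
        cmd += ["--model_selection", model_selection]
    return cmd


if __name__ == "__main__":
    # Placeholder names: replace with a model and dataset from your setup.
    for selection in (None, "none"):
        print(shlex.join(build_command("my_model", "my_dataset", selection)))
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` from the repository root to actually launch the benchmark.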
To run a model on all the SERAB datasets, <a href="https://dvc.org/">DVC</a> can be used.
Make the appropriate modifications in dvc.yaml and run:
dvc repro
Train a model "à la BYOL-A"
Models can be pre-trained on a subsample of AudioSet that only contains speech.
You might need to make changes to train.py and config.yaml before starting training.
To train a model, run:
python train.py {MODEL_NAME} # or dvc repro
As training time is usually long (10-20h depending on the model), we recommend using tmux to attach & detach terminals from a given session.
SERAB datasets
While CREMA-D and SAVEE are already integrated into TFDS, the other datasets were added as <a href="https://www.tensorflow.org/datasets/add_dataset">tensorflow datasets</a>.
The code to load these datasets can be found in tensorflow_datasets.
Here are the steps to download and load the SERAB datasets:
- In the tensorflow_datasets folder, create the folders download/manual
- Download the compressed datasets (.zip files) under tensorflow_datasets/download/manual/
Links to the SERAB datasets:
- AESDD: http://m3c.web.auth.gr/research/aesdd-speech-emotion-recognition/
- CaFE: https://zenodo.org/record/1478765
- EmoDB: http://emodb.bilderbar.info/download/
- EMOVO: http://voice.fub.it/activities/corpora/emovo/index.html
- IEM4 (restricted access): https://sail.usc.edu/iemocap/
- RAVDESS: https://smartlaboratory.org/ravdess/
- SAVEE (restricted access): http://kahlan.eps.surrey.ac.uk/savee/Download.html
- ShEMO: https://github.com/mansourehk/ShEMO
- SUBESCO: https://zenodo.org/record/4526477#.YcyUeGjMJPY
- Ensure that all samples in a given dataset are either all mono or all stereo! You can use stereo_to_mono.py in serab.utils to convert all stereo audio files to mono.
- Build each dataset using the TFDS CLI:
cd tensorflow_datasets/{DATASET_NAME}
tfds build # Download and prepare the dataset to `~/tensorflow_datasets/`
The datasets are now ready to use!
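Before running tfds build, the mono/stereo consistency requirement from the steps above can be checked with a short script. This is only a sketch and not the repo's stereo_to_mono.py: it assumes the audio has been extracted to uncompressed PCM .wav files (the only kind the standard-library `wave` module reads), and `channel_counts` / `is_consistent` are hypothetical helper names.

```python
import wave
from collections import Counter
from pathlib import Path
from typing import Union


def channel_counts(folder: Union[str, Path]) -> Counter:
    """Count .wav files under `folder` by number of channels (1 = mono, 2 = stereo)."""
    counts: Counter = Counter()
    for path in sorted(Path(folder).rglob("*.wav")):
        with wave.open(str(path), "rb") as wav_file:
            counts[wav_file.getnchannels()] += 1
    return counts


def is_consistent(folder: Union[str, Path]) -> bool:
    """Return True if every .wav file under `folder` has the same channel count."""
    return len(channel_counts(folder)) <= 1


if __name__ == "__main__":
    import sys

    # Usage: python check_channels.py <folder with extracted .wav files>
    target = sys.argv[1] if len(sys.argv) > 1 else "."
    print(dict(channel_counts(target)))
    print("consistent" if is_consistent(target) else "mixed mono/stereo")
```

If the check reports mixed channel counts, the stereo files can then be converted with stereo_to_mono.py as described above.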
Citation
If you are using this code, please cite the paper:
@article{scheidwasser2021serab,
title={SERAB: A multi-lingual benchmark for speech emotion recognition},
author={Scheidwasser-Clow, Neil and Kegler, Mikolaj and Beckmann, Pierre and Cernak, Milos},
journal={arXiv preprint arXiv:2110.03414},
year={2021}
}