# DiaPer 🩲
PyTorch implementation for DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors.
## Usage

### Getting started
We recommend creating an Anaconda environment:
```bash
conda create -n DiaPer python=3.7
conda activate DiaPer
```
Clone the repository:
```bash
git clone https://github.com/BUTSpeechFIT/DiaPer.git
```
Install the packages:
```bash
conda install pip
pip install git+https://github.com/fnlandini/transformers
conda install numpy
conda install -c conda-forge tensorboard
pip install torch==1.10.0+cu113 torchvision==0.11.1+cu113 torchaudio==0.10.0+cu113 -f https://download.pytorch.org/whl/cu113/torch_stable.html
pip install safe_gpu
pip install yamlargparse==1.31.1
pip install scikit-learn==1.0.2
pip install decorator==5.1.1
pip install librosa==0.9.1
pip install setuptools==59.5.0
pip install h5py==3.8.0
pip install matplotlib==3.5.3
```
Other versions might work but these were the settings used for this work.
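As a quick sanity check (not part of the repository), you can verify that the pinned versions are installed and that a CUDA device is visible before running anything heavier:

```python
# Quick environment check: confirm the versions pinned above and that a CUDA
# device is visible (the cu113 wheels expect an NVIDIA driver for CUDA 11.x).
import torch
import torchaudio
import librosa

print("torch:", torch.__version__)            # expected: 1.10.0+cu113
print("torchaudio:", torchaudio.__version__)  # expected: 0.10.0+cu113
print("librosa:", librosa.__version__)        # expected: 0.9.1
print("CUDA available:", torch.cuda.is_available())
```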
Run the example:
```bash
./run_example.sh
```
If it works, you should be set.
### Train
To run the training, you can call:
```bash
python diaper/train.py -c examples/train.yaml
```
Note that in the example you need to define the train and validation data directories as well as the output directory. The rest of the parameters are standard ones, as used in our publication. For adaptation or fine-tuning, the process is similar:
```bash
python diaper/train.py -c examples/finetune_adaptedmorespeakers.yaml
```
In that case, you will also need to provide the path to the trained model that you want to adapt or fine-tune.
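The exact layout of the train and validation data directories is defined by the recipe itself; the sketch below is only a hypothetical pre-flight check, assuming Kaldi-style directories (file names such as `wav.scp`, `segments`, `utt2spk` and `rttm` are assumptions, not something this README specifies):

```python
# Hypothetical pre-flight check, not part of DiaPer: verify that a data directory
# contains the Kaldi-style files an EEND-style recipe typically expects.
# The file names below are assumptions; check the repository for the exact format.
from pathlib import Path

def check_data_dir(data_dir, expected=("wav.scp", "segments", "utt2spk", "rttm")):
    data_dir = Path(data_dir)
    missing = [name for name in expected if not (data_dir / name).exists()]
    print(f"{data_dir}: " + ("missing " + ", ".join(missing) if missing else "looks complete"))

check_data_dir("data/train")  # hypothetical paths
check_data_dir("data/dev")
```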
### Inference
To run the inference, you can call:
```bash
python diaper/infer.py -c examples/infer.yaml
```
Note that in the example you need to define the data, model and output directories.
Or, if you want to evaluate only one file:
```bash
python diaper/infer_single_file.py -c examples/infer.yaml --wav-dir <directory with wav file> --wav-name <filename without extension>
```
Note that in the example you need to define the model and output directories.
### Inference with pre-trained models
You can also run inference using the models we share, either with the usual approach or on a single file, for example:
```bash
python diaper/infer_single_file.py -c examples/infer_16k_10attractors.yaml --wav-dir examples --wav-name IS1009a
```
for the model trained on simulated conversations (no fine-tuning) or with fine-tuning as:
```bash
python diaper/infer_single_file.py -c examples/infer_16k_10attractors_AMIheadsetFT.yaml --wav-dir examples --wav-name IS1009a
```
You should obtain results as in `examples/IS1009a_infer_16k_10attractors.rttm` and `examples/IS1009a_infer_16k_10attractors_AMIheadsetFT.rttm`, respectively.
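The outputs are standard RTTM files, so they can be inspected with a few lines of Python. The snippet below is only an illustrative helper (not part of the repository) that sums up speech time per speaker from `SPEAKER` lines:

```python
# Illustrative helper (not part of DiaPer): total speech time per speaker in an RTTM.
# RTTM SPEAKER lines are: SPEAKER <file> <chan> <onset> <duration> <NA> <NA> <speaker> <NA> <NA>
from collections import defaultdict

def speech_per_speaker(rttm_path):
    totals = defaultdict(float)
    with open(rttm_path) as f:
        for line in f:
            fields = line.split()
            if fields and fields[0] == "SPEAKER":
                duration, speaker = float(fields[4]), fields[7]
                totals[speaker] += duration
    return dict(totals)

print(speech_per_speaker("examples/IS1009a_infer_16k_10attractors.rttm"))
```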
All models trained on publicly available and free data are shared inside the folder `models`. Both families of models, with 10 and 20 attractors, are available. If you want to use any of them, modify the infer files above to suit your needs. You will need to change `models_path` and `epochs` (and `rttms_dir`, where the output will be generated) to use the model you want.
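As a sketch of that modification (assuming the config is plain YAML and that PyYAML is available, e.g. as a dependency of `yamlargparse`), one could copy an example file and override only those three fields; the values below are placeholders, not actual paths or epoch settings:

```python
# Hypothetical sketch: derive a new inference config from a shared example one,
# changing only the fields mentioned above. The assigned values are placeholders.
import yaml

with open("examples/infer_16k_10attractors.yaml") as f:
    cfg = yaml.safe_load(f)

cfg["models_path"] = "models/<model you want to use>"                # placeholder path
cfg["epochs"] = "<epochs to average, as in the shared configs>"      # placeholder
cfg["rttms_dir"] = "<directory where output RTTMs will be written>"  # placeholder

with open("examples/my_infer.yaml", "w") as f:
    yaml.safe_dump(cfg, f)
```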
## Results
DER and RTTMs | 10 attractors (without FT) | 10 attractors (with FT) | 20 attractors (without FT) | 20 attractors (with FT) | VAD+VBx+OSD |
---|---|---|---|---|---|
AISHELL-4 | 48.21% 📁 | 41.43% 📁 | 47.86% 📁 | 31.30% 📁 | 15.84% 📁 |
AliMeeting (far) | 38.67% 📁 | 32.60% 📁 | 34.35% 📁 | 26.27% 📁 | 28.84% 📁 |
AliMeeting (near) | 28.19% 📁 | 27.82% 📁 | 23.90% 📁 | 24.44% 📁 | 22.59% 📁 |
AMI (array) | 57.07% 📁 | 49.75% 📁 | 52.29% 📁 | 50.97% 📁 | 34.61% 📁 |
AMI (headset) | 36.36% 📁 | 32.94% 📁 | 35.08% 📁 | 30.49% 📁 | 22.42% 📁 |
Callhome | 14.86% 📁 | 13.60% 📁 | -- | -- | 13.62% 📁 |
CHiME6 | 78.25% 📁 | 70.77% 📁 | 77.51% 📁 | 69.94% 📁 | 70.42% 📁 |
DIHARD 2 | 43.75% 📁 | 32.97% 📁 | 44.51% 📁 | 31.23% 📁 | 26.67% 📁 |
DIHARD 3 full | 34.21% 📁 | 24.12% 📁 | 34.82% 📁 | 22.77% 📁 | 20.28% 📁 |
DipCo | 48.26% 📁 | -- | 43.37% 📁 | -- | 49.22% 📁 |
Mixer6 | 21.03% 📁 | 13.41% 📁 | 18.51% 📁 | 10.99% 📁 | 35.60% 📁 |
MSDWild | 35.69% 📁 | 15.46% 📁 | 25.07% 📁 | 14.59% 📁 | 16.86% 📁 |
RAMC | 38.05% 📁 | 21.11% 📁 | 32.08% 📁 | 18.69% 📁 | 18.19% 📁 |
VoxConverse | 23.20% 📁 | -- | 22.10% 📁 | -- | 6.12% 📁 |
## Citation
If you use the software, reference the results, or find the repository useful in any way, please cite:
```
@article{landini2023diaper,
  title={DiaPer: End-to-End Neural Diarization with Perceiver-Based Attractors},
  author={Landini, Federico and Diez, Mireia and Stafylakis, Themos and Burget, Luk{\'a}{\v{s}}},
  journal={arXiv preprint arXiv:2312.04324},
  year={2023}
}
```
If you did not use it for a publication but still found it useful, let me know by email; I would love to hear about it too :)
## Contact
If you have comments or questions, please contact me at landini@fit.vutbr.cz