DiariZen

DiariZen is a speaker diarization toolkit driven by AudioZen and Pyannote 3.1.

Installation

# create virtual python environment
conda create --name diarizen python=3.10
conda activate diarizen

# install diarizen 
conda install pytorch==2.1.1 torchvision==0.16.1 torchaudio==2.1.1 pytorch-cuda=12.1 -c pytorch -c nvidia
pip install -r requirements.txt && pip install -e .

# install pyannote-audio
cd pyannote-audio && pip install -e ".[dev,testing]"

# install dscore
git submodule init
git submodule update

Datasets

We use SDM data (the first channel of the first far-field microphone array) from the public AMI, AISHELL-4, and AliMeeting corpora for model training and evaluation. Please download these datasets first. Our data partition is here.
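For recordings distributed as multi-channel WAV files, selecting the first channel can be sketched as follows. This is only a minimal illustration using Python's standard `wave` module, assuming uncompressed PCM input. The names `first_channel` and `extract_first_channel`, the `chunk_frames` parameter, and the file paths are hypothetical and not part of DiariZen, whose own data preparation may differ.

```python
import wave


def first_channel(frames: bytes, n_channels: int, sample_width: int) -> bytes:
    """Return the samples of channel 0 from interleaved PCM frames."""
    frame_size = n_channels * sample_width
    # Each frame stores one sample per channel, with channel 0 first.
    return b"".join(
        frames[i:i + sample_width] for i in range(0, len(frames), frame_size)
    )


def extract_first_channel(src_path: str, dst_path: str,
                          chunk_frames: int = 16000) -> None:
    """Write channel 0 of a PCM WAV file to a new mono WAV file.

    The file is processed in chunks so that long meeting recordings
    do not have to fit in memory. src_path and dst_path must differ.
    """
    with wave.open(src_path, "rb") as src, wave.open(dst_path, "wb") as dst:
        n_channels = src.getnchannels()
        sample_width = src.getsampwidth()
        dst.setnchannels(1)
        dst.setsampwidth(sample_width)
        dst.setframerate(src.getframerate())
        while True:
            frames = src.readframes(chunk_frames)
            if not frames:
                break
            dst.writeframes(first_channel(frames, n_channels, sample_width))
```

A call such as `extract_first_channel("meeting_8ch.wav", "meeting_sdm.wav")` (hypothetical file names) would then produce a single-channel file. Corpora that already ship one file per microphone channel only need the corresponding file selected instead.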

Usage

Pre-trained

Results (SDM)

We aim to make the whole pipeline as simple as possible. Therefore, for the results below:

collar=0s
--------------------------------------------------------------
System         Features        AMI    AISHELL-4   AliMeeting
--------------------------------------------------------------
Pyannote3      SincNet         21.1     13.9        22.8

Proposed       Fbank           19.7     12.5        21.0
               WavLM-frozen    17.0     11.7        19.9
               WavLM-updated   15.4     11.7        17.6
--------------------------------------------------------------

collar=0.25s
--------------------------------------------------------------
System         Features        AMI    AISHELL-4   AliMeeting
--------------------------------------------------------------
Pyannote3      SincNet         13.7      7.7        13.6

Proposed       Fbank           12.9      6.9        12.6
               WavLM-frozen    10.9      6.1        12.0
               WavLM-updated    9.8      5.9        10.2
--------------------------------------------------------------
Note:
The results above differ from those in our ICASSP submission.
We updated a few experimental numbers, but the conclusions of our paper remain the same.
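
To make the gap between systems easier to read, the relative error reduction over the Pyannote3 baseline can be computed from the collar=0s table. The sketch below assumes the reported numbers are diarization error rates in percent, which the tables do not state explicitly. The helper name `relative_reduction` and the dictionary layout are illustrative only.

```python
# Values copied from the collar=0s table (assumed to be DER in %).
BASELINE = {"AMI": 21.1, "AISHELL-4": 13.9, "AliMeeting": 22.8}   # Pyannote3, SincNet
PROPOSED = {"AMI": 15.4, "AISHELL-4": 11.7, "AliMeeting": 17.6}   # Proposed, WavLM-updated


def relative_reduction(baseline: float, system: float) -> float:
    """Relative error reduction of `system` over `baseline`, in percent."""
    return 100.0 * (baseline - system) / baseline


# Per-dataset relative reduction, rounded to one decimal place.
REDUCTIONS = {
    name: round(relative_reduction(BASELINE[name], PROPOSED[name]), 1)
    for name in BASELINE
}

if __name__ == "__main__":
    for name, value in REDUCTIONS.items():
        print(f"{name}: {value}% relative reduction")
```

With these values, the WavLM-updated system reduces the error relative to Pyannote3 by about 27.0% on AMI, 15.8% on AISHELL-4, and 22.8% on AliMeeting.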

Citation

If you find this work helpful, please consider citing: J. Han, F. Landini, J. Rohdin, A. Silnova, M. Diez, and L. Burget, "Leveraging Self-Supervised Learning for Speaker Diarization," arXiv preprint arXiv:2409.09408, 2024.

@article{han2024leveragingselfsupervisedlearningspeaker,
      title={Leveraging Self-Supervised Learning for Speaker Diarization}, 
      author={Jiangyu Han and Federico Landini and Johan Rohdin and Anna Silnova and Mireia Diez and Lukas Burget},
      journal={arXiv preprint arXiv:2409.09408},
      year={2024}
}

License

This repository is released under the MIT license.

Contact

If you have any comments or questions, please contact ihan@fit.vut.cz.