ContentVec: An Improved Self-Supervised Speech Representation by Disentangling Speakers

This repository provides the official PyTorch implementation of ContentVec.

This is a short video that explains the main concepts of our work. If you find this work useful and use it in your research, please consider citing our paper.

Cite this paper

https://proceedings.mlr.press/v162/qian22b.html

Pre-trained models

The legacy models contain only the representation module, so they can be loaded with a plain fairseq installation, without setting up this code repo.

Model               Classes
ContentVec_legacy   100       download
ContentVec          100       download
ContentVec_legacy   500       download
ContentVec          500       download

Load a model

import fairseq

# Path to a downloaded legacy checkpoint
ckpt_path = "/path/to/the/checkpoint_best_legacy.pt"
models, cfg, task = fairseq.checkpoint_utils.load_model_ensemble_and_task([ckpt_path])
model = models[0]  # the ensemble holds a single model

For detailed feature extraction steps, please refer to HuBERT.

Train a new model

Data preparation

Download the zip file consisting of the following files:

Modify the root directory in the {train,valid}.tsv waveform list files
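
A minimal sketch of the root-directory edit described above. It assumes the .tsv files follow the usual fairseq manifest layout, where the first line holds the audio root directory and each following line holds a relative path and a sample count separated by a tab; check the downloaded files before relying on this. The helper name set_manifest_root is hypothetical and not part of this repo.

```python
from pathlib import Path


def set_manifest_root(tsv_path, new_root):
    """Replace the root directory stored on the first line of a manifest.

    Assumption: the file uses the fairseq-style layout (line 1 = root
    directory, remaining lines = "<relative path>\t<num samples>").
    Returns the previous root so the change can be logged or undone.
    """
    path = Path(tsv_path)
    lines = path.read_text(encoding="utf-8").splitlines()
    if not lines:
        raise ValueError(f"{path} is empty")
    old_root = lines[0]
    lines[0] = str(new_root)  # only the header line changes
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return old_root
```

Calling the helper once for train.tsv and once for valid.tsv, with the directory that holds your local copy of the audio, leaves the per-file entries untouched.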

Setup code repo

Follow the steps in setup.sh to set up the code repo.

Pretrain ContentVec

Use run_pretrain_single.sh to run on a single node.

Use run_pretrain_multi.sh and the corresponding Slurm template to run on multiple GPUs and nodes.