Home

Awesome

FreeVC: Towards High-Quality Text-Free One-Shot Voice Conversion

arXiv githubio GitHub Repo stars GitHub

In this paper, we adopt the end-to-end framework of VITS for high-quality waveform reconstruction, and propose strategies for clean content information extraction without text annotation. We disentangle content information by imposing an information bottleneck to WavLM features, and propose the spectrogram-resize based data augmentation to improve the purity of extracted content information.

🤗 Play online at HuggingFace Spaces.

Visit our demo page for audio samples.

We also provide the pretrained models.

<table style="width:100%"> <tr> <td><img src="./resources/train.png" alt="training" height="200"></td> <td><img src="./resources/infer.png" alt="inference" height="200"></td> </tr> <tr> <th>(a) Training</th> <th>(b) Inference</th> </tr> </table>

Updates

Pre-requisites

  1. Clone this repo: git clone https://github.com/OlaWod/FreeVC.git

  2. CD into this repo: cd FreeVC

  3. Install python requirements: pip install -r requirements.txt

  4. Download WavLM-Large and put it under directory 'wavlm/'

  5. Download the VCTK dataset (for training only)

  6. Download HiFi-GAN model and put it under directory 'hifigan/' (for training with SR only)

Inference Example

Download the pretrained checkpoints and run:

# inference with FreeVC
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc.json --ptfile checkpoints/freevc.pth --txtpath convert.txt --outdir outputs/freevc

# inference with FreeVC-s
CUDA_VISIBLE_DEVICES=0 python convert.py --hpfile logs/freevc-s.json --ptfile checkpoints/freevc-s.pth --txtpath convert.txt --outdir outputs/freevc-s

Training Example

  1. Preprocess
python downsample.py --in_dir </path/to/VCTK/wavs>
ln -s dataset/vctk-16k DUMMY

# run this if you want a different train-val-test split
python preprocess_flist.py

# run this if you want to use pretrained speaker encoder
CUDA_VISIBLE_DEVICES=0 python preprocess_spk.py

# run this if you want to train without SR-based augmentation
CUDA_VISIBLE_DEVICES=0 python preprocess_ssl.py

# run these if you want to train with SR-based augmentation
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 68 --max 72
CUDA_VISIBLE_DEVICES=1 python preprocess_sr.py --min 73 --max 76
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 77 --max 80
CUDA_VISIBLE_DEVICES=2 python preprocess_sr.py --min 81 --max 84
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 85 --max 88
CUDA_VISIBLE_DEVICES=3 python preprocess_sr.py --min 89 --max 92
  1. Train
# train freevc
CUDA_VISIBLE_DEVICES=0 python train.py -c configs/freevc.json -m freevc

# train freevc-s
CUDA_VISIBLE_DEVICES=2 python train.py -c configs/freevc-s.json -m freevc-s

References