TransferTTS (Zero-shot VITS) - PyTorch Implementation (-Ongoing-)
Note!! (09.23.)
Currently, this is only an implementation of the zero-shot system, not of the paper's first contribution: the transfer-learning framework using wav2vec 2.0. As future work, a model with complete implementations of both contributions (zero-shot and transfer learning) will be implemented in the following repository. Congratulations on being awarded Best Paper at INTERSPEECH 2022.
Overview
Unofficial PyTorch implementation of Transfer Learning Framework for Low-Resource Text-to-Speech using a Large-Scale Unlabeled Speech Corpus. Most of the code is based on VITS.
- MelStyleEncoder from StyleSpeech is used instead of the reference encoder (see the sketch after this list).
- Implementation of untranscribed data training is omitted.
- LibriTTS dataset (train-clean-100 and train-clean-360) is used. The sampling rate is set to 22050 Hz.
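As a rough illustration of the zero-shot mechanism, the sketch below shows how a style encoder can stand in for a speaker-ID embedding: a reference mel-spectrogram is pooled into a fixed-size style vector that conditions synthesis. This is a minimal toy stand-in, not the repo's actual MelStyleEncoder (which is considerably more elaborate).

```python
# Toy sketch (illustrative only, not this repo's MelStyleEncoder):
# map a reference mel-spectrogram to a fixed-size style vector that
# replaces a learned speaker-ID embedding for zero-shot synthesis.
import torch
import torch.nn as nn

class MelStyleEncoder(nn.Module):
    """Toy stand-in for StyleSpeech's MelStyleEncoder."""
    def __init__(self, n_mels=80, d_style=256):
        super().__init__()
        self.proj = nn.Linear(n_mels, d_style)

    def forward(self, mel):                 # mel: (batch, n_mels, frames)
        h = self.proj(mel.transpose(1, 2))  # (batch, frames, d_style)
        return h.mean(dim=1)                # temporal average pooling -> (batch, d_style)

encoder = MelStyleEncoder()
ref_mel = torch.randn(1, 80, 400)           # mel-spectrogram of a reference utterance
style = encoder(ref_mel)                    # (1, 256) style vector conditions the synthesizer
```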
Pre-requisites (from VITS)
- Python >= 3.6
- Clone this repository
- Install Python requirements; please refer to requirements.txt:
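pip install -r requirements.txt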
- You may need to install espeak first:
apt-get install espeak
- Build Monotonic Alignment Search and run preprocessing if you use your own datasets.
# Cython-version Monotonic Alignment Search
cd monotonic_align
python setup.py build_ext --inplace
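A sketch of how the built extension is used, assuming the upstream VITS monotonic_align API: during training, maximum_path finds the most likely monotonic alignment between spectrogram frames and text tokens from a score matrix.

```python
# Sketch assuming the upstream VITS monotonic_align API
# (run after building the Cython extension above).
import torch
import monotonic_align

scores = torch.randn(1, 100, 12)   # (batch, frames, text tokens) score matrix, VITS convention
mask = torch.ones(1, 100, 12)      # marks valid (non-padded) positions
path = monotonic_align.maximum_path(scores, mask)  # hard 0/1 alignment path
```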
Preprocessing
Run
python prepare_wav.py --data_path [LibriTTS DATAPATH]
to prepare the LibriTTS wav files.
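A sketch of the kind of preparation this step likely performs (an assumption based on the 22050 Hz setting above, not prepare_wav.py's actual code): LibriTTS ships at 24 kHz, so the wavs are resampled to 22050 Hz. The paths and output naming below are illustrative.

```python
# Illustrative only: resample LibriTTS wavs to 22050 Hz.
# Paths and output naming are assumptions, not prepare_wav.py's behavior.
import glob
import librosa
import soundfile as sf

for path in glob.glob("LibriTTS/train-clean-100/**/*.wav", recursive=True):
    wav, sr = librosa.load(path, sr=22050)              # load and resample to 22050 Hz
    sf.write(path.replace(".wav", "_22k.wav"), wav, 22050)
```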
Training
Train your model with
python train_ms.py -c configs/libritts.json -m libritts_base
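Assuming the upstream VITS defaults are unchanged, checkpoints and TensorBoard summaries are written under ./logs/libritts_base, so training can be monitored with:

tensorboard --logdir logs/libritts_base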
Inference
python inference.py --ref_audio [REF AUDIO PATH] --text [INPUT TEXT]
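For example (the reference audio path and input text below are placeholders):

python inference.py --ref_audio ./ref.wav --text "This is a zero-shot sample."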