# PyTorch Implementation of DC-TTS for Emotional TTS
This fork has been modified to support transfer learning for low-resource emotional TTS, as described here.
## Training
- Install the dependencies using `pip install -r requirements.txt`.
- Preprocess the EmoV-DB dataset using `process_emovdb.py`.
- Change the `logdir` argument in `hyperparams.py`. Other parameters can be edited optionally, but do NOT edit these hyperparameters.
- Add the path to the pre-trained Text2Mel model in the `logdir`.
- Comment this line if you are not running the `train-text2mel.py` file for the first time.
- Run the training script: `python train-text2mel.py --dataset=emovdb`
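The fine-tuning steps above can be sketched as a small driver script. `build_steps` and `run_steps` are hypothetical helpers, not part of this repository, and assume the commands are run from the repository root:

```python
# Hypothetical driver for the EmoV-DB fine-tuning pipeline sketched in
# this README. Script names come from the README; the wrapper itself is
# an illustration, not repository code.
import subprocess
import sys

def build_steps(dataset="emovdb"):
    """Return the preprocessing and fine-tuning commands, in order."""
    return [
        [sys.executable, "process_emovdb.py"],                          # preprocess EmoV-DB
        [sys.executable, "train-text2mel.py", f"--dataset={dataset}"],  # fine-tune Text2Mel
    ]

def run_steps(steps):
    """Run each command, aborting on the first failure."""
    for cmd in steps:
        subprocess.run(cmd, check=True)

if __name__ == "__main__":
    run_steps(build_steps())
```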
## Synthesis
- Write the sentences that you want to generate here.
- Add the checkpoint for the fine-tuned Text2Mel model in place of this line.
- Edit the paths for the output.
- Run the synthesis script: `python synthesize.py --dataset=emovdb`
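A quick pre-synthesis check can be sketched in a few lines: write out the sentences to generate and confirm the fine-tuned checkpoint exists before invoking `synthesize.py`. The file names below are placeholders, not the paths the repository actually reads:

```python
# Hypothetical pre-synthesis helpers; the paths are placeholders.
from pathlib import Path

def prepare_sentences(sentences, path="sentences.txt"):
    """Write one sentence per line for the synthesis step to read."""
    Path(path).write_text("\n".join(sentences) + "\n", encoding="utf-8")
    return path

def checkpoint_ready(path):
    """Return True if the fine-tuned Text2Mel checkpoint file exists."""
    return Path(path).is_file()
```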
## Readme of the original repository
PyTorch implementation of Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention based partially on the following projects:
- https://github.com/Kyubyong/dc_tts (audio pre-processing)
- https://github.com/r9y9/deepvoice3_pytorch (data loader and sampler)
### Online Text-To-Speech Demo
The following notebooks are executable on https://colab.research.google.com:
For audio samples and pretrained models, visit the above notebook links.
### Training/Synthesizing English Text-To-Speech
The English TTS uses the LJ-Speech dataset.
- Download the dataset: `python dl_and_preprop_dataset.py --dataset=ljspeech`
- Train the Text2Mel model: `python train-text2mel.py --dataset=ljspeech`
- Train the SSRN model: `python train-ssrn.py --dataset=ljspeech`
- Synthesize sentences: `python synthesize.py --dataset=ljspeech`
- The WAV files are saved in the `samples` folder.
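Once synthesis finishes, the contents of the `samples` folder can be inspected with the standard library alone. This listing helper is an illustration, not part of the repository:

```python
# Illustrative helper: report the duration of each synthesized WAV file
# in the samples folder, using only the standard-library wave module.
import wave
from pathlib import Path

def wav_durations(folder="samples"):
    """Map each .wav file name in `folder` to its duration in seconds."""
    durations = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wf:
            durations[path.name] = wf.getnframes() / wf.getframerate()
    return durations
```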
### Training/Synthesizing Mongolian Text-To-Speech
The Mongolian text-to-speech uses 5 hours of audio from the Mongolian Bible.
- Download the dataset: `python dl_and_preprop_dataset.py --dataset=mbspeech`
- Train the Text2Mel model: `python train-text2mel.py --dataset=mbspeech`
- Train the SSRN model: `python train-ssrn.py --dataset=mbspeech`
- Synthesize sentences: `python synthesize.py --dataset=mbspeech`
- The WAV files are saved in the `samples` folder.