PyTorch Implementation of DC-TTS for Emotional TTS

This fork has been modified to support transfer learning for low-resource emotional TTS, as described here.

Training

  1. Install the dependencies using pip install -r requirements.txt
  2. Preprocess the EmoV-DB dataset using process_emovdb.py
  3. Change the logdir argument in hyperparams.py. Other parameters can optionally be edited; however, DO NOT edit these hyperparameters.
  4. Add the path to the pre-trained Text2Mel model in the logdir
  5. Comment this line if you are not running the train-text2mel.py file for the first time.
  6. Run the training script, e.g. python train-text2mel.py --dataset=emovdb
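The hyperparameter step above follows a common pattern: hyperparams.py exposes a module-level namespace of settings, some of which are safe to change and some of which must match the pre-trained checkpoint. The sketch below is a hypothetical illustration of that pattern; the attribute names and values are assumptions, not the repo's actual contents.

```python
# Hypothetical sketch of a hyperparams.py-style settings module.
# Names and values below are illustrative, not the repo's actual attributes.

class Hyperparams:
    # Path where checkpoints and logs are written -- the value you are
    # expected to change before fine-tuning.
    logdir = "logdir/emovdb-finetune"

    # Settings that are generally safe to tune for your own run.
    batch_size = 16
    max_epochs = 100

    # Audio/analysis settings: changing these would make the pre-trained
    # Text2Mel weights incompatible with your extracted features,
    # so they should be left alone when fine-tuning.
    sample_rate = 22050
    n_mels = 80

hp = Hyperparams()
print(hp.logdir)  # -> logdir/emovdb-finetune
```

The point of keeping everything in one module is that train-text2mel.py and synthesize.py can both import the same object, so the fine-tuned model and the synthesis run always agree on the audio settings.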

Synthesis

  1. Write the sentences that you want to generate here
  2. Add the checkpoint for the fine-tuned Text2Mel model in place of this line
  3. Edit the paths for the output.
  4. Run the synthesis script, e.g. python synthesize.py --dataset=emovdb
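The synthesis steps above amount to: define a list of sentences, point the script at a checkpoint, and choose where the output WAV files land. A minimal sketch of the sentence-list and output-path bookkeeping is shown below; the SENTENCES list, the samples directory name, and the numbered-file naming scheme are assumptions for illustration, not necessarily the script's actual format.

```python
import os

# Hypothetical stand-in for the hard-coded sentence list in synthesize.py.
SENTENCES = [
    "The birch canoe slid on the smooth planks.",
    "I am so happy to see you again!",
]

# Hypothetical output directory (the original repo saves WAVs under samples/).
OUT_DIR = "samples"

def output_paths(sentences, out_dir):
    # One WAV file per sentence, numbered in input order.
    return [os.path.join(out_dir, f"{i + 1}.wav") for i in range(len(sentences))]

paths = output_paths(SENTENCES, OUT_DIR)
print(paths)  # -> ['samples/1.wav', 'samples/2.wav']
```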


Readme of the original repository

PyTorch implementation of Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention based partially on the following projects:

Online Text-To-Speech Demo

The following notebooks are executable on https://colab.research.google.com:

For audio samples and pretrained models, visit the above notebook links.

Training/Synthesizing English Text-To-Speech

The English TTS uses the LJ-Speech dataset.

  1. Download the dataset: python dl_and_preprop_dataset.py --dataset=ljspeech
  2. Train the Text2Mel model: python train-text2mel.py --dataset=ljspeech
  3. Train the SSRN model: python train-ssrn.py --dataset=ljspeech
  4. Synthesize sentences: python synthesize.py --dataset=ljspeech
    • The WAV files are saved in the samples folder.
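Conceptually, the two models trained above split the job: Text2Mel predicts a coarse, time-downsampled mel spectrogram from the input text, and SSRN (the spectrogram super-resolution network) restores the full time resolution and converts mel bins to linear-frequency bins for waveform reconstruction. The shape sketch below follows the DC-TTS paper's setup (time-reduction factor 4, 80 mel bins, 1024-point STFT); treat the exact numbers as assumptions rather than this repo's verified configuration.

```python
# Rough data-flow sketch of the DC-TTS pipeline (shapes only, no real model).
# Per the DC-TTS paper, Text2Mel predicts every 4th mel frame, and SSRN
# restores full time resolution and maps mel bins to 1 + n_fft/2 linear bins.

REDUCTION = 4   # time-reduction factor used by Text2Mel (assumed)
N_MELS = 80     # number of mel bins (assumed)
N_FFT = 1024    # STFT size, giving 513 linear-frequency bins (assumed)

def text2mel_shape(n_frames_full):
    # Coarse mel spectrogram: (mel bins, downsampled frames)
    return (N_MELS, n_frames_full // REDUCTION)

def ssrn_shape(coarse_shape):
    _, n_coarse = coarse_shape
    # Full-resolution linear spectrogram: (1 + n_fft/2, full frames)
    return (1 + N_FFT // 2, n_coarse * REDUCTION)

coarse = text2mel_shape(800)
print(coarse)              # (80, 200)
print(ssrn_shape(coarse))  # (513, 800)
```

Splitting the pipeline this way is what makes DC-TTS cheap to train: Text2Mel only has to attend over a 4x shorter sequence, and SSRN's upsampling is a purely local (convolutional) problem.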

Training/Synthesizing Mongolian Text-To-Speech

The Mongolian text-to-speech uses 5 hours of audio from the Mongolian Bible.

  1. Download the dataset: python dl_and_preprop_dataset.py --dataset=mbspeech
  2. Train the Text2Mel model: python train-text2mel.py --dataset=mbspeech
  3. Train the SSRN model: python train-ssrn.py --dataset=mbspeech
  4. Synthesize sentences: python synthesize.py --dataset=mbspeech
    • The WAV files are saved in the samples folder.