PyTorch Implementation of DC-TTS for Emotional TTS

This fork has been modified to support transfer learning for low-resource emotional TTS, as described here.

Training

  1. Install the dependencies using pip install -r requirements.txt
  2. Preprocess the EmoV-DB dataset using process_emovdb.py
  3. Change the logdir argument in hyperparams.py. Other parameters can optionally be edited; however, DO NOT edit these hyperparameters.
  4. Add the path to the pre-trained Text2Mel model in the logdir
  5. Comment this line if you are not running the train-text2mel.py file for the first time.
  6. Run the training script, e.g. python train-text2mel.py --dataset=emovdb
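The hyperparameter step above follows a common pattern: hyperparams.py exposes a module-level namespace of settings, some of which are safe to change and some of which must match the pre-trained checkpoint. The sketch below is a hypothetical illustration of that pattern; the attribute names and values are assumptions, not the repo's actual contents.

```python
# Hypothetical sketch of a hyperparams.py-style settings module.
# Names and values below are illustrative, not the repo's actual attributes.

class Hyperparams:
    # Path where checkpoints and logs are written -- the value you are
    # expected to change before fine-tuning.
    logdir = "logdir/emovdb-finetune"

    # Settings that are generally safe to tune for your own run.
    batch_size = 16
    max_epochs = 100

    # Audio/analysis settings: changing these would make the pre-trained
    # Text2Mel weights incompatible with your extracted features,
    # so they should be left alone when fine-tuning.
    sample_rate = 22050
    n_mels = 80

hp = Hyperparams()
print(hp.logdir)  # -> logdir/emovdb-finetune
```

The point of keeping everything in one module is that train-text2mel.py and synthesize.py can both import the same object, so the fine-tuned model and the synthesis run always agree on the audio settings.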

Synthesis

  1. Write the sentences that you want to generate here
  2. Add the checkpoint for the fine-tuned Text2Mel model in place of this line
  3. Edit the paths for the output.
  4. Run the synthesis script, e.g. python synthesize.py --dataset=emovdb
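The synthesis steps above amount to: define a list of sentences, point the script at a checkpoint, and choose where the output WAV files land. A minimal sketch of the sentence-list and output-path bookkeeping is shown below; the SENTENCES list, the samples directory name, and the numbered-file naming scheme are assumptions for illustration, not necessarily the script's actual format.

```python
import os

# Hypothetical stand-in for the hard-coded sentence list in synthesize.py.
SENTENCES = [
    "The birch canoe slid on the smooth planks.",
    "I am so happy to see you again!",
]

# Hypothetical output directory (the original repo saves WAVs under samples/).
OUT_DIR = "samples"

def output_paths(sentences, out_dir):
    # One WAV file per sentence, numbered in input order.
    return [os.path.join(out_dir, f"{i + 1}.wav") for i in range(len(sentences))]

paths = output_paths(SENTENCES, OUT_DIR)
print(paths)  # -> ['samples/1.wav', 'samples/2.wav']
```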


Readme of the original repository

PyTorch implementation of Efficiently Trainable Text-to-Speech System Based on Deep Convolutional Networks with Guided Attention based partially on the following projects:

Online Text-To-Speech Demo

The following notebooks are executable on https://colab.research.google.com:

For audio samples and pretrained models, visit the above notebook links.

Training/Synthesizing English Text-To-Speech

The English TTS uses the LJ-Speech dataset.

  1. Download the dataset: python dl_and_preprop_dataset.py --dataset=ljspeech
  2. Train the Text2Mel model: python train-text2mel.py --dataset=ljspeech
  3. Train the SSRN model: python train-ssrn.py --dataset=ljspeech
  4. Synthesize sentences: python synthesize.py --dataset=ljspeech
    • The WAV files are saved in the samples folder.
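Conceptually, the two models trained above split the job: Text2Mel predicts a coarse, time-downsampled mel spectrogram from the input text, and SSRN (the spectrogram super-resolution network) restores the full time resolution and converts mel bins to linear-frequency bins for waveform reconstruction. The shape sketch below follows the DC-TTS paper's setup (time-reduction factor 4, 80 mel bins, 1024-point STFT); treat the exact numbers as assumptions rather than this repo's verified configuration.

```python
# Rough data-flow sketch of the DC-TTS pipeline (shapes only, no real model).
# Per the DC-TTS paper, Text2Mel predicts every 4th mel frame, and SSRN
# restores full time resolution and maps mel bins to 1 + n_fft/2 linear bins.

REDUCTION = 4   # time-reduction factor used by Text2Mel (assumed)
N_MELS = 80     # number of mel bins (assumed)
N_FFT = 1024    # STFT size, giving 513 linear-frequency bins (assumed)

def text2mel_shape(n_frames_full):
    # Coarse mel spectrogram: (mel bins, downsampled frames)
    return (N_MELS, n_frames_full // REDUCTION)

def ssrn_shape(coarse_shape):
    _, n_coarse = coarse_shape
    # Full-resolution linear spectrogram: (1 + n_fft/2, full frames)
    return (1 + N_FFT // 2, n_coarse * REDUCTION)

coarse = text2mel_shape(800)
print(coarse)              # (80, 200)
print(ssrn_shape(coarse))  # (513, 800)
```

Splitting the pipeline this way is what makes DC-TTS cheap to train: Text2Mel only has to attend over a 4x shorter sequence, and SSRN's upsampling is a purely local (convolutional) problem.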

Training/Synthesizing Mongolian Text-To-Speech

The Mongolian text-to-speech uses 5 hours of audio from the Mongolian Bible.

  1. Download the dataset: python dl_and_preprop_dataset.py --dataset=mbspeech
  2. Train the Text2Mel model: python train-text2mel.py --dataset=mbspeech
  3. Train the SSRN model: python train-ssrn.py --dataset=mbspeech
  4. Synthesize sentences: python synthesize.py --dataset=mbspeech
    • The WAV files are saved in the samples folder.