

Transformer-TTS

<img src="png/model.png">

Requirements

Data

Pretrained Model

Attention plots

Self Attention encoder

<img src="png/attention_encoder.gif" height="200">

Self Attention decoder

<img src="png/attention_decoder.gif" height="200">

Attention encoder-decoder

<img src="png/attention.gif" height="200">

Learning curves & Alphas

<img src="png/training_loss.png"> <img src="png/alphas.png">

Experimental notes

  1. The learning rate is an important hyperparameter. An initial learning rate of 0.001 with exponential decay did not work; a warmup-based alternative is sketched after this list.
  2. Gradient clipping is also important for training. I clipped the gradients to a norm value of 1 (see the second sketch below).
  3. With the stop token loss, the model did not train.
  4. It was very important to concatenate the input and context vectors in the attention mechanism (see the last sketch below).
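For reference, a common alternative to plain exponential decay is the warmup-then-decay schedule from the original Transformer paper. A minimal PyTorch sketch, assuming `d_model=256` and `warmup_steps=4000` (both illustrative values, not taken from this repository):

```python
import torch

def noam_lr(step, d_model=256, warmup_steps=4000):
    # Linear warmup for warmup_steps, then inverse-square-root decay
    # (Vaswani et al., 2017). d_model and warmup_steps are assumed values.
    step = max(step, 1)  # guard against step == 0
    return d_model ** -0.5 * min(step ** -0.5, step * warmup_steps ** -1.5)

model = torch.nn.Linear(80, 80)  # stand-in for the TTS network
optimizer = torch.optim.Adam(model.parameters(), lr=1.0)  # the lambda supplies the real rate
scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lr_lambda=noam_lr)

for _ in range(5):  # call scheduler.step() once per optimization step, not per epoch
    optimizer.step()
    scheduler.step()
```

With a base learning rate of 1.0 and these values, the schedule peaks near 0.001 at the end of warmup and then decays as the inverse square root of the step count.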
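Clipping by global norm (note 2) is a single call in PyTorch, placed between the backward pass and the optimizer step. A minimal sketch with a stand-in model and a dummy loss:

```python
import torch

model = torch.nn.Linear(80, 80)  # stand-in for the TTS network
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

x = torch.randn(4, 80)
loss = model(x).pow(2).mean()  # dummy loss for illustration only
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)  # norm value 1, as in note 2
optimizer.step()
```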
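Note 4 describes returning [context; query] from attention rather than the context vector alone. Below is a minimal sketch of scaled dot-product attention with this concatenation; all shapes, names, and the output projection are illustrative assumptions, not the repository's actual code:

```python
import torch
import torch.nn.functional as F

def attention_with_concat(query, key, value, out_proj):
    # query: (B, T_q, D), key/value: (B, T_k, D)
    scores = torch.bmm(query, key.transpose(1, 2)) / query.size(-1) ** 0.5
    weights = F.softmax(scores, dim=-1)
    context = torch.bmm(weights, value)  # (B, T_q, D)
    # Concatenate the context with the input query before projecting,
    # instead of returning the context alone (experimental note 4).
    return out_proj(torch.cat([context, query], dim=-1))  # (B, T_q, D)

B, T, D = 2, 5, 64
out_proj = torch.nn.Linear(2 * D, D)
q, k, v = torch.randn(B, T, D), torch.randn(B, T, D), torch.randn(B, T, D)
out = attention_with_concat(q, k, v, out_proj)
print(out.shape)  # torch.Size([2, 5, 64])
```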

Generated Samples

<img src="png/mel_original.png" width="800">

File description

Training the network

Generate TTS wav file

Reference
