
This repository provides a collection of popular text-to-speech (TTS) models converted to TensorFlow Lite (TFLite). The models come primarily from two repositories, TTS and TensorFlowTTS. We provide end-to-end Colab notebooks that walk through model conversion and TFLite inference, including the conversion of PyTorch models to TFLite.
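As a rough illustration of the conversion step, the sketch below shows how a TensorFlow SavedModel might be converted to TFLite with either dynamic-range or float16 quantization, the two modes that appear in the benchmark table. It is not taken from the notebooks: the helper name `convert_to_tflite`, the paths, and the choice to allow select TensorFlow ops are assumptions made for illustration, and individual models may need different settings.

```python
from pathlib import Path

# Quantization modes this sketch understands (the two listed in the benchmarks).
SUPPORTED_QUANTIZATION = ("dynamic-range", "float16")


def convert_to_tflite(saved_model_dir: str, output_path: str,
                      quantization: str = "dynamic-range") -> int:
    """Convert a SavedModel to a quantized TFLite file; return bytes written.

    Hypothetical helper for illustration, not the notebooks' actual code.
    """
    if quantization not in SUPPORTED_QUANTIZATION:
        raise ValueError(f"unsupported quantization: {quantization!r}")

    # Imported lazily so this module loads even without TensorFlow installed.
    import tensorflow as tf

    converter = tf.lite.TFLiteConverter.from_saved_model(saved_model_dir)
    # Optimize.DEFAULT on its own requests dynamic-range weight quantization.
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    if quantization == "float16":
        # Restricting supported types to float16 stores weights as float16.
        converter.target_spec.supported_types = [tf.float16]
    # Assumption: the model may use ops outside the TFLite builtin set,
    # so select TensorFlow ops are allowed as a fallback.
    converter.target_spec.supported_ops = [
        tf.lite.OpsSet.TFLITE_BUILTINS,
        tf.lite.OpsSet.SELECT_TF_OPS,
    ]

    tflite_model = converter.convert()
    return Path(output_path).write_bytes(tflite_model)
```

A call such as `convert_to_tflite("saved_model_dir", "model.tflite", "float16")` would write the converted file, where both paths are placeholders.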

TTS is a two-step process: first, a TTS model generates a mel spectrogram from the input text; then a vocoder turns that spectrogram into an audio waveform. This repository includes both kinds of model.
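The two stages can be pictured as a simple function composition. The sketch below uses stand-in models so that it runs without any TFLite files; the names `synthesize`, `dummy_acoustic_model`, and `dummy_vocoder`, as well as the 80 mel bins and 256 samples per frame, are assumptions for illustration only and do not describe the models in this repository.

```python
from typing import Callable, List, Sequence

# Hypothetical type aliases, used only to make the data flow explicit.
MelSpectrogram = List[List[float]]  # frames x mel bins
Waveform = List[float]              # audio samples


def synthesize(
    token_ids: Sequence[int],
    acoustic_model: Callable[[Sequence[int]], MelSpectrogram],
    vocoder: Callable[[MelSpectrogram], Waveform],
) -> Waveform:
    """Run both TTS stages in order: text tokens -> mel spectrogram -> audio."""
    mel = acoustic_model(token_ids)  # stage 1, e.g. Tacotron2 or Fastspeech2
    if not mel:
        raise ValueError("acoustic model produced no frames")
    return vocoder(mel)              # stage 2, e.g. MelGAN or HiFi-GAN


def dummy_acoustic_model(token_ids: Sequence[int]) -> MelSpectrogram:
    # Stand-in: two frames per token, 80 mel bins per frame (assumed values).
    return [[0.0] * 80 for _ in range(2 * len(token_ids))]


def dummy_vocoder(mel: MelSpectrogram) -> Waveform:
    # Stand-in: 256 audio samples per spectrogram frame (assumed hop size).
    return [0.0] * (256 * len(mel))


audio = synthesize([5, 12, 7], dummy_acoustic_model, dummy_vocoder)
```

In practice each callable would wrap a TFLite interpreter for the corresponding model; keeping the stages separate is what allows any TTS model to be paired with any vocoder.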

Note that these models are trained on the LJSpeech dataset.

Here’s a sample result (with Fastspeech2 and MelGAN) for the text “Bill got in the habit of asking himself”.

Models Included

In the future, we may add more models.

<small> *Currently, conversion of the Glow TTS model is unavailable (refer to the issue here). </small>

Currently, Forward Tacotron only supports ONNX conversion; converting it to the TensorFlow graph format fails (refer to this issue for more details).

Notes:

About the Notebooks

Model conversion processes for Tacotron2, Fastspeech2, and Multi-Band MelGAN are available via the following notebooks:

Model Benchmarks

After converting the models to TFLite, we used the Benchmark tool to measure performance metrics such as inference latency and peak memory usage. All experiments ran on a Redmi K20, using a single thread on the CPU and no other hardware accelerator.

| Model | Quantization | Model Size (MB) | Average Inference Latency (sec) | Memory Footprint (MB) |
|---|---|---|---|---|
| Parallel WaveGAN | Dynamic-range | 5.7 | 0.04 | 31.5 |
| Parallel WaveGAN | Float16 | 3.2 | 0.05 | 34 |
| MelGAN | Dynamic-range | 17 | 0.51 | 81 |
| MelGAN | Float16 | 8.3 | 0.52 | 89 |
| MB MelGAN | Dynamic-range | 17 | 0.02 | 17 |
| HiFi-GAN | Dynamic-range | 3.5 | 0.0015 | 9.88 |
| HiFi-GAN | Float16 | 2.9 | 0.0036 | 20.3 |
| Tacotron2 | Dynamic-range | 30.1 | 1.66 | 75 |
| Fastspeech2 | Dynamic-range | 30 | 0.11 | 55 |
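For a quick desktop-side check of average latency, a timing loop like the following can help. It is a minimal sketch and not the Benchmark tool used for the numbers above: `average_latency` and `make_tflite_runner` are hypothetical helpers, and results from a development machine will not match on-device measurements. The interpreter is created with `num_threads=1` to mirror the single-thread setting.

```python
import statistics
import time
from typing import Callable


def average_latency(run_once: Callable[[], object],
                    warmup: int = 1, runs: int = 10) -> float:
    """Return the mean wall-clock time in seconds of `runs` calls to `run_once`."""
    if runs < 1:
        raise ValueError("runs must be at least 1")
    for _ in range(warmup):
        run_once()  # warm-up calls are executed but not timed
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        run_once()
        timings.append(time.perf_counter() - start)
    return statistics.mean(timings)


def make_tflite_runner(model_path: str) -> Callable[[], None]:
    """Build a zero-argument callable that runs one TFLite inference."""
    # Imported lazily so the timing helper works without TensorFlow installed.
    import tensorflow as tf

    interpreter = tf.lite.Interpreter(model_path=model_path, num_threads=1)
    # Assumption: input shapes are static. Models with dynamic shapes would
    # need resize_tensor_input() before allocation and set_tensor() after it.
    interpreter.allocate_tensors()
    return interpreter.invoke


# Stand-in workload so the sketch runs without a model file.
latency = average_latency(lambda: sum(range(10_000)), warmup=1, runs=5)
```

With a converted model on disk, `average_latency(make_tflite_runner("model.tflite"))` would time the interpreter instead, where the file name is a placeholder.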

Notes:

🔈 Audio Samples

All combinations of samples are available in the audio_samples folder. To listen without downloading, refer to this SoundCloud folder.

References