Audio and Speech Pre-trained Models

What is a pre-trained model?

A pre-trained model is a model created by someone else to solve a similar problem. Instead of building a model from scratch, we can use a model trained on another problem as a starting point. A pre-trained model may not be 100% accurate in your application, since it was trained on different data, so evaluate it on your own data and fine-tune it if needed.
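As a rough illustration of using a pre-trained model as a starting point, the sketch below keeps a "pre-trained" feature extractor frozen and trains only a small new output layer on top of it. Everything here is a toy assumption: `pretrained_features`, its fixed weights, and the four-sample dataset are hypothetical stand-ins, and no model from this list is loaded.

```python
# Toy sketch of reusing a frozen pre-trained model (all names are hypothetical).

def pretrained_features(x):
    """Stand-in for a frozen pre-trained network; its weights are never updated."""
    return [0.5 * x[0] + 0.5 * x[1], x[0] - x[1]]


def train_head(data, epochs=500, lr=0.1):
    """Fit only a new linear output layer on top of the frozen features (plain SGD)."""
    w = [0.0, 0.0]
    b = 0.0
    for _ in range(epochs):
        for x, y in data:
            f = pretrained_features(x)           # frozen part: forward pass only
            err = (w[0] * f[0] + w[1] * f[1] + b) - y
            w[0] -= lr * err * f[0]              # gradient step on the new head only
            w[1] -= lr * err * f[1]
            b -= lr * err
    return w, b


def predict(x, w, b):
    """Run the frozen extractor followed by the trained head."""
    f = pretrained_features(x)
    return w[0] * f[0] + w[1] * f[1] + b


# Small made-up dataset for the new task: the target is y = 2 * x[1] + 1.
DATA = [([0.0, 0.0], 1.0), ([1.0, 0.0], 1.0), ([0.0, 1.0], 3.0), ([1.0, 1.0], 3.0)]

weights, bias = train_head(DATA)
print("head weights:", weights, "bias:", bias)
print("prediction for [2, 3]:", predict([2.0, 3.0], weights, bias))
```

With a real framework the idea is the same: load the published weights, freeze most layers, and train a new task-specific layer on your own data.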

Other Pre-trained Models

Model visualization

You can see visualizations of each model's network architecture by using Netron.
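A minimal sketch of opening a downloaded model file in Netron from Python, assuming the `netron` package is installed (`pip install netron`). The helper name `open_in_netron` and the file name in the usage note are hypothetical.

```python
from pathlib import Path


def open_in_netron(model_path):
    """Serve a saved model file with Netron so it can be inspected in the browser."""
    path = Path(model_path)
    if not path.is_file():
        # Fail early with a clear message instead of starting a viewer on nothing.
        raise FileNotFoundError(f"No model file found at {path}")
    import netron  # imported lazily so the path check works without the package
    netron.start(str(path))
```

For example, `open_in_netron("model.onnx")` would start a local Netron server for a file named `model.onnx` in the current directory. Which file formats open correctly depends on your Netron version.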

TensorFlow <a name="tensorflow"/>

Model Name	Description	Framework
Wavenet	This is a TensorFlow implementation of the WaveNet generative neural network architecture for audio generation.	`Tensorflow`
Lip Reading	Cross Audio-Visual Recognition using 3D Architectures in TensorFlow	`Tensorflow`
MusicGenreClassification	Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University.	`Tensorflow`
Audioset	Models and supporting code for use with AudioSet.	`Tensorflow`
DeepSpeech	Automatic speech recognition.	`Tensorflow`

Keras <a name="keras"/>

Model Name	Description	Framework
Ultrasound nerve segmentation	This tutorial shows how to use the Keras library to build a deep neural network for ultrasound image nerve segmentation.	`Keras`

PyTorch <a name="pytorch"/>

Model Name	Description	Framework
espnet	End-to-end speech processing toolkit (espnet.github.io/espnet).	`PyTorch`
TTS	Deep learning for Text2Speech	`PyTorch`
Neural Sequence labeling model	Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation.	`PyTorch`
waveglow	A Flow-based Generative Network for Speech Synthesis.	`PyTorch`
deepvoice3_pytorch	PyTorch implementation of convolutional networks-based text-to-speech synthesis models.	`PyTorch`
deepspeech2	Implementation of DeepSpeech2 using Baidu Warp-CTC. Creates a network based on the DeepSpeech2 architecture, trained with the CTC loss function.	`PyTorch`
loop	A method to generate speech across multiple speakers.	`PyTorch`
audio	Simple audio I/O for PyTorch.	`PyTorch`
speech	PyTorch ASR Implementation.	`PyTorch`
samplernn-pytorch	PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.	`PyTorch`
torch_waveglow	A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis.	`PyTorch`

MXNet <a name="mxnet"/>

Model Name	Description	Framework
deepspeech	This example based on DeepSpeech2 of Baidu helps you to build Speech-To-Text (STT) models at scale using	`MXNet`
mxnet-audio	Implementation of music genre classification, audio-to-vec, song recommender, and music search in mxnet.	`MXNet`

Caffe <a name="caffe"/>

Model Name	Description	Framework
Speech Recognition	Speech recognition with the Caffe deep learning framework.	`Caffe`

Contributions

Your contributions are always welcome! Please have a look at contributing.md.

License

MIT License