Audio and Speech Pre-trained Models
What is a pre-trained model?
A pre-trained model is a model that someone else has already trained to solve a similar problem. Instead of building a model from scratch, you can use a model trained on another problem as a starting point. A pre-trained model may not be 100% accurate in your application, so evaluate it on your own data before relying on it.
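To illustrate the "starting point" idea, here is a minimal sketch in plain Python. It is not taken from any model listed here: `PRETRAINED_WEIGHTS`, the toy data, and all function names are hypothetical. The pre-trained layer is kept frozen and only a small new head is trained for the new task.

```python
# Toy stand-in for weights published by someone else. In practice these would
# be loaded from a checkpoint file; the values here are made up.
PRETRAINED_WEIGHTS = [[0.5, -0.2], [0.1, 0.8]]


def extract_features(x):
    """Frozen pre-trained layer (linear map + ReLU). Its weights are never updated."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)))
            for row in PRETRAINED_WEIGHTS]


def predict(x, head, bias):
    """Apply the new task-specific head on top of the frozen features."""
    feats = extract_features(x)
    return sum(h * f for h, f in zip(head, feats)) + bias


def mean_squared_error(samples, targets, head, bias):
    """Average squared prediction error over a dataset."""
    errors = [(predict(x, head, bias) - y) ** 2 for x, y in zip(samples, targets)]
    return sum(errors) / len(errors)


def train_head(samples, targets, epochs=500, lr=0.1):
    """Fit only the head and bias with plain stochastic gradient descent."""
    head, bias = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, y in zip(samples, targets):
            feats = extract_features(x)
            err = sum(h * f for h, f in zip(head, feats)) + bias - y
            head = [h - lr * err * f for h, f in zip(head, feats)]
            bias -= lr * err
    return head, bias


# Made-up data for the new task.
samples = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 1.0]]
targets = [1.6, 1.3, 2.0, 3.1]

error_before = mean_squared_error(samples, targets, [0.0, 0.0], 0.0)
head, bias = train_head(samples, targets)
error_after = mean_squared_error(samples, targets, head, bias)
print(f"MSE before: {error_before:.3f}, after: {error_after:.6f}")
```

With a real framework the pattern is the same: load the published weights, freeze them, and train a new output layer on your own data.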
Other Pre-trained Models
Framework
Model visualization
You can see visualizations of each model's network architecture by using Netron.
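As a rough sketch, Netron can also be launched from Python after `pip install netron`. The helper name `open_in_netron` and the file name `model.onnx` are placeholders, not part of Netron or of any model listed here.

```python
from pathlib import Path


def open_in_netron(model_path):
    """Serve a saved model file in Netron's browser-based viewer."""
    path = Path(model_path)
    if not path.is_file():
        raise FileNotFoundError(f"No model file at {path}")
    # Imported lazily so the existence check works without Netron installed.
    import netron  # pip install netron
    netron.start(str(path))


# Example call; "model.onnx" is a placeholder file name:
# open_in_netron("model.onnx")
```

The existence check is only a convenience so that a mistyped path fails early with a clear message.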
TensorFlow <a name="tensorflow"/>
Model Name | Description | Framework |
---|---|---|
Wavenet | This is a TensorFlow implementation of the WaveNet generative neural network architecture for audio generation. | TensorFlow |
Lip Reading | Cross Audio-Visual Recognition using 3D Architectures in TensorFlow | TensorFlow |
MusicGenreClassification | Academic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University. | TensorFlow |
Audioset | Models and supporting code for use with AudioSet. | TensorFlow |
DeepSpeech | Automatic speech recognition. | TensorFlow |
Keras <a name="keras"/>
Model Name | Description | Framework |
---|---|---|
Ultrasound nerve segmentation | This tutorial shows how to use the Keras library to build a deep neural network for ultrasound image nerve segmentation. | Keras |
PyTorch <a name="pytorch"/>
Model Name | Description | Framework |
---|---|---|
espnet | End-to-End Speech Processing Toolkit (espnet.github.io/espnet) | PyTorch |
TTS | Deep learning for Text2Speech | PyTorch |
Neural Sequence labeling model | Sequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation. | PyTorch |
waveglow | A Flow-based Generative Network for Speech Synthesis. | PyTorch |
deepvoice3_pytorch | PyTorch implementation of convolutional networks-based text-to-speech synthesis models. | PyTorch |
deepspeech2 | Implementation of DeepSpeech2 using Baidu Warp-CTC. Creates a network based on the DeepSpeech2 architecture, trained with the CTC loss function. | PyTorch |
loop | A method to generate speech across multiple speakers. | PyTorch |
audio | Simple audio I/O for PyTorch. | PyTorch |
speech | PyTorch ASR Implementation. | PyTorch |
samplernn-pytorch | PyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model. | PyTorch |
torch_waveglow | A PyTorch implementation of WaveGlow: A Flow-based Generative Network for Speech Synthesis. | PyTorch |
MXNet <a name="mxnet"/>
Model Name | Description | Framework |
---|---|---|
deepspeech | This example based on DeepSpeech2 of Baidu helps you to build Speech-To-Text (STT) models at scale using | MXNet |
mxnet-audio | Implementation of music genre classification, audio-to-vec, song recommender, and music search in MXNet. | MXNet |
Caffe <a name="caffe"/>
Model Name | Description | Framework |
---|---|---|
Speech Recognition | Speech Recognition with the Caffe deep learning framework. | Caffe |
Contributions
Your contributions are always welcome! Please have a look at contributing.md.