Home

Awesome

Maintenance GitHub GitHub GitHub

Audio and Speech Pre-trained Models

NLP logo

What is pre-trained Model?

A pre-trained model is a model created by some one else to solve a similar problem. Instead of building a model from scratch to solve a similar problem, we can use the model trained on other problem as a starting point. A pre-trained model may not be 100% accurate in your application.

Other Pre-trained Models

Framework

Model visualization

You can see visualizations of each model's network architecture by using Netron.

NLP logo

Tensorflow <a name="tensorflow"/>

Model NameDescriptionFramework
WavenetThis is a TensorFlow implementation of the WaveNet generative neural network architecture for audio generation.Tensorflow
Lip ReadingCross Audio-Visual Recognition using 3D Architectures in TensorFlowTensorflow
MusicGenreClassificationAcademic research in the field of Deep Learning (Deep Neural Networks) and Sound Processing, Tel Aviv University.Tensorflow
AudiosetModels and supporting code for use with AudioSet.Tensorflow
DeepSpeechAutomatic speech recognition.Tensorflow
<div align="right"> <b><a href="#framework">↥ Back To Top</a></b> </div>

Keras <a name="keras"/>

Model NameDescriptionFramework
Ultrasound nerve segmentationThis tutorial shows how to use Keras library to build deep neural network for ultrasound image nerve segmentation.Keras
<div align="right"> <b><a href="#framework">↥ Back To Top</a></b> </div>

PyTorch <a name="pytorch"/>

Model NameDescriptionFramework
espnetEnd-to-End Speech Processing Toolkit espnet.github.io/espnetPyTorch
TTSDeep learning for Text2SpeechPyTorch
Neural Sequence labeling modelSequence labeling models are quite popular in many NLP tasks, such as Named Entity Recognition (NER), part-of-speech (POS) tagging and word segmentation.PyTorch
waveglowA Flow-based Generative Network for Speech Synthesis.PyTorch
deepvoice3_pytorchPyTorch implementation of convolutional networks-based text-to-speech synthesis models.PyTorch
deepspeech2Implementation of DeepSpeech2 using Baidu Warp-CTC. Creates a network based on the DeepSpeech2 architecture, trained with the CTC activation function.PyTorch
loopA method to generate speech across multiple speakers.PyTorch
audioSimple audio I/O for pytorch.PyTorch
speechPyTorch ASR Implementation.PyTorch
samplernn-pytorchPyTorch implementation of SampleRNN: An Unconditional End-to-End Neural Audio Generation Model.PyTorch
torch_waveglowA PyTorch implementation of the WaveGlow: A Flow-based Generative Network for Speech Synthesis.PyTorch
<div align="right"> <b><a href="#framework">↥ Back To Top</a></b> </div>

MXNet <a name="mxnet"/>

Model NameDescriptionFramework
deepspeechThis example based on DeepSpeech2 of Baidu helps you to build Speech-To-Text (STT) models at scale usingMXNet
mxnet-audioImplementation of music genre classification, audio-to-vec, song recommender, and music search in mxnet.MXNet
<div align="right"> <b><a href="#framework">↥ Back To Top</a></b> </div>

Caffe <a name="caffe"/>

Model NameDescriptionFramework
Speech RecognitionSpeech Recognition with the caffe deep learning framework.Caffe
<div align="right"> <b><a href="#framework">↥ Back To Top</a></b> </div>

Contributions

Your contributions are always welcome!! Please have a look at contributing.md

License

MIT License