Speech Recognition with BVLC caffe

Speech Recognition with the caffe deep learning framework

UPDATE: We are migrating to TensorFlow

This project is quite new, and only the first of three milestones has been reached. Even so, it may already be useful if you just want to train a handful of commands or options (1, 2, 3, ..., yes/no/cancel, ...). The milestones are:

  1. training spoken numbers:

Sample spectrogram, That's what she said, too laid?

Sample spectrogram, Karen uttering 'zero' at 160 words per minute.

  2. training words:
  3. training speech:
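
The sample spectrograms above are the kind of input a network can be trained on. As a rough illustration of how a waveform is turned into such an image, here is a minimal sketch using only the Python standard library. It is not this project's code: the function name `spectrogram`, the frame length, the hop size, and the synthetic test tone are all assumptions made for the example, and a real pipeline would normally use an FFT library instead of the naive DFT shown here.

```python
import cmath
import math


def spectrogram(samples, frame_len=64, hop=32):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes.

    Hypothetical helper for illustration; uses a naive DFT, not an FFT.
    """
    # A Hann window reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1
    # Precompute the DFT basis for the non-negative frequency bins.
    basis = [[cmath.exp(-2j * math.pi * k * n / frame_len)
              for n in range(frame_len)]
             for k in range(n_bins)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        frames.append([abs(sum(x * b for x, b in zip(frame, row)))
                       for row in basis])
    return frames


# Synthetic stand-in for a recording: 0.1 s of a 1000 Hz tone at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * i / sample_rate) for i in range(800)]
spec = spectrogram(tone)

# Each bin is sample_rate / frame_len = 125 Hz wide.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
peak_hz = peak_bins[0] * sample_rate / 64
```

Each inner list is one time slice of the spectrogram. Stacking the slices side by side, with magnitude mapped to brightness, gives an image like the samples shown above.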

Theoretical background: papers

A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In ICML, 2014

O. Vinyals, S. V. Ravuri, and D. Povey. Revisiting recurrent neural networks for robust ASR. In ICASSP, 2012

Andrew Ng et al / Baidu

Hinton et al / Toronto

good old Hinton

Schmidhuber et al., using the new 'Clockwork RNNs'

The book: Automatic Speech Recognition: A Deep Learning Approach (Signals and Communication Technology), by Dong Yu and Li Deng (hardcover, November 11, 2014)

Related work

Also see the Kaldi project, which seems a bit messy but already uses deep learning with LSTMs.

Another experimental LSTM network, which works out of the box: Currennt