RNNLM Toolkit

by Tomas Mikolov, 2010-2012


Neural network based language models are nowadays among the most successful techniques for statistical language modeling. They can be easily applied in a wide range of tasks, including automatic speech recognition and machine translation, and provide significant improvements over classic backoff n-gram models. The 'rnnlm' toolkit can be used to train, evaluate and use such models.

The goal of this toolkit is to speed up research progress in the language modeling field. First, it provides a useful implementation that demonstrates some of the principles. Second, it supports empirical experiments in speech recognition and other applications. Third, it provides strong state-of-the-art baseline results, against which future research that aims to "beat state-of-the-art techniques" should be compared.


rnnlm-0.1h - an older version of the toolkit







[rnnlm-0.4b](https://f25ea9ccb7d3346ce6891573d543960492b92c30.googledrive.com/host/0ByxdPXuxLPS5RFM5dVNvWVhTd0U/rnnlm-0.4b.tgz) - latest version of the toolkit

Basic examples - very useful for quick introduction (training, evaluation, hyperparameter selection, simple n-best list rescoring, etc.) - 35MB

Advanced examples - includes large scale experiments with speech lattices (n-best list rescoring, ...) - 235MB, by Stefan Kombrink

Slides from my presentation at Google - pdf

RNNLM is now integrated into the Kaldi toolkit! Check this.

Example of data generated by a 4-gram language model, by an RNN model and by an RNNME model (all models are trained on Broadcast news data, 400M/320M words) - check which generated sentences are easier to read!

Word projections from RNN-80 and [RNN-640](http://www.fit.vutbr.cz/~imikolov/rnnlm/word_projections-640.txt.gz) models trained on Broadcast news data + tool for computing the closest words. (extra large 1600-dimensional features from 3 models are here)
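
To give a rough idea of what "computing the closest words" from word projections can look like, here is a minimal Python sketch. It is not the tool distributed with the toolkit. It assumes a plain-text format with one word per line followed by its vector components, with no header line (the real file format may differ), and it ranks words by cosine similarity, which is a common choice but also an assumption. The names `parse_projections` and `closest_words` and the toy vectors are made up for illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # treat an all-zero vector as similar to nothing
    return dot / (norm_u * norm_v)


def parse_projections(lines):
    """Parse lines of the ASSUMED form 'word x1 x2 ... xn' into a dict."""
    vectors = {}
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def closest_words(vectors, query, top_n=3):
    """Return the top_n words most similar to `query`, best match first."""
    if query not in vectors:
        raise KeyError(query)
    q = vectors[query]
    scored = [(cosine(q, v), w) for w, v in vectors.items() if w != query]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [w for _, w in scored[:top_n]]


# Toy 3-dimensional projections, invented for this example only.
toy_lines = [
    "king 0.9 0.1 0.0",
    "queen 0.8 0.2 0.1",
    "apple 0.0 0.1 0.9",
    "pear 0.1 0.0 0.8",
]
vectors = parse_projections(toy_lines)
nearest_to_king = closest_words(vectors, "king", top_n=1)
nearest_to_apple = closest_words(vectors, "apple", top_n=1)
```

For a real projection file, the lines could be read with `gzip.open(path, "rt")` from the standard library and passed to `parse_projections`. For large vocabularies a vectorized implementation would be much faster than this pure-Python loop.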

Tomas Mikolov - tmikolov@gmail.com

Stefan Kombrink - kombrink@fit.vutbr.cz


We would like to thank all who have helped us with the development of this toolkit, either by providing advice or by testing it. Special thanks go to Anoop Deoras, Sanjeev Khudanpur, Scott Novotney, Stefan Kombrink, Dan Povey, YongZhe Shi, Geoff Zweig.


