


The goal of Eesen is to simplify the existing complicated, expertise-intensive ASR pipeline into a straightforward sequence learning problem. Acoustic modeling in Eesen involves training a single recurrent neural network (RNN) to model the mapping from speech to text. Eesen abandons the following elements required by the existing ASR pipeline:

Eesen was created by Yajie Miao with inspiration from the Kaldi toolkit. Thank you, Yajie!
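The description above says a single RNN maps speech directly to text, which leaves open how per-frame network outputs turn into a shorter label sequence. The EESEN paper cited below trains the RNN with the connectionist temporal classification (CTC) objective. The simplest way to read a transcript off CTC-style outputs is best-path (greedy) decoding: take the most likely label in each frame, merge consecutive repeats, then drop the blank symbol. The sketch below illustrates only that idea. It is not Eesen's decoder, which is WFST-based and also brings in lexicons and language models. The function name `greedy_ctc_decode`, the toy vocabulary, and the choice of index 0 as the blank are all hypothetical.

```python
from typing import Sequence

# Hypothetical convention for this sketch: label index 0 is the CTC blank.
BLANK = 0


def greedy_ctc_decode(frame_posteriors: Sequence[Sequence[float]],
                      vocab: Sequence[str]) -> str:
    """Best-path decoding of per-frame label posteriors (illustrative only).

    frame_posteriors: one row per frame, one score per label in `vocab`.
    vocab: label strings indexed like the rows; vocab[BLANK] is never emitted.
    """
    # 1. Pick the highest-scoring label in every frame (ties go to the lower index).
    best_path = [max(range(len(frame)), key=frame.__getitem__)
                 for frame in frame_posteriors]

    # 2. Merge runs of the same label, then 3. skip blanks.
    labels = []
    prev = None
    for idx in best_path:
        if idx != prev and idx != BLANK:
            labels.append(vocab[idx])
        prev = idx
    return "".join(labels)
```

For example, with a toy vocabulary `["<b>", "a", "b", "c"]`, a best path of `a a <b> a b b <b>` collapses to `aab`. The blank between the two runs of `a` is what allows a doubled letter to survive the repeat-merging step.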

Key Components

Eesen contains 4 key components to enable end-to-end ASR:

Highlights of Eesen

Experimental Results

Refer to the RESULTS file under each example setup.


For more information, please refer to the following paper(s):

Yajie Miao, Mohammad Gowayyed, and Florian Metze, "EESEN: End-to-End Speech Recognition using Deep RNN Models and WFST-based Decoding," in Proc. Automatic Speech Recognition and Understanding Workshop (ASRU), Scottsdale, AZ, U.S.A., December 2015. IEEE.