<div align="center"> <img src="https://raw.githubusercontent.com/k2-fsa/icefall/master/docs/source/_static/logo.png" width=168> </div>

## Introduction

The icefall project contains speech-related recipes for various datasets using k2-fsa and lhotse.

You can use sherpa, sherpa-ncnn, or sherpa-onnx to deploy models trained in icefall. These frameworks also support models not included in icefall; please refer to their respective documentation for details.

You can try pre-trained models from within your browser, without downloading or installing anything, by visiting this Hugging Face space. Please refer to the documentation for more details.

## Installation

Please refer to the installation documentation.

## Recipes

Please refer to the recipes documentation for more details.

### ASR: Automatic Speech Recognition

#### Supported Datasets

More datasets will be added in the future.

#### Supported Models

The LibriSpeech recipe supports the most comprehensive set of models; you are welcome to try them out.

- CTC
- MMI
- Transducer
- Whisper

If you would like to contribute to icefall, please refer to the contributing guide for more details.

We would like to highlight the performance of some of the recipes here.

#### yesno

This is the simplest ASR recipe in icefall and can be run on a CPU. Training takes less than 30 seconds and gives the following WER:

    [test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
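The bracketed line breaks the error count into insertions, deletions, and substitutions over the reference words. As a rough illustration (not icefall's actual scoring code), the percentage can be computed like this:

```python
def word_error_rate(num_ins, num_del, num_sub, num_ref_words):
    """WER (%) = (insertions + deletions + substitutions) / reference words * 100."""
    return 100.0 * (num_ins + num_del + num_sub) / num_ref_words

# The yesno result above: 0 insertions, 1 deletion, 0 substitutions, 240 reference words.
print(f"{word_error_rate(0, 1, 0, 240):.2f}%")  # 0.42%
```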

We provide a Colab notebook for this recipe.

#### LibriSpeech

Please see RESULTS.md for the latest results.

##### Conformer CTC

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |

We provide a Colab notebook to test the pre-trained model.

##### TDNN LSTM CTC

|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |

We provide a Colab notebook to test the pre-trained model.

##### Transducer (Conformer Encoder + LSTM Predictor)

|               | test-clean | test-other |
|---------------|------------|------------|
| greedy_search | 3.07       | 7.51       |

We provide a Colab notebook to test the pre-trained model.

##### Transducer (Conformer Encoder + Stateless Predictor)

|                                    | test-clean | test-other |
|------------------------------------|------------|------------|
| modified_beam_search (beam_size=4) | 2.56       | 6.27       |

We provide a Colab notebook to test the pre-trained model.

##### Transducer (Zipformer Encoder + Stateless Predictor)

WER (modified_beam_search with beam_size=4 unless stated otherwise):

1. LibriSpeech-960hr

| Encoder         | Params | test-clean | test-other | epochs | devices    |
|-----------------|--------|------------|------------|--------|------------|
| Zipformer       | 65.5M  | 2.21       | 4.79       | 50     | 4 32G-V100 |
| Zipformer-small | 23.2M  | 2.42       | 5.73       | 50     | 2 32G-V100 |
| Zipformer-large | 148.4M | 2.06       | 4.63       | 50     | 4 32G-V100 |
| Zipformer-large | 148.4M | 2.00       | 4.38       | 174    | 8 80G-A100 |

2. LibriSpeech-960hr + GigaSpeech

| Encoder   | Params | test-clean | test-other |
|-----------|--------|------------|------------|
| Zipformer | 65.5M  | 1.78       | 4.08       |

3. LibriSpeech-960hr + GigaSpeech + CommonVoice

| Encoder   | Params | test-clean | test-other |
|-----------|--------|------------|------------|
| Zipformer | 65.5M  | 1.90       | 3.98       |
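The modified_beam_search method used above keeps a fixed number of hypotheses (the beam size) at every decoding step instead of committing to a single best token. The sketch below illustrates just that top-k pruning idea on precomputed per-step log-probabilities; the function and data are invented for illustration, and icefall's real implementation operates on transducer decoder states and is considerably more involved:

```python
import math

def toy_beam_search(step_log_probs, beam_size=4):
    """Keep the beam_size highest-scoring token sequences at every step.

    step_log_probs: list of {token: log_prob} dicts, one per step.
    Returns the best (sequence, accumulated_log_prob) pair.
    """
    beams = [((), 0.0)]  # (token sequence, accumulated log-probability)
    for dist in step_log_probs:
        # Extend every surviving hypothesis with every candidate token.
        candidates = [
            (seq + (tok,), score + lp)
            for seq, score in beams
            for tok, lp in dist.items()
        ]
        # Prune to the beam_size best hypotheses.
        beams = sorted(candidates, key=lambda b: b[1], reverse=True)[:beam_size]
    return beams[0]

steps = [
    {"a": math.log(0.6), "b": math.log(0.4)},
    {"a": math.log(0.3), "b": math.log(0.7)},
]
seq, score = toy_beam_search(steps, beam_size=4)
print(seq)  # ('a', 'b'): greedy would also start with 'a', but beam search
            # keeps 'b'-initial hypotheses alive in case they win later.
```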

#### GigaSpeech

##### Conformer CTC

|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |

##### Transducer (pruned_transducer_stateless2)

Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search        | 10.51 | 10.73 |
| fast_beam_search     | 10.50 | 10.69 |
| modified_beam_search | 10.40 | 10.51 |
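Of the decoding methods compared above, greedy_search is the simplest: take the most probable token at each step. The toy sketch below shows a CTC-style greedy decode (argmax per frame, collapse repeats, drop blanks); the function and data are made up for illustration, and icefall's transducer greedy search additionally conditions each step on the previously emitted token:

```python
def toy_greedy_decode(frame_scores, blank=0):
    """Argmax per frame, collapse repeated tokens, remove blanks (CTC-style)."""
    best = [max(range(len(frame)), key=frame.__getitem__) for frame in frame_scores]
    out, prev = [], blank
    for tok in best:
        if tok != blank and tok != prev:
            out.append(tok)
        prev = tok
    return out

frames = [
    [0.1, 0.8, 0.1],    # argmax: token 1
    [0.1, 0.7, 0.2],    # token 1 again (repeat, collapsed)
    [0.9, 0.05, 0.05],  # blank
    [0.2, 0.1, 0.7],    # token 2
]
print(toy_greedy_decode(frames))  # [1, 2]
```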

##### Transducer (Zipformer Encoder + Stateless Predictor)

|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search        | 10.31 | 10.50 |
| fast_beam_search     | 10.26 | 10.48 |
| modified_beam_search | 10.25 | 10.38 |

#### Aishell

##### TDNN LSTM CTC

|     | test  |
|-----|-------|
| CER | 10.16 |
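For Mandarin datasets such as Aishell, results are reported as character error rate (CER): the same edit-distance computation as WER, but over characters instead of words. A minimal illustrative sketch (not icefall's scoring code):

```python
def levenshtein(ref, hyp):
    """Minimum edits (insert/delete/substitute) turning ref into hyp."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        curr = [i]
        for j, h in enumerate(hyp, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution (0 if equal)
        prev = curr
    return prev[-1]

def cer(ref, hyp):
    """Character error rate in percent: edit distance over reference characters."""
    return 100.0 * levenshtein(list(ref), list(hyp)) / len(ref)

# One substituted character out of six reference characters.
print(f"{cer('今天天气很好', '今天天气真好'):.2f}%")  # 16.67%
```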

We provide a Colab notebook to test the pre-trained model.

##### Transducer (Conformer Encoder + Stateless Predictor)

|     | test |
|-----|------|
| CER | 4.38 |

We provide a Colab notebook to test the pre-trained model.

##### Transducer (Zipformer Encoder + Stateless Predictor)

CER (modified_beam_search with beam_size=4):

| Encoder         | Params | dev  | test | epochs |
|-----------------|--------|------|------|--------|
| Zipformer       | 73.4M  | 4.13 | 4.40 | 55     |
| Zipformer-small | 30.2M  | 4.40 | 4.67 | 55     |
| Zipformer-large | 157.3M | 4.03 | 4.28 | 56     |

#### Aishell4

##### Transducer (pruned_transducer_stateless5)

Trained with all subsets:

|     | test  |
|-----|-------|
| CER | 29.08 |

We provide a Colab notebook to test the pre-trained model.

#### TIMIT

##### TDNN LSTM CTC

|     | TEST   |
|-----|--------|
| PER | 19.71% |

We provide a Colab notebook to test the pre-trained model.

##### TDNN LiGRU CTC

|     | TEST   |
|-----|--------|
| PER | 17.66% |

We provide a Colab notebook to test the pre-trained model.

#### TED-LIUM3

##### Transducer (Conformer Encoder + Stateless Predictor)

|                                    | dev  | test |
|------------------------------------|------|------|
| modified_beam_search (beam_size=4) | 6.91 | 6.33 |

We provide a Colab notebook to test the pre-trained model.

##### Transducer (pruned_transducer_stateless)

|                                    | dev  | test |
|------------------------------------|------|------|
| modified_beam_search (beam_size=4) | 6.77 | 6.14 |

We provide a Colab notebook to test the pre-trained model.

#### Aidatatang_200zh

##### Transducer (pruned_transducer_stateless2)

|                      | Dev  | Test |
|----------------------|------|------|
| greedy_search        | 5.53 | 6.59 |
| fast_beam_search     | 5.30 | 6.34 |
| modified_beam_search | 5.27 | 6.33 |

We provide a Colab notebook to test the pre-trained model.

#### WenetSpeech

##### Transducer (pruned_transducer_stateless2)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 7.80 | 8.75     | 13.49        |
| fast_beam_search     | 7.94 | 8.74     | 13.80        |
| modified_beam_search | 7.76 | 8.71     | 13.41        |

We provide a Colab notebook to test the pre-trained model.

##### Streaming Transducer (pruned_transducer_stateless5)

|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 8.78 | 10.12    | 16.16        |
| fast_beam_search     | 9.01 | 10.47    | 16.28        |
| modified_beam_search | 8.53 | 9.95     | 15.81        |

#### Alimeeting

##### Transducer (pruned_transducer_stateless2)

|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy_search        | 31.77 | 34.66    |
| fast_beam_search     | 31.39 | 33.02    |
| modified_beam_search | 30.38 | 34.25    |

We provide a Colab notebook to test the pre-trained model.

#### TAL_CSASR

##### Transducer (pruned_transducer_stateless5)

The best results, reported as Chinese CER (%) and English WER (%) (zh: Chinese, en: English):

| decoding-method      | dev  | dev_zh | dev_en | test | test_zh | test_en |
|----------------------|------|--------|--------|------|---------|---------|
| greedy_search        | 7.30 | 6.48   | 19.19  | 7.39 | 6.66    | 19.13   |
| fast_beam_search     | 7.18 | 6.39   | 18.90  | 7.27 | 6.55    | 18.77   |
| modified_beam_search | 7.15 | 6.35   | 18.95  | 7.22 | 6.50    | 18.70   |
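Because TAL_CSASR is code-switched Chinese/English, errors are scored separately: CER over the Chinese characters and WER over the English words. A hedged sketch of how one might split a mixed hypothesis into the two parts (the function name and regex ranges are illustrative; the recipe's actual scoring may differ):

```python
import re

def split_zh_en(text):
    """Separate CJK characters from ASCII words in a code-switched string."""
    zh = re.findall(r"[\u4e00-\u9fff]", text)  # one entry per Chinese character
    en = re.findall(r"[A-Za-z']+", text)       # one entry per English word
    return zh, en

zh, en = split_zh_en("我想 check 一下 email")
print(zh)  # ['我', '想', '一', '下'] -> scored with CER
print(en)  # ['check', 'email']      -> scored with WER
```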

We provide a Colab notebook to test the pre-trained model.

### TTS: Text-to-Speech

#### Supported Datasets

#### Supported Models

## Deployment with C++

Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.

Please refer to the deployment documentation for how to do this.

We also provide a Colab notebook showing how to run a torch-scripted model in k2 with C++.