The icefall project contains speech-related recipes for various datasets, using k2-fsa and lhotse.
You can use sherpa, sherpa-ncnn, or sherpa-onnx to deploy models trained with icefall; these frameworks also support models that are not included in icefall. Please refer to their respective documentation for more details.
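If you want a quick idea of what deployment looks like, here is a minimal sketch using the sherpa-onnx Python API for offline decoding. It assumes you have already exported an icefall transducer to ONNX (see the export documentation of the recipe you use); the file names `encoder.onnx`, `decoder.onnx`, `joiner.onnx`, `tokens.txt`, and `test.wav` are placeholders.

```python
# Minimal sketch (not an official example): decode one 16 kHz mono wave file
# with an icefall transducer exported to ONNX, via the sherpa-onnx Python API.
import sherpa_onnx
import soundfile as sf

# Placeholder paths: produced by the ONNX export step of an icefall recipe.
recognizer = sherpa_onnx.OfflineRecognizer.from_transducer(
    encoder="encoder.onnx",
    decoder="decoder.onnx",
    joiner="joiner.onnx",
    tokens="tokens.txt",
    decoding_method="greedy_search",
)

samples, sample_rate = sf.read("test.wav", dtype="float32")
stream = recognizer.create_stream()
stream.accept_waveform(sample_rate, samples)
recognizer.decode_stream(stream)
print(stream.result.text)
```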
You can try pre-trained models from within your browser, without downloading or installing anything, by visiting this Hugging Face space. Please refer to the document for more details.
Please refer to the document for installation instructions.
Please refer to the document for more details.
More datasets will be added in the future.
The LibriSpeech recipe supports the most comprehensive set of models; you are welcome to try them out.
CTC:

- TDNN-LSTM CTC
- Conformer CTC
- Zipformer CTC

MMI:

- Conformer MMI
- Zipformer MMI

Transducer encoders:

- Conformer-based Encoder
- LSTM-based Encoder
- Zipformer-based Encoder

Transducer predictors:

- LSTM-based Predictor
- Stateless Predictor (a minimal sketch of this predictor follows the list)
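To make the stateless predictor concrete, here is a small, hedged PyTorch sketch of the idea: instead of an LSTM over the full label history, the predictor embeds only the last few tokens and mixes them with a 1-D convolution, so it carries no recurrent state. This illustrates the concept only and is not the exact icefall implementation; the class and parameter names are made up for this example.

```python
import torch
import torch.nn as nn


class StatelessPredictor(nn.Module):
    """Illustrative sketch of a stateless transducer predictor: an embedding of
    the last `context_size` tokens followed by a causal 1-D convolution."""

    def __init__(self, vocab_size: int, embed_dim: int, context_size: int = 2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)
        # Depth-wise conv over a short left context; no recurrence, hence "stateless".
        self.conv = nn.Conv1d(
            embed_dim, embed_dim, kernel_size=context_size,
            groups=embed_dim, bias=False,
        )
        self.context_size = context_size

    def forward(self, y: torch.Tensor) -> torch.Tensor:
        # y: (batch, U) token ids of the label sequence.
        emb = self.embedding(y).permute(0, 2, 1)                  # (batch, D, U)
        emb = nn.functional.pad(emb, (self.context_size - 1, 0))  # causal padding
        out = self.conv(emb).permute(0, 2, 1)                     # (batch, U, D)
        return torch.relu(out)


# Shape check only:
predictor = StatelessPredictor(vocab_size=500, embed_dim=512)
tokens = torch.randint(0, 500, (4, 10))
print(predictor(tokens).shape)  # torch.Size([4, 10, 512])
```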
If you would like to contribute to icefall, please refer to the contributing guide for more details.
We would like to highlight the performance of some of the recipes here.
This is the simplest ASR recipe in icefall
and can be run on CPU.
Training takes less than 30 seconds and gives you the following WER:
[test_set] %WER 0.42% [1 / 240, 0 ins, 1 del, 0 sub ]
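If you would rather run it locally than in Colab, the sketch below drives the usual prepare/train/decode flow from Python (plain shell works just as well). It assumes this is the yesno recipe under `egs/yesno/ASR` with the usual script layout; check the recipe directory for the actual entry points.

```python
# Hedged sketch: run the data preparation, training, and decoding scripts of
# the recipe in sequence. The paths are assumptions based on the typical
# icefall recipe layout (egs/yesno/ASR).
import subprocess

recipe_dir = "egs/yesno/ASR"

for cmd in (
    ["./prepare.sh"],      # download the data and compute features
    ["./tdnn/train.py"],   # train the model (CPU is enough for this recipe)
    ["./tdnn/decode.py"],  # decode the test set and print the WER
):
    subprocess.run(cmd, cwd=recipe_dir, check=True)
```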
We provide a Colab notebook for this recipe:
Please see RESULTS.md for the latest results.
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 2.42       | 5.73       |
We provide a Colab notebook to test the pre-trained model:
|     | test-clean | test-other |
|-----|------------|------------|
| WER | 6.59       | 17.69      |
We provide a Colab notebook to test the pre-trained model:
|               | test-clean | test-other |
|---------------|------------|------------|
| greedy search | 3.07       | 7.51       |
We provide a Colab notebook to test the pre-trained model:
|                                      | test-clean | test-other |
|--------------------------------------|------------|------------|
| modified_beam_search (`beam_size=4`) | 2.56       | 6.27       |
We provide a Colab notebook to test the pre-trained model:
WER (`modified_beam_search` with `beam_size=4`, unless stated otherwise)
- LibriSpeech-960hr
Encoder | Params | test-clean | test-other | epochs | devices |
---|---|---|---|---|---|
Zipformer | 65.5M | 2.21 | 4.79 | 50 | 4 32G-V100 |
Zipformer-small | 23.2M | 2.42 | 5.73 | 50 | 2 32G-V100 |
Zipformer-large | 148.4M | 2.06 | 4.63 | 50 | 4 32G-V100 |
Zipformer-large | 148.4M | 2.00 | 4.38 | 174 | 8 80G-A100 |
- LibriSpeech-960hr + GigaSpeech
Encoder | Params | test-clean | test-other |
---|---|---|---|
Zipformer | 65.5M | 1.78 | 4.08 |
- LibriSpeech-960hr + GigaSpeech + CommonVoice
Encoder | Params | test-clean | test-other |
---|---|---|---|
Zipformer | 65.5M | 1.90 | 3.98 |
|     | Dev   | Test  |
|-----|-------|-------|
| WER | 10.47 | 10.58 |
Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss
|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search        | 10.51 | 10.73 |
| fast_beam_search     | 10.50 | 10.69 |
| modified_beam_search | 10.40 | 10.51 |
|                      | Dev   | Test  |
|----------------------|-------|-------|
| greedy_search        | 10.31 | 10.50 |
| fast_beam_search     | 10.26 | 10.48 |
| modified_beam_search | 10.25 | 10.38 |
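The transducer results above use the k2 pruned RNN-T loss mentioned earlier (Conformer Encoder + Stateless Predictor + k2 Pruned RNN-T Loss). As a hedged outline of how that loss is typically composed, the sketch below first computes a cheap "simple" loss whose gradients define pruning ranges, then evaluates the pruned loss only inside those ranges; the tensor shapes are illustrative, the joiner is replaced by a plain addition, and the function signatures should be checked against the k2 version you have installed.

```python
# Hedged sketch of the pruned RNN-T loss with k2 (illustrative shapes; a real
# recipe feeds projected encoder/predictor outputs and a learned joiner here).
import torch
import k2

B, T, U, V = 2, 50, 10, 500            # batch, frames, label length, vocab size
blank_id, prune_range = 0, 5

am = torch.randn(B, T, V)              # encoder output projected to vocab size
lm = torch.randn(B, U + 1, V)          # predictor output projected to vocab size
symbols = torch.randint(1, V, (B, U), dtype=torch.int64)
boundary = torch.zeros(B, 4, dtype=torch.int64)
boundary[:, 2] = U                     # number of symbols per utterance
boundary[:, 3] = T                     # number of frames per utterance

# 1) Simple (smoothed) loss; its gradients indicate which (t, u) cells matter.
simple_loss, (px_grad, py_grad) = k2.rnnt_loss_smoothed(
    lm=lm, am=am, symbols=symbols,
    termination_symbol=blank_id, boundary=boundary,
    return_grad=True, reduction="sum",
)

# 2) Derive pruning ranges and keep only the pruned (t, u) cells.
ranges = k2.get_rnnt_prune_ranges(
    px_grad=px_grad, py_grad=py_grad, boundary=boundary, s_range=prune_range
)
am_pruned, lm_pruned = k2.do_rnnt_pruning(am=am, lm=lm, ranges=ranges)
logits = am_pruned + lm_pruned         # stand-in for a real joiner network

# 3) Pruned RNN-T loss on the much smaller pruned lattice.
pruned_loss = k2.rnnt_loss_pruned(
    logits=logits, symbols=symbols, ranges=ranges,
    termination_symbol=blank_id, boundary=boundary, reduction="sum",
)
print(simple_loss.item(), pruned_loss.item())
```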
|     | test  |
|-----|-------|
| CER | 10.16 |
We provide a Colab notebook to test the pre-trained model:
|     | test |
|-----|------|
| CER | 4.38 |
We provide a Colab notebook to test the pre-trained model:
WER (`modified_beam_search` with `beam_size=4`)
Encoder | Params | dev | test | epochs |
---|---|---|---|---|
Zipformer | 73.4M | 4.13 | 4.40 | 55 |
Zipformer-small | 30.2M | 4.40 | 4.67 | 55 |
Zipformer-large | 157.3M | 4.03 | 4.28 | 56 |
Trained with all subsets:
|     | test  |
|-----|-------|
| CER | 29.08 |
We provide a Colab notebook to test the pre-trained model:
|     | TEST   |
|-----|--------|
| PER | 19.71% |
We provide a Colab notebook to test the pre-trained model:
|     | TEST   |
|-----|--------|
| PER | 17.66% |
We provide a Colab notebook to test the pre-trained model:
|                                      | dev  | test |
|--------------------------------------|------|------|
| modified_beam_search (`beam_size=4`) | 6.91 | 6.33 |
We provide a Colab notebook to test the pre-trained model:
|                                      | dev  | test |
|--------------------------------------|------|------|
| modified_beam_search (`beam_size=4`) | 6.77 | 6.14 |
We provide a Colab notebook to test the pre-trained model:
|                      | Dev  | Test |
|----------------------|------|------|
| greedy_search        | 5.53 | 6.59 |
| fast_beam_search     | 5.30 | 6.34 |
| modified_beam_search | 5.27 | 6.33 |
We provide a Colab notebook to test the pre-trained model:
|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 7.80 | 8.75     | 13.49        |
| fast_beam_search     | 7.94 | 8.74     | 13.80        |
| modified_beam_search | 7.76 | 8.71     | 13.41        |
We provide a Colab notebook to run the pre-trained model:
|                      | Dev  | Test-Net | Test-Meeting |
|----------------------|------|----------|--------------|
| greedy_search        | 8.78 | 10.12    | 16.16        |
| fast_beam_search     | 9.01 | 10.47    | 16.28        |
| modified_beam_search | 8.53 | 9.95     | 15.81        |
|                      | Eval  | Test-Net |
|----------------------|-------|----------|
| greedy_search        | 31.77 | 34.66    |
| fast_beam_search     | 31.39 | 33.02    |
| modified_beam_search | 30.38 | 34.25    |
We provide a Colab notebook to test the pre-trained model:
The best results for Chinese CER (%) and English WER (%), respectively (zh: Chinese, en: English):
decoding-method | dev | dev_zh | dev_en | test | test_zh | test_en |
---|---|---|---|---|---|---|
greedy_search | 7.30 | 6.48 | 19.19 | 7.39 | 6.66 | 19.13 |
fast_beam_search | 7.18 | 6.39 | 18.90 | 7.27 | 6.55 | 18.77 |
modified_beam_search | 7.15 | 6.35 | 18.95 | 7.22 | 6.50 | 18.70 |
We provide a Colab notebook to test the pre-trained model:
Once you have trained a model in icefall, you may want to deploy it with C++ without Python dependencies.
Please refer to the document for how to do this.
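The usual first step is to export the trained model as a TorchScript file that libtorch can load. The sketch below shows the general idea with `torch.jit`; the tiny stand-in model and the file name are placeholders, and the recipes' own export scripts (which also handle checkpoint averaging) should be preferred in practice.

```python
# Hedged sketch: turn a trained PyTorch model into a TorchScript file that
# C++/libtorch can load without any Python dependency.
import torch
import torch.nn as nn

# Stand-in for a trained icefall model; in practice you would load your
# trained checkpoint here instead.
model = nn.Sequential(nn.Linear(80, 256), nn.ReLU(), nn.Linear(256, 500))
model.eval()

scripted = torch.jit.script(model)   # or torch.jit.trace(model, example_input)
scripted.save("cpu_jit.pt")

# On the C++ side the file can then be loaded with torch::jit::load("cpu_jit.pt").
```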
We also provide a Colab notebook, showing you how to run a torch scripted model in k2 with C++. Please see: