This is a Chinese speech recognition recipe that trains on all Chinese corpora including:
Dataset | Duration (Hours) |
---|---|
Aidatatang | 140 |
Aishell | 151 |
MagicData | 712 |
Primewords | 99 |
ST-CMDS | 110 |
THCHS-30 | 26 |
TAL-ASR | 587 |
AISHELL2 | 1000 |
- Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
- Feature info: using fbank feature, with cmvn, no speed perturb.
- Training info: lr 0.004, batch size 18, 3 machines, 3*8 = 24 GPUs, acc_grad 1, 220 epochs, dither 0.1
- Decoding info: ctc_weight 0.5, average_num 30
- Git hash: 013794572a55c7d0dbea23a66106ccf3e5d3b8d4
- Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/multi_cn/20210315_unified_transformer_exp.tar.gz
Dataset | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
---|---|---|---|---|---|
Aidatatang | full | 4.23 | 5.82 | 5.82 | 4.71 |
16 | 4.59 | 6.99 | 6.99 | 5.29 | |
Aishell | full | 4.69 | 5.80 | 5.80 | 4.64 |
16 | 4.97 | 6.75 | 6.75 | 5.37 | |
MagicData | full | 2.86 | 4.01 | 4.00 | 3.07 |
16 | 3.10 | 5.02 | 5.02 | 3.68 | |
THCHS-30 | full | 16.68 | 15.46 | 15.46 | 14.38 |
16 | 17.47 | 16.81 | 16.82 | 15.63 |
- Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
- Feature info: using fbank feature, with cmvn, speed perturb.
- Training info: lr 0.001, batch size 8, 1 machines, 1*8 = 8 GPUs, acc_grad 12, 60 epochs
- Decoding info: ctc_weight 0.5, average_num 10
- Git hash: 5bdf436e671ef4c696d1b039f29cc33109e072fa
- Model link:
Dataset | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
---|---|---|---|---|---|
Aidatatang | full | 4.12 | 4.97 | 4.97 | 4.22 |
16 | 4.45 | 5.73 | 5.73 | 4.75 | |
Aishell | full | 4.49 | 5.07 | 5.05 | 4.43 |
16 | 4.77 | 5.77 | 5.77 | 4.85 | |
MagicData | full | 2.55 | 3.07 | 3.05 | 2.59 |
16 | 2.81 | 3.88 | 3.86 | 3.08 | |
THCHS-30 | full | 13.55 | 13.75 | 13.76 | 12.72 |
16 | 13.78 | 15.10 | 15.08 | 13.90 |
- Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, THCHS-30, TAL-ASR, and AISHELL2.
- Feature info: using fbank feature, dither=0, cmvn, speed perturb
- Training info: lr 0.001, batch size 22, 4 GPUs, acc_grad 4, 120 epochs, dither 0.1
- Decoding info: ctc_weight 0.5, average_num 10
- Git hash: 66f30c197d00c59fdeda3bc8ada801f867b73f78
- Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/multi_cn/20210815_unified_conformer_exp.tar.gz
Dataset | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
---|---|---|---|---|---|
Aidatatang | full | 3.22 | 4.00 | 4.01 | 3.35 |
16 | 3.50 | 4.63 | 4.63 | 3.79 | |
Aishell | full | 1.23 | 2.12 | 2.13 | 1.42 |
16 | 1.33 | 2.72 | 2.72 | 1.72 | |
MagicData | full | 2.38 | 3.07 | 3.05 | 2.52 |
16 | 2.66 | 3.80 | 3.78 | 2.94 | |
THCHS-30 | full | 9.93 | 11.07 | 11.06 | 10.16 |
16 | 10.28 | 11.85 | 11.85 | 10.81 | |
AISHELL2 | full | 5.25 | 5.81 | 5.79 | 5.22 |
16 | 5.48 | 6.48 | 6.50 | 5.61 | |
TAL-ASR | full | 9.54 | 10.35 | 10.28 | 9.66 |
16 | 10.04 | 11.43 | 11.39 | 10.55 |