Performance Record

This is a Chinese speech recognition recipe that trains on multiple Chinese corpora combined. The corpora used in each experiment are listed in its Dataset line below.

Unified Transformer Result

Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
Feature info: using fbank feature, with cmvn, no speed perturb.
Training info: lr 0.004, batch size 18, 3 machines, 3*8 = 24 GPUs, acc_grad 1, 220 epochs, dither 0.1
Decoding info: ctc_weight 0.5, average_num 30
Git hash: 013794572a55c7d0dbea23a66106ccf3e5d3b8d4
Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/multi_cn/20210315_unified_transformer_exp.tar.gz

Dataset	chunk size	attention decoder	ctc greedy search	ctc prefix beam search	attention rescoring
Aidatatang	full	4.23	5.82	5.82	4.71
	16	4.59	6.99	6.99	5.29
Aishell	full	4.69	5.80	5.80	4.64
	16	4.97	6.75	6.75	5.37
MagicData	full	2.86	4.01	4.00	3.07
	16	3.10	5.02	5.02	3.68
THCHS-30	full	16.68	15.46	15.46	14.38
	16	17.47	16.81	16.82	15.63
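
The attention rescoring column and the ctc_weight setting can be read together: CTC prefix beam search proposes an n-best list, the attention decoder scores each candidate, and the two log scores are combined using ctc_weight. The following is a minimal Python sketch, assuming the combined score has the form attention score + ctc_weight * CTC score. The function name `rescore` and the example scores are hypothetical and are not taken from the recipe's code.

```python
from typing import List, Tuple


def rescore(nbest: List[Tuple[str, float, float]], ctc_weight: float = 0.5) -> str:
    """Return the best hypothesis text from an n-best list.

    Each entry is (text, ctc_log_score, attention_log_score).
    Assumed combination: attention_log_score + ctc_weight * ctc_log_score.
    """
    best_text, best_score = "", float("-inf")
    for text, ctc_score, att_score in nbest:
        score = att_score + ctc_weight * ctc_score
        if score > best_score:
            best_text, best_score = text, score
    return best_text


# Hypothetical log scores, for illustration only.
nbest = [
    ("hyp_a", -1.0, -3.0),  # favored by CTC
    ("hyp_b", -5.0, -2.0),  # favored by the attention decoder
]
print(rescore(nbest, ctc_weight=0.5))
print(rescore(nbest, ctc_weight=0.0))
```

With ctc_weight 0.5 the CTC score is enough to change the ranking in this toy example. With ctc_weight 0.0 only the attention score decides.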

Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
Feature info: using fbank feature, with cmvn, speed perturb.
Training info: lr 0.001, batch size 8, 1 machine, 1*8 = 8 GPUs, acc_grad 12, 60 epochs
Decoding info: ctc_weight 0.5, average_num 10
Git hash: 5bdf436e671ef4c696d1b039f29cc33109e072fa
Model link:

Dataset	chunk size	attention decoder	ctc greedy search	ctc prefix beam search	attention rescoring
Aidatatang	full	4.12	4.97	4.97	4.22
	16	4.45	5.73	5.73	4.75
Aishell	full	4.49	5.07	5.05	4.43
	16	4.77	5.77	5.77	4.85
MagicData	full	2.55	3.07	3.05	2.59
	16	2.81	3.88	3.86	3.08
THCHS-30	full	13.55	13.75	13.76	12.72
	16	13.78	15.10	15.08	13.90
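
The average_num value in Decoding info is the number of checkpoints whose parameters are averaged into one model before decoding. The following is a minimal sketch of that idea, assuming an element-wise arithmetic mean and representing each checkpoint as a plain dict of float lists (a hypothetical stand-in for real model tensors). How the checkpoints are selected is not stated in this record and is not modeled here.

```python
from typing import Dict, List

Checkpoint = Dict[str, List[float]]


def average_checkpoints(checkpoints: List[Checkpoint]) -> Checkpoint:
    """Average parameters element-wise across checkpoints with identical keys."""
    if not checkpoints:
        raise ValueError("need at least one checkpoint")
    n = len(checkpoints)
    averaged: Checkpoint = {}
    for name, first in checkpoints[0].items():
        sums = [0.0] * len(first)
        for ckpt in checkpoints:
            for i, value in enumerate(ckpt[name]):
                sums[i] += value
        averaged[name] = [s / n for s in sums]
    return averaged


# Two toy checkpoints with hypothetical parameter names.
ckpts = [
    {"encoder.weight": [1.0, 2.0], "decoder.bias": [0.0]},
    {"encoder.weight": [3.0, 4.0], "decoder.bias": [1.0]},
]
print(average_checkpoints(ckpts))
```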

Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, THCHS-30, TAL-ASR, and AISHELL2.
Feature info: using fbank feature, dither=0, cmvn, speed perturb
Training info: lr 0.001, batch size 22, 4 GPUs, acc_grad 4, 120 epochs, dither 0.1
Decoding info: ctc_weight 0.5, average_num 10
Git hash: 66f30c197d00c59fdeda3bc8ada801f867b73f78
Model link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/multi_cn/20210815_unified_conformer_exp.tar.gz

Dataset	chunk size	attention decoder	ctc greedy search	ctc prefix beam search	attention rescoring
Aidatatang	full	3.22	4.00	4.01	3.35
	16	3.50	4.63	4.63	3.79
Aishell	full	1.23	2.12	2.13	1.42
	16	1.33	2.72	2.72	1.72
MagicData	full	2.38	3.07	3.05	2.52
	16	2.66	3.80	3.78	2.94
THCHS-30	full	9.93	11.07	11.06	10.16
	16	10.28	11.85	11.85	10.81
AISHELL2	full	5.25	5.81	5.79	5.22
	16	5.48	6.48	6.50	5.61
TAL-ASR	full	9.54	10.35	10.28	9.66
	16	10.04	11.43	11.39	10.55
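
The three Training info lines combine batch size, GPU count, and acc_grad differently, so the number of samples contributing to each optimizer step is not the same across runs. The following is a rough sketch of that arithmetic, assuming the listed batch size is per GPU and acc_grad is the number of gradient accumulation steps. Neither assumption is stated in this record, and the runs are labeled here by the first characters of their git hashes.

```python
def effective_batch(batch_size: int, num_gpus: int, acc_grad: int) -> int:
    """Samples per optimizer step, assuming batch_size is per GPU."""
    return batch_size * num_gpus * acc_grad


# (batch size, GPUs, acc_grad) copied from the Training info lines above.
runs = {
    "0137945": (18, 24, 1),
    "5bdf436": (8, 8, 12),
    "66f30c1": (22, 4, 4),
}

for git_hash, (batch_size, num_gpus, acc_grad) in runs.items():
    print(git_hash, effective_batch(batch_size, num_gpus, acc_grad))
```

Under these assumptions the three runs use 432, 768, and 352 samples per optimizer step respectively, which is worth keeping in mind when comparing their learning rates.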