Performance Record

This is a Chinese speech recognition recipe that trains on the following Chinese corpora combined:

| Dataset | Duration (Hours) |
|---|---|
| Aidatatang | 140 |
| Aishell | 151 |
| MagicData | 712 |
| Primewords | 99 |
| ST-CMDS | 110 |
| THCHS-30 | 26 |
| TAL-ASR | 587 |
| AISHELL2 | 1000 |
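As a quick sanity check on the amount of training audio, the durations in the table can be summed. The sketch below only restates the table as a Python dictionary; the `SIX_CORPUS_SUBSET` name is an illustrative label for the six corpora listed under the first Conformer result, not something defined by the recipe.

```python
# Durations in hours, copied from the dataset table.
DURATION_HOURS = {
    "Aidatatang": 140,
    "Aishell": 151,
    "MagicData": 712,
    "Primewords": 99,
    "ST-CMDS": 110,
    "THCHS-30": 26,
    "TAL-ASR": 587,
    "AISHELL2": 1000,
}

# Hypothetical label: the six corpora named in the first Conformer "Data info".
SIX_CORPUS_SUBSET = [
    "Aidatatang", "Aishell", "MagicData", "Primewords", "ST-CMDS", "THCHS-30",
]


def total_hours(names):
    """Sum the listed duration of the given corpora."""
    return sum(DURATION_HOURS[name] for name in names)


all_hours = total_hours(DURATION_HOURS)         # every corpus in the table
subset_hours = total_hours(SIX_CORPUS_SUBSET)   # six-corpus subset only
print(f"all corpora: {all_hours} h, six-corpus subset: {subset_hours} h")
```

Under these numbers the full set comes to 2825 hours and the six-corpus subset to 1238 hours.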

Unified Transformer Result

Data info:

WER

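| Dataset | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|---|---|---|---|---|---|
| Aidatatang | full | 4.23 | 5.82 | 5.82 | 4.71 |
| Aidatatang | 16 | 4.59 | 6.99 | 6.99 | 5.29 |
| Aishell | full | 4.69 | 5.80 | 5.80 | 4.64 |
| Aishell | 16 | 4.97 | 6.75 | 6.75 | 5.37 |
| MagicData | full | 2.86 | 4.01 | 4.00 | 3.07 |
| MagicData | 16 | 3.10 | 5.02 | 5.02 | 3.68 |
| THCHS-30 | full | 16.68 | 15.46 | 15.46 | 14.38 |
| THCHS-30 | 16 | 17.47 | 16.81 | 16.82 | 15.63 |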

Unified Conformer Result

Data info:

  • Dataset: Aidatatang, Aishell, MagicData, Primewords, ST-CMDS, and THCHS-30.
  • Feature info: fbank features with CMVN and speed perturbation.
  • Training info: lr 0.001, batch size 8, 1 machine, 1*8 = 8 GPUs, acc_grad 12, 60 epochs
  • Decoding info: ctc_weight 0.5, average_num 10
  • Git hash: 5bdf436e671ef4c696d1b039f29cc33109e072fa
  • Model link:
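To make the training and decoding settings above more concrete, here is a minimal sketch of two quantities they imply. It rests on two assumptions that the list does not state: that "batch size 8" is per GPU, and that attention rescoring ranks each hypothesis by its attention log-probability plus `ctc_weight` times its CTC log-probability. The function names and the example scores are hypothetical and are not taken from the recipe.

```python
def effective_batch_size(batch_size, num_gpus, acc_grad):
    """Utterances contributing to one optimizer step.

    Assumes batch_size is per GPU and gradients are accumulated
    over acc_grad forward/backward passes before each update.
    """
    return batch_size * num_gpus * acc_grad


def rescore(hyps, ctc_weight):
    """Pick the best hypothesis under a weighted score combination.

    hyps is a list of (text, ctc_logp, att_logp) tuples.
    Assumed scoring rule: att_logp + ctc_weight * ctc_logp.
    """
    return max(hyps, key=lambda h: h[2] + ctc_weight * h[1])


# Settings from the list above: batch size 8, 8 GPUs, acc_grad 12.
step_size = effective_batch_size(batch_size=8, num_gpus=8, acc_grad=12)

# Made-up scores, purely for illustration.
example_hyps = [
    ("hyp_a", -4.0, -6.0),  # better CTC score, worse attention score
    ("hyp_b", -7.0, -5.0),  # worse CTC score, better attention score
]
best_with_ctc = rescore(example_hyps, ctc_weight=0.5)
best_attention_only = rescore(example_hyps, ctc_weight=0.0)
print(step_size, best_with_ctc[0], best_attention_only[0])
```

With `ctc_weight` at 0.5 the CTC score is enough to change which of the two made-up hypotheses wins, which is the kind of trade-off the weight controls.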

WER

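| Dataset | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|---|---|---|---|---|---|
| Aidatatang | full | 4.12 | 4.97 | 4.97 | 4.22 |
| Aidatatang | 16 | 4.45 | 5.73 | 5.73 | 4.75 |
| Aishell | full | 4.49 | 5.07 | 5.05 | 4.43 |
| Aishell | 16 | 4.77 | 5.77 | 5.77 | 4.85 |
| MagicData | full | 2.55 | 3.07 | 3.05 | 2.59 |
| MagicData | 16 | 2.81 | 3.88 | 3.86 | 3.08 |
| THCHS-30 | full | 13.55 | 13.75 | 13.76 | 12.72 |
| THCHS-30 | 16 | 13.78 | 15.10 | 15.08 | 13.90 |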
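One way to read the Transformer and Conformer tables side by side is as a relative WER reduction. The sketch below uses the full-chunk attention rescoring column from both tables; the helper name is illustrative, and because the Transformer data info is not listed, the two runs may not share an identical setup.

```python
# Full-chunk attention rescoring WER (%), copied from the two tables.
TRANSFORMER_WER = {
    "Aidatatang": 4.71, "Aishell": 4.64, "MagicData": 3.07, "THCHS-30": 14.38,
}
CONFORMER_WER = {
    "Aidatatang": 4.22, "Aishell": 4.43, "MagicData": 2.59, "THCHS-30": 12.72,
}


def relative_reduction(baseline, improved):
    """Relative WER reduction in percent: (baseline - improved) / baseline."""
    return 100.0 * (baseline - improved) / baseline


reductions = {
    name: relative_reduction(TRANSFORMER_WER[name], CONFORMER_WER[name])
    for name in TRANSFORMER_WER
}
for name, value in reductions.items():
    print(f"{name}: {value:.1f}% relative WER reduction")
```

The same helper can be applied to any other column, or to the full versus chunk size 16 rows to estimate the cost of streaming decoding.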

Unified Conformer Result

Data info:

WER

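| Dataset | chunk size | attention decoder | ctc greedy search | ctc prefix beam search | attention rescoring |
|---|---|---|---|---|---|
| Aidatatang | full | 3.22 | 4.00 | 4.01 | 3.35 |
| Aidatatang | 16 | 3.50 | 4.63 | 4.63 | 3.79 |
| Aishell | full | 1.23 | 2.12 | 2.13 | 1.42 |
| Aishell | 16 | 1.33 | 2.72 | 2.72 | 1.72 |
| MagicData | full | 2.38 | 3.07 | 3.05 | 2.52 |
| MagicData | 16 | 2.66 | 3.80 | 3.78 | 2.94 |
| THCHS-30 | full | 9.93 | 11.07 | 11.06 | 10.16 |
| THCHS-30 | 16 | 10.28 | 11.85 | 11.85 | 10.81 |
| AISHELL2 | full | 5.25 | 5.81 | 5.79 | 5.22 |
| AISHELL2 | 16 | 5.48 | 6.48 | 6.50 | 5.61 |
| TAL-ASR | full | 9.54 | 10.35 | 10.28 | 9.66 |
| TAL-ASR | 16 | 10.04 | 11.43 | 11.39 | 10.55 |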