The WERs are:
decoding method | dev | test | comment |
---|---|---|---|
greedy search | 15.52 | 15.28 | --epoch 18, --avg 5, --max-duration 200 |
modified beam search | 13.88 | 13.70 | --epoch 18, --avg 5, --max-duration 200 |
fast beam search | 14.62 | 14.36 | --epoch 18, --avg 5, --max-duration 200 |
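
To compare the decoding methods, it can help to express the gaps as relative WER reductions over greedy search. The snippet below is a minimal sketch of that arithmetic; the helper name `relative_reduction` is hypothetical and the numbers are copied from the table above.

```python
# WERs (%) copied from the table above: (dev, test) per decoding method.
wers = {
    "greedy search": (15.52, 15.28),
    "modified beam search": (13.88, 13.70),
    "fast beam search": (14.62, 14.36),
}


def relative_reduction(baseline: float, new: float) -> float:
    """Relative WER reduction of `new` over `baseline`, in percent."""
    return 100.0 * (baseline - new) / baseline


base_dev, base_test = wers["greedy search"]
for method, (dev, test) in wers.items():
    # Greedy search itself shows a 0.0% reduction, as expected.
    print(
        f"{method}: dev {relative_reduction(base_dev, dev):.1f}%, "
        f"test {relative_reduction(base_test, test):.1f}%"
    )
```

Relative reduction is used here rather than the absolute difference so that the dev and test gaps can be compared on the same footing.
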
The training command to reproduce these results is given below:
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./pruned_transducer_stateless5/train.py \
--world-size 4 \
--num-epochs 30 \
--start-epoch 1 \
--exp-dir pruned_transducer_stateless5/exp \
--max-duration 300 \
--num-buckets 50
The TensorBoard training log is available at https://tensorboard.dev/experiment/YyNv45pfQ0GqWzZ898WOlw/#scalars
The decoding command is:
epoch=18
avg=5
for method in greedy_search modified_beam_search fast_beam_search; do
  ./pruned_transducer_stateless5/decode.py \
    --epoch $epoch \
    --beam-size 10 \
    --avg $avg \
    --exp-dir ./pruned_transducer_stateless5/exp \
    --max-duration 200 \
    --decoding-method $method \
    --max-sym-per-frame 1 \
    --num-encoder-layers 12 \
    --dim-feedforward 2048 \
    --nhead 8 \
    --encoder-dim 512 \
    --decoder-dim 512 \
    --joiner-dim 512 \
    --use-averaged-model True
done
You can find a pretrained model, training logs, decoding logs, and decoding results at: https://huggingface.co/AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06
The best WERs, as of 2022-06-04, for the MGB2 dev and test sets are given below.
Using whole-lattice HLG decoding + n-gram LM rescoring:
metric | dev | test |
---|---|---|
WER | 15.62 | 15.01 |
Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
ngram_lm_scale | attention_scale |
---|---|
0.1 | - |
Using n-best (nbest-scale=0.5) attention decoder rescoring:
metric | dev | test |
---|---|---|
WER | 15.89 | 15.08 |
Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
ngram_lm_scale | attention_scale |
---|---|
0.01 | 0.5 |
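
The two scales weight the n-gram LM and attention-decoder scores when the n-best paths are ranked. The snippet below is a minimal sketch, assuming each path's total score is its acoustic score plus the scaled LM and attention scores. The function name `rescore_nbest` and the toy scores are hypothetical and are not taken from the recipe.

```python
from typing import List, Tuple

# One n-best entry: (text, am_score, ngram_lm_score, attention_score).
# All scores are log-domain, so higher is better.
Path = Tuple[str, float, float, float]


def rescore_nbest(
    paths: List[Path],
    ngram_lm_scale: float,
    attention_scale: float,
) -> str:
    """Return the text of the path with the highest combined score."""

    def total(path: Path) -> float:
        _, am, lm, att = path
        # Assumed combination rule: a weighted sum of the three scores.
        return am + ngram_lm_scale * lm + attention_scale * att

    return max(paths, key=total)[0]


# Made-up scores, for illustration only.
paths = [
    ("hyp a", -10.0, -30.0, -8.0),
    ("hyp b", -10.5, -20.0, -6.0),
]
best = rescore_nbest(paths, ngram_lm_scale=0.01, attention_scale=0.5)
print(best)
```

With both scales set to zero, the ranking falls back to the acoustic score alone, which is why the scales are tuned on the dev set.
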
To reproduce the above result, use the following commands for training:
cd egs/mgb2/ASR
. ./path.sh
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1"
./conformer_ctc/train.py \
--lang-dir data/lang_bpe_5000 \
--att-rate 0.8 \
--lr-factor 10 \
--concatenate-cuts 0 \
--world-size 2 \
--bucketing-sampler 1 \
--max-duration 100 \
--start-epoch 0 \
--num-epochs 40
and the following command for n-best decoding:
./conformer_ctc/decode.py \
--lang-dir data/lang_bpe_5000 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--epoch 40 \
--avg 5 \
--method attention-decoder \
--nbest-scale 0.5
and the following command for whole-lattice decoding:
./conformer_ctc/decode.py \
--epoch 40 \
--avg 5 \
--exp-dir conformer_ctc/exp_5000_att0.8 \
--lang-dir data/lang_bpe_5000 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--method whole-lattice-rescoring
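
Both decoding commands above pass `--epoch 40 --avg 5`. The snippet below is a minimal sketch of what this selection typically means, assuming the script averages the parameters of the last `avg` epoch checkpoints ending at `epoch`. Plain Python floats stand in for tensors, and both helper names are hypothetical.

```python
from typing import Dict, List


def epochs_to_average(epoch: int, avg: int) -> List[int]:
    """Epoch indices selected by --epoch/--avg: the last `avg` epochs up to `epoch`."""
    start = max(epoch - avg + 1, 0)
    return list(range(start, epoch + 1))


def average_params(checkpoints: List[Dict[str, float]]) -> Dict[str, float]:
    """Element-wise mean of parameters that share the same names."""
    n = len(checkpoints)
    return {
        name: sum(ckpt[name] for ckpt in checkpoints) / n
        for name in checkpoints[0]
    }


print(epochs_to_average(epoch=40, avg=5))

# Toy checkpoints with a single scalar "parameter" each.
toy = [{"w": 1.0}, {"w": 2.0}, {"w": 3.0}]
print(average_params(toy))
```

Averaging several late checkpoints is commonly used to smooth out epoch-to-epoch variation before decoding.
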
The TensorBoard log for training is available at https://tensorboard.dev/experiment/QYNzOi52RwOX8yvtpl3hMw/#scalars
The best WERs, as of 2022-06-04, for the MGB2 dev and test sets are given below.
Using whole-lattice HLG decoding + n-gram LM rescoring:
metric | dev | test |
---|---|---|
WER | 25.32 | 23.53 |
Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
ngram_lm_scale | attention_scale |
---|---|
0.1 | - |
Using n-best (nbest-scale=0.5) HLG decoding + n-gram LM rescoring + attention decoder rescoring:
metric | dev | test |
---|---|---|
WER | 27.87 | 26.12 |
Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:
ngram_lm_scale | attention_scale |
---|---|
0.01 | 0.3 |
To reproduce the above result, use the following commands for training:
cd egs/mgb2/ASR
. ./path.sh
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1"
./conformer_ctc/train.py \
--lang-dir data/lang_bpe_5000 \
--att-rate 0.8 \
--lr-factor 10 \
--concatenate-cuts 0 \
--world-size 2 \
--bucketing-sampler 1 \
--max-duration 100 \
--start-epoch 0 \
--num-epochs 40
and the following command for n-best decoding:
./conformer_ctc/decode.py \
--lang-dir data/lang_bpe_5000 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--epoch 40 \
--avg 5 \
--method attention-decoder \
--nbest-scale 0.5
and the following command for whole-lattice decoding:
./conformer_ctc/decode.py \
--lang-dir data/lang_bpe_5000 \
--max-duration 30 \
--concatenate-cuts 0 \
--bucketing-sampler 1 \
--num-paths 1000 \
--epoch 40 \
--avg 5 \
--method whole-lattice-rescoring
The TensorBoard log for training is available at https://tensorboard.dev/experiment/zy6FnumCQlmiO7BPsdCmEg/#scalars