# Results

## MGB2 all data BPE training results (Stateless Pruned Transducer)

### 2022-09-07

The WERs are:

| decoding method      | dev   | test  | comment                                 |
|----------------------|-------|-------|-----------------------------------------|
| greedy search        | 15.52 | 15.28 | --epoch 18, --avg 5, --max-duration 200 |
| modified beam search | 13.88 | 13.70 | --epoch 18, --avg 5, --max-duration 200 |
| fast beam search     | 14.62 | 14.36 | --epoch 18, --avg 5, --max-duration 200 |

The training command to reproduce the above results is given below:

```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless5/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 1 \
  --exp-dir pruned_transducer_stateless5/exp \
  --max-duration 300 \
  --num-buckets 50
```
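If training is interrupted, it should be possible to resume from a saved checkpoint by pointing `--start-epoch` at the next epoch to run while keeping all other flags unchanged. The following is only a sketch based on the flags above; the exact resume behaviour depends on your icefall version:

```bash
# Hypothetical resume run: continue from the checkpoint written after epoch 18
# and keep training until epoch 30, reusing the same experiment directory.
export CUDA_VISIBLE_DEVICES="0,1,2,3"

./pruned_transducer_stateless5/train.py \
  --world-size 4 \
  --num-epochs 30 \
  --start-epoch 19 \
  --exp-dir pruned_transducer_stateless5/exp \
  --max-duration 300 \
  --num-buckets 50
```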

The tensorboard training log can be found at https://tensorboard.dev/experiment/YyNv45pfQ0GqWzZ898WOlw/#scalars
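If you have the experiment directory locally, the same curves can also be inspected with a local TensorBoard instance. This is a sketch that assumes icefall wrote the event files under `pruned_transducer_stateless5/exp/tensorboard`, its usual location; adjust the path if yours differs:

```bash
# View the training curves locally instead of on tensorboard.dev.
tensorboard --logdir pruned_transducer_stateless5/exp/tensorboard --port 6006
```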

The decoding command is:

```bash
epoch=18
avg=5
for method in greedy_search modified_beam_search fast_beam_search; do
  ./pruned_transducer_stateless5/decode.py \
    --epoch $epoch \
    --beam-size 10 \
    --avg $avg \
    --exp-dir ./pruned_transducer_stateless5/exp \
    --max-duration 200 \
    --decoding-method $method \
    --max-sym-per-frame 1 \
    --num-encoder-layers 12 \
    --dim-feedforward 2048 \
    --nhead 8 \
    --encoder-dim 512 \
    --decoder-dim 512 \
    --joiner-dim 512 \
    --use-averaged-model True
done
```

## MGB2 all data BPE training results (Conformer-CTC) (after 40 epochs)

### 2022-06-04

You can find a pretrained model, training logs, decoding logs, and decoding results at: https://huggingface.co/AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06
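To fetch that model and its logs locally, one option is to clone the Hugging Face repository with Git LFS. This is only a sketch, assuming `git-lfs` is installed; the repository name is taken from the link above:

```bash
# Download the pretrained model, logs, and decoding results (requires git-lfs).
git lfs install
git clone https://huggingface.co/AmirHussein/icefall-asr-mgb2-conformer_ctc-2022-27-06
```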

The best WERs, as of 2022-06-04, for the MGB2 dev and test sets are given below.

Using whole lattice HLG decoding + n-gram LM rescoring

|     | dev   | test  |
|-----|-------|-------|
| WER | 15.62 | 15.01 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.1            | -               |

Using n-best (nbest-scale=0.5) attention decoder rescoring

|     | dev   | test  |
|-----|-------|-------|
| WER | 15.89 | 15.08 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.01           | 0.5             |
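For context, these scales weight the extra scores that are added on top of the lattice/acoustic score when hypotheses are re-ranked. The combination is roughly as follows (a sketch of the usual icefall rescoring formula; the exact expression lives in the decoding code, not in this file):

```
% Sketch: total score used to pick the best hypothesis during rescoring.
\text{tot\_score} = \text{am\_score}
  + \text{ngram\_lm\_scale} \cdot \text{ngram\_lm\_score}
  + \text{attention\_scale} \cdot \text{attention\_score}
```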

To reproduce the above results, use the following commands for training:

Note: the model was trained on two V100 32GB GPUs.

```bash
cd egs/mgb2/ASR
. ./path.sh
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1"
./conformer_ctc/train.py \
  --lang-dir data/lang_bpe_5000 \
  --att-rate 0.8 \
  --lr-factor 10 \
  --concatenate-cuts 0 \
  --world-size 2 \
  --bucketing-sampler 1 \
  --max-duration 100 \
  --start-epoch 0 \
  --num-epochs 40
```

and the following command for n-best attention-decoder rescoring:

```bash
./conformer_ctc/decode.py \
  --lang-dir data/lang_bpe_5000 \
  --max-duration 30 \
  --concatenate-cuts 0 \
  --bucketing-sampler 1 \
  --num-paths 1000 \
  --epoch 40 \
  --avg 5 \
  --method attention-decoder \
  --nbest-scale 0.5
```

and the following command for whole-lattice rescoring:

```bash
./conformer_ctc/decode.py \
  --epoch 40 \
  --avg 5 \
  --exp-dir conformer_ctc/exp_5000_att0.8 \
  --lang-dir data/lang_bpe_5000 \
  --max-duration 30 \
  --concatenate-cuts 0 \
  --bucketing-sampler 1 \
  --num-paths 1000 \
  --method whole-lattice-rescoring
```

The tensorboard log for training is available at https://tensorboard.dev/experiment/QYNzOi52RwOX8yvtpl3hMw/#scalars

## MGB2 100h BPE training results (Conformer-CTC) (after 33 epochs)

### 2022-06-04

The best WERs, as of 2022-06-04, for the MGB2 dev and test sets are given below.

Using whole lattice HLG decoding + n-gram LM rescoring

|     | dev   | test  |
|-----|-------|-------|
| WER | 25.32 | 23.53 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.1            | -               |

Using n-best (nbest-scale=0.5) HLG decoding + n-gram LM rescoring + attention decoder rescoring:

|     | dev   | test  |
|-----|-------|-------|
| WER | 27.87 | 26.12 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.01           | 0.3             |

To reproduce the above results, use the following commands for training:

Note: the model was trained on two V100 32GB GPUs.

```bash
cd egs/mgb2/ASR
. ./path.sh
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1"
./conformer_ctc/train.py \
  --lang-dir data/lang_bpe_5000 \
  --att-rate 0.8 \
  --lr-factor 10 \
  --concatenate-cuts 0 \
  --world-size 2 \
  --bucketing-sampler 1 \
  --max-duration 100 \
  --start-epoch 0 \
  --num-epochs 40
```

and the following command for n-best attention-decoder rescoring:

```bash
./conformer_ctc/decode.py \
  --lang-dir data/lang_bpe_5000 \
  --max-duration 30 \
  --concatenate-cuts 0 \
  --bucketing-sampler 1 \
  --num-paths 1000 \
  --epoch 40 \
  --avg 5 \
  --method attention-decoder \
  --nbest-scale 0.5
```

and the following command for whole-lattice rescoring:

```bash
./conformer_ctc/decode.py \
  --lang-dir data/lang_bpe_5000 \
  --max-duration 30 \
  --concatenate-cuts 0 \
  --bucketing-sampler 1 \
  --num-paths 1000 \
  --epoch 40 \
  --avg 5 \
  --method whole-lattice-rescoring
```

The tensorboard log for training is available at https://tensorboard.dev/experiment/zy6FnumCQlmiO7BPsdCmEg/#scalars