From df36f93bd81a61cf1bcff23ea465292b33b3a268 Mon Sep 17 00:00:00 2001
From: Xiaoyu Yang <45973641+marcoyang1998@users.noreply.github.com>
Date: Wed, 24 Apr 2024 17:00:42 +0800
Subject: [PATCH] add small-scaled model for audio tagging (#1604)

---
 egs/audioset/AT/RESULTS.md | 51 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/egs/audioset/AT/RESULTS.md b/egs/audioset/AT/RESULTS.md
index 0c75dfe4e3..0128b70184 100644
--- a/egs/audioset/AT/RESULTS.md
+++ b/egs/audioset/AT/RESULTS.md
@@ -5,6 +5,8 @@ See for more details
 
 [zipformer](./zipformer)
 
+#### normal-scaled model, number of model parameters: 65549011, i.e., 65.55 M
+
 You can find a pretrained model, training logs, decoding logs, and decoding results at:
 
 
@@ -42,3 +44,52 @@ python zipformer/evaluate.py \
     --exp-dir zipformer/exp_at_as_full \
     --max-duration 500
 ```
+
+
+#### small-scaled model, number of model parameters: 22125218, i.e., 22.13 M
+
+You can find a pretrained model, training logs, decoding logs, and decoding results at:
+
+
+The model achieves the following mean averaged precision on AudioSet:
+
+| Model | mAP |
+| ------ | ------- |
+| Zipformer-S-AT | 45.1 |
+
+The training command is:
+
+```bash
+export CUDA_VISIBLE_DEVICES="4,5,6,7"
+subset=full
+
+python zipformer/train.py \
+    --world-size 4 \
+    --num-epochs 50 \
+    --exp-dir zipformer/exp_small_at_as_${subset} \
+    --start-epoch 1 \
+    --use-fp16 1 \
+    --num-events 527 \
+    --num-encoder-layers 2,2,2,2,2,2 \
+    --feedforward-dim 512,768,768,768,768,768 \
+    --encoder-dim 192,256,256,256,256,256 \
+    --encoder-unmasked-dim 192,192,192,192,192,192 \
+    --audioset-subset $subset \
+    --max-duration 1200 \
+    --enable-musan True \
+    --master-port 13455
+```
+
+The evaluation command is:
+
+```bash
+python zipformer/evaluate.py \
+    --epoch 31 \
+    --avg 4 \
+    --num-encoder-layers 2,2,2,2,2,2 \
+    --feedforward-dim 512,768,768,768,768,768 \
+    --encoder-dim 192,256,256,256,256,256 \
+    --encoder-unmasked-dim 192,192,192,192,192,192 \
+    --exp-dir zipformer/exp_small_at_as_full \
+    --max-duration 500
+```
\ No newline at end of file
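
The evaluation command for the small-scaled model repeats the four architecture flags from the training command with identical values, presumably because the checkpoint can only be loaded into a model built with the same dimensions. The sketch below keeps those flags in one place so the two commands cannot drift apart. It is only an illustration: `ARCH_ARGS` is a hypothetical variable name, not part of the recipe, and the script prints the command as a dry run instead of executing it.

```bash
# Architecture flags of the small-scaled model, copied from the commands above.
# ARCH_ARGS is a hypothetical variable name; it is not part of the recipe.
ARCH_ARGS="--num-encoder-layers 2,2,2,2,2,2 \
--feedforward-dim 512,768,768,768,768,768 \
--encoder-dim 192,256,256,256,256,256 \
--encoder-unmasked-dim 192,192,192,192,192,192"

# Assemble the evaluation command as the positional parameters.
# $ARCH_ARGS is left unquoted on purpose so that it splits into separate words;
# this is safe here because none of the values contain spaces or glob characters.
set -- python zipformer/evaluate.py \
  --epoch 31 \
  --avg 4 \
  $ARCH_ARGS \
  --exp-dir zipformer/exp_small_at_as_full \
  --max-duration 500

# Dry run: print the assembled command instead of running it.
echo "$@"
```

To actually run the evaluation, replace the final `echo "$@"` with `"$@"`. The same `$ARCH_ARGS` expansion can be dropped into the `zipformer/train.py` invocation in place of the four repeated flags.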