Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Zipformer recipe for GigaSpeech #1254

Merged
merged 16 commits into from
Oct 21, 2023
94 changes: 94 additions & 0 deletions .github/scripts/run-gigaspeech-zipformer-2023-10-17.sh
Original file line number Diff line number Diff line change
@@ -0,0 +1,94 @@
#!/usr/bin/env bash

set -e

log() {
# This function is from espnet
local fname=${BASH_SOURCE[1]##*/}
echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

cd egs/gigaspeech/ASR

repo_url=https://huggingface.co/yfyeung/icefall-asr-gigaspeech-zipformer-2023-10-17

log "Downloading pre-trained model from $repo_url"
git lfs install
GIT_LFS_SKIP_SMUDGE=1 git clone $repo_url
repo=$(basename $repo_url)

log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav

pushd $repo/exp
git lfs pull --include "data/lang_bpe_500/bpe.model"
git lfs pull --include "data/lang_bpe_500/tokens.txt"
git lfs pull --include "exp/jit_script.pt"
git lfs pull --include "exp/pretrained.pt"
ln -s pretrained.pt epoch-99.pt
ls -lh *.pt
popd

log "Export to torchscript model"
./zipformer/export.py \
--exp-dir $repo/exp \
--use-averaged-model false \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--epoch 99 \
--avg 1 \
--jit 1

ls -lh $repo/exp/*.pt

log "Decode with models exported by torch.jit.script()"

./zipformer/jit_pretrained.py \
--tokens $repo/data/lang_bpe_500/tokens.txt \
--nn-model-filename $repo/exp/jit_script.pt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav

for method in greedy_search modified_beam_search fast_beam_search; do
log "$method"

./zipformer/pretrained.py \
--method $method \
--beam-size 4 \
--checkpoint $repo/exp/pretrained.pt \
--tokens $repo/data/lang_bpe_500/tokens.txt \
$repo/test_wavs/1089-134686-0001.wav \
$repo/test_wavs/1221-135766-0001.wav \
$repo/test_wavs/1221-135766-0002.wav
done

echo "GITHUB_EVENT_NAME: ${GITHUB_EVENT_NAME}"
echo "GITHUB_EVENT_LABEL_NAME: ${GITHUB_EVENT_LABEL_NAME}"
if [[ x"${GITHUB_EVENT_NAME}" == x"schedule" || x"${GITHUB_EVENT_LABEL_NAME}" == x"run-decode" ]]; then
mkdir -p zipformer/exp
ln -s $PWD/$repo/exp/pretrained.pt zipformer/exp/epoch-999.pt
ln -s $PWD/$repo/data/lang_bpe_500 data/

ls -lh data
ls -lh zipformer/exp

log "Decoding test-clean and test-other"

# use a small value for decoding with CPU
max_duration=100

for method in greedy_search fast_beam_search modified_beam_search; do
log "Decoding with $method"

./zipformer/decode.py \
--decoding-method $method \
--epoch 999 \
--avg 1 \
--use-averaged-model 0 \
--max-duration $max_duration \
--exp-dir zipformer/exp
done

rm zipformer/exp/*.pt
fi
126 changes: 126 additions & 0 deletions .github/workflows/run-gigaspeech-zipformer-2023-10-17.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,126 @@
# Copyright 2022 Fangjun Kuang ([email protected])

# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: run-gigaspeech-zipformer-2023-10-17
# zipformer

on:
push:
branches:
- master
pull_request:
types: [labeled]

schedule:
# minute (0-59)
# hour (0-23)
# day of the month (1-31)
# month (1-12)
# day of the week (0-6)
# nightly build at 15:50 UTC time every day
- cron: "50 15 * * *"

concurrency:
group: run_gigaspeech_2023_10_17_zipformer-${{ github.ref }}
cancel-in-progress: true

jobs:
run_gigaspeech_2023_10_17_zipformer:
if: github.event.label.name == 'zipformer' ||github.event.label.name == 'ready' || github.event.label.name == 'run-decode' || github.event_name == 'push' || github.event_name == 'schedule'
runs-on: ${{ matrix.os }}
strategy:
matrix:
os: [ubuntu-latest]
python-version: [3.8]

fail-fast: false

steps:
- uses: actions/checkout@v2
with:
fetch-depth: 0

- name: Setup Python ${{ matrix.python-version }}
uses: actions/setup-python@v2
with:
python-version: ${{ matrix.python-version }}
cache: 'pip'
cache-dependency-path: '**/requirements-ci.txt'

- name: Install Python dependencies
run: |
grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
pip uninstall -y protobuf
pip install --no-binary protobuf protobuf==3.20.*

- name: Cache kaldifeat
id: my-cache
uses: actions/cache@v2
with:
path: |
~/tmp/kaldifeat
key: cache-tmp-${{ matrix.python-version }}-2023-05-22

- name: Install kaldifeat
if: steps.my-cache.outputs.cache-hit != 'true'
shell: bash
run: |
.github/scripts/install-kaldifeat.sh

- name: Inference with pre-trained model
shell: bash
env:
GITHUB_EVENT_NAME: ${{ github.event_name }}
GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
run: |
mkdir -p egs/gigaspeech/ASR/data
ln -sfv ~/tmp/fbank-libri egs/gigaspeech/ASR/data/fbank
ls -lh egs/gigaspeech/ASR/data/*

sudo apt-get -qq install git-lfs tree
export PYTHONPATH=$PWD:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH

.github/scripts/run-gigaspeech-zipformer-2023-10-17.sh

- name: Display decoding results for gigaspeech zipformer
if: github.event_name == 'schedule' || github.event.label.name == 'run-decode'
shell: bash
run: |
cd egs/gigaspeech/ASR/
tree ./zipformer/exp

cd zipformer
echo "results for zipformer"
echo "===greedy search==="
find exp/greedy_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
find exp/greedy_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

echo "===fast_beam_search==="
find exp/fast_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
find exp/fast_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

echo "===modified beam search==="
find exp/modified_beam_search -name "log-*" -exec grep -n --color "best for test-clean" {} + | sort -n -k2
find exp/modified_beam_search -name "log-*" -exec grep -n --color "best for test-other" {} + | sort -n -k2

- name: Upload decoding results for gigaspeech zipformer
uses: actions/upload-artifact@v2
if: github.event_name == 'schedule' || github.event.label.name == 'run-decode'
with:
name: torch-${{ matrix.torch }}-python-${{ matrix.python-version }}-ubuntu-latest-cpu-zipformer-2022-11-11
path: egs/gigaspeech/ASR/zipformer/exp/
16 changes: 14 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -148,8 +148,11 @@ in the decoding.

### GigaSpeech

We provide two models for this recipe: [Conformer CTC model][GigaSpeech_conformer_ctc]
and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
We provide three models for this recipe:

- [Conformer CTC model][GigaSpeech_conformer_ctc]
- [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned RNN-T loss][GigaSpeech_pruned_transducer_stateless2].
- [Transducer: Zipformer encoder + Embedding decoder][GigaSpeech_zipformer]

#### Conformer CTC

Expand All @@ -165,6 +168,14 @@ and [Pruned stateless RNN-T: Conformer encoder + Embedding decoder + k2 pruned R
| fast beam search | 10.50 | 10.69 |
| modified beam search | 10.40 | 10.51 |

#### Transducer: Zipformer encoder + Embedding decoder

| | Dev | Test |
|----------------------|-------|-------|
| greedy search | 10.31 | 10.50 |
| fast beam search | 10.26 | 10.48 |
| modified beam search | 10.25 | 10.38 |


### Aishell

Expand Down Expand Up @@ -378,6 +389,7 @@ Please see: [![Open In Colab](https://colab.research.google.com/assets/colab-bad
[TED-LIUM3_pruned_transducer_stateless]: egs/tedlium3/ASR/pruned_transducer_stateless
[GigaSpeech_conformer_ctc]: egs/gigaspeech/ASR/conformer_ctc
[GigaSpeech_pruned_transducer_stateless2]: egs/gigaspeech/ASR/pruned_transducer_stateless2
[GigaSpeech_zipformer]: egs/gigaspeech/ASR/zipformer
[Aidatatang_200zh_pruned_transducer_stateless2]: egs/aidatatang_200zh/ASR/pruned_transducer_stateless2
[WenetSpeech_pruned_transducer_stateless2]: egs/wenetspeech/ASR/pruned_transducer_stateless2
[WenetSpeech_pruned_transducer_stateless5]: egs/wenetspeech/ASR/pruned_transducer_stateless5
Expand Down
1 change: 1 addition & 0 deletions egs/gigaspeech/ASR/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ ln -sfv /path/to/GigaSpeech download/GigaSpeech
## Performance Record
| | Dev | Test |
|--------------------------------|-------|-------|
| `zipformer` | 10.25 | 10.38 |
| `conformer_ctc` | 10.47 | 10.58 |
| `pruned_transducer_stateless2` | 10.40 | 10.51 |

Expand Down
74 changes: 74 additions & 0 deletions egs/gigaspeech/ASR/RESULTS.md
Original file line number Diff line number Diff line change
@@ -1,4 +1,78 @@
## Results
### zipformer (zipformer + pruned stateless transducer)

See <https://github.com/k2-fsa/icefall/pull/1254> for more details.

[zipformer](./zipformer)

- Non-streaming
- normal-scaled model, number of model parameters: 65549011, i.e., 65.55 M

You can find a pretrained model, training logs, decoding logs, and decoding results at:
<https://huggingface.co/yfyeung/icefall-asr-gigaspeech-zipformer-2023-10-17>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you upload the tensorboard log to
https://wandb.ai/site
and post a link to it here?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By the way, the link has not been posted. please leave a message when you think this PR is ready to merge.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I have posted that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add a CI test for your model?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I have added.


The tensorboard log for training is available at
<https://wandb.ai/yifanyeung/icefall-asr-gigaspeech-zipformer-2023-10-20>

You can use <https://github.com/k2-fsa/sherpa> to deploy it.

| decoding method | test-clean | test-other | comment |
|----------------------|------------|------------|--------------------|
| greedy_search | 10.31 | 10.50 | --epoch 30 --avg 9 |
| modified_beam_search | 10.25 | 10.38 | --epoch 30 --avg 9 |
| fast_beam_search | 10.26 | 10.48 | --epoch 30 --avg 9 |

The training command is:
```bash
export CUDA_VISIBLE_DEVICES="0,1,2,3"
./zipformer/train.py \
--world-size 4 \
--num-epochs 30 \
--start-epoch 1 \
--use-fp16 1 \
--exp-dir zipformer/exp \
--causal 0 \
--subset XL \
--max-duration 700 \
--use-transducer 1 \
--use-ctc 0 \
--lr-epochs 1 \
--master-port 12345
```

The decoding command is:
```bash
export CUDA_VISIBLE_DEVICES=0

# greedy search
./zipformer/decode.py \
--epoch 30 \
--avg 9 \
--exp-dir ./zipformer/exp \
--max-duration 1000 \
--decoding-method greedy_search

# modified beam search
./zipformer/decode.py \
--epoch 30 \
--avg 9 \
--exp-dir ./zipformer/exp \
--max-duration 1000 \
--decoding-method modified_beam_search \
--beam-size 4

# fast beam search (one best)
./zipformer/decode.py \
--epoch 30 \
--avg 9 \
--exp-dir ./zipformer/exp \
--max-duration 1000 \
--decoding-method fast_beam_search \
--beam 20.0 \
--max-contexts 8 \
--max-states 64
```

### GigaSpeech BPE training results (Pruned Transducer 2)

#### 2022-05-12
Expand Down
Loading
Loading