Init commit for swbd #1146

Merged: 73 commits, Oct 7, 2023

Commits

- `e53eae7` Init commit for swbd (Jun 26, 2023)
- `35984ca` removed unsed scripts (Jun 26, 2023)
- `96738b5` fixed formatting issues (JinZr, Jun 26, 2023)
- `439855e` Fixed formatting issues in bash scripts. (JinZr, Jun 26, 2023)
- `70c7576` Update README.md (JinZr, Jun 26, 2023)
- `abbd0d9` Update prepare.sh (JinZr, Jun 26, 2023)
- `354e409` Update decode.py (JinZr, Jun 26, 2023)
- `f2a0a10` Update RESULTS.md (JinZr, Jun 26, 2023)
- `11faddc` Update RESULTS.md (JinZr, Jun 27, 2023)
- `f85b95e` Updated decode.py to obtain WERs for subsets. (JinZr, Jun 28, 2023)
- `1f85c6a` Update README.md (JinZr, Jul 7, 2023)
- `301c354` Update prepare.sh (JinZr, Jul 27, 2023)
- `e0d06f1` added normalization for eval2000 (JinZr, Aug 1, 2023)
- `11fe000` minor updates (JinZr, Aug 1, 2023)
- `57e6808` file permission changed (JinZr, Aug 1, 2023)
- `61037d7` Merge branch 'dev_swbd' of https://github.com/JinZr/icefall into dev_… (JinZr, Aug 1, 2023)
- `099e789` minor updates (JinZr, Aug 1, 2023)
- `291fdee` Merge branch 'dev_swbd' of https://github.com/JinZr/icefall into dev_… (JinZr, Aug 1, 2023)
- `6758165` minor updates (JinZr, Aug 1, 2023)
- `e38afc4` Merge branch 'dev_swbd' of https://github.com/JinZr/icefall into dev_… (JinZr, Aug 1, 2023)
- `5533c62` updated (JinZr, Aug 8, 2023)
- `e0ee8dd` minor updates (JinZr, Aug 11, 2023)
- `7671422` Merge branch 'k2-fsa:master' into dev_swbd (JinZr, Aug 11, 2023)
- `58d9088` Merge branch 'dev_swbd' of https://github.com/JinZr/icefall into dev_… (JinZr, Aug 12, 2023)
- `ab07e58` minor updates (JinZr, Aug 18, 2023)
- `afe2f2b` added gitignore (JinZr, Aug 18, 2023)
- `9d848b1` updated gitignore (JinZr, Aug 18, 2023)
- `e13b01a` updated gitignore (JinZr, Aug 18, 2023)
- `f7fac70` minor updates (JinZr, Aug 19, 2023)
- `15f6dcf` minor updates (JinZr, Aug 19, 2023)
- `9594efd` minor fixes (JinZr, Aug 19, 2023)
- `60e974f` minor updates (JinZr, Aug 19, 2023)
- `ba480b7` Merge branch 'k2-fsa:master' into dev_swbd (JinZr, Aug 19, 2023)
- `7feaa61` minor updates (JinZr, Aug 23, 2023)
- `3672d23` updated (JinZr, Aug 24, 2023)
- `d335152` minor updates (JinZr, Aug 25, 2023)
- `1715567` Update train_bpe_model.py (JinZr, Sep 1, 2023)
- `fae87b3` minor updates (JinZr, Sep 4, 2023)
- `b45d83f` minor update on text norm (JinZr, Sep 4, 2023)
- `675194a` fixed a formatting issue (JinZr, Sep 4, 2023)
- `7c44c3a` default params updated (JinZr, Sep 4, 2023)
- `93e22a9` Update RESULTS.md (JinZr, Sep 4, 2023)
- `c61b3ac` Update prepare.sh (JinZr, Sep 6, 2023)
- `098a704` Update train_bpe_model.py (JinZr, Sep 12, 2023)
- `c78aabf` Merge branch 'k2-fsa:master' into dev_swbd (JinZr, Sep 13, 2023)
- `db43f5c` Merge branch 'k2-fsa:master' into dev_swbd (JinZr, Sep 13, 2023)
- `841a153` zipformer recipe for swbd (JinZr, Sep 15, 2023)
- `4ab83db` minor fixes (JinZr, Sep 15, 2023)
- `6e6a364` Merge branch 'dev_swbd' of https://github.com/JinZr/icefall into dev_… (JinZr, Sep 15, 2023)
- `c0f2abd` minor fix (JinZr, Sep 15, 2023)
- `bd053ca` minor fix (JinZr, Sep 15, 2023)
- `c3dc8f0` minor fix (JinZr, Sep 15, 2023)
- `3ae86f1` Merge branch 'k2-fsa:master' into dev_swbd (JinZr, Sep 21, 2023)
- `1f694c5` Merge branch 'k2-fsa:master' into dev_swbd (JinZr, Sep 22, 2023)
- `2e9bc56` Merge branch 'k2-fsa:master' into dev_swbd (JinZr, Sep 24, 2023)
- `02791de` Merge branch 'k2-fsa:master' into dev_swbd (JinZr, Sep 26, 2023)
- `e13497c` removed `zipformer` recipe for now (JinZr, Sep 26, 2023)
- `5471599` final refinement (JinZr, Sep 26, 2023)
- `903e37e` replaced several scripts with symlinks (JinZr, Sep 26, 2023)
- `01b5216` removed scripts existing in other recipes (JinZr, Sep 26, 2023)
- `29a02f3` replaced with symlinks (JinZr, Sep 26, 2023)
- `56b7072` Update egs/swbd/ASR/RESULTS.md (JinZr, Sep 26, 2023)
- `d981e82` added CI test for the swbd recipe (JinZr, Sep 26, 2023)
- `71a860d` Update run-swbd-conformer-ctc-2023-08-26.sh (JinZr, Sep 26, 2023)
- `ebf0e5a` Delete pretrained.py (JinZr, Sep 26, 2023)
- `191ad9d` Create pretrained.py (JinZr, Sep 26, 2023)
- `28bcc96` Delete pretrained.py (JinZr, Sep 26, 2023)
- `95866d9` Create pretrained.py (JinZr, Sep 26, 2023)
- `078c872` fixed CI test (JinZr, Sep 26, 2023)
- `b80aae7` Update run-swbd-conformer-ctc-2023-08-26.sh (JinZr, Sep 27, 2023)
- `d977d47` removed attention-decoder to fix 143 error (JinZr, Sep 27, 2023)
- `efd9364` Update run-swbd-conformer-ctc-2023-08-26.sh (JinZr, Sep 27, 2023)
- `1710098` finalize CI test (JinZr, Sep 27, 2023)

Files changed

44 changes: 44 additions & 0 deletions .github/scripts/run-swbd-conformer-ctc-2023-08-26.sh
@@ -0,0 +1,44 @@
#!/usr/bin/env bash

set -e

log() {
  # This function is from espnet
  local fname=${BASH_SOURCE[1]##*/}
  echo -e "$(date '+%Y-%m-%d %H:%M:%S') (${fname}:${BASH_LINENO[0]}:${FUNCNAME[1]}) $*"
}

cd egs/swbd/ASR

repo_url=https://huggingface.co/zrjin/icefall-asr-swbd-conformer-ctc-2023-8-26

log "Downloading pre-trained model from $repo_url"
git lfs install
git clone $repo_url
repo=$(basename $repo_url)

log "Display test files"
tree $repo/
ls -lh $repo/test_wavs/*.wav

pushd $repo/exp
ln -s epoch-98.pt epoch-99.pt
popd

ls -lh $repo/exp/*.pt

for method in ctc-decoding 1best; do
  log "$method"

  ./conformer_ctc/pretrained.py \
    --method $method \
    --checkpoint $repo/exp/epoch-99.pt \
    --tokens $repo/data/lang_bpe_500/tokens.txt \
    --words-file $repo/data/lang_bpe_500/words.txt \
    --HLG $repo/data/lang_bpe_500/HLG.pt \
    --G $repo/data/lm/G_4_gram.pt \
    $repo/test_wavs/1089-134686-0001.wav \
    $repo/test_wavs/1221-135766-0001.wav \
    $repo/test_wavs/1221-135766-0002.wav
done
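
The script can also be run by hand from the icefall repository root; a minimal sketch, assuming git-lfs, tree, and kaldifeat are already installed:

```bash
# Manual run of the CI test (mirrors the final step of the workflow below).
export PYTHONPATH=$PWD:$PYTHONPATH  # make the local icefall package importable
.github/scripts/run-swbd-conformer-ctc-2023-08-26.sh
```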
84 changes: 84 additions & 0 deletions .github/workflows/run-swbd-conformer-ctc.yml
@@ -0,0 +1,84 @@
# Copyright 2023 Xiaomi Corp. (author: Zengrui Jin)

# See ../../LICENSE for clarification regarding multiple authors
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

name: run-swbd-conformer_ctc

on:
  push:
    branches:
      - master
  pull_request:
    types: [labeled]

concurrency:
  group: run-swbd-conformer_ctc-${{ github.ref }}
  cancel-in-progress: true

jobs:
  run-swbd-conformer_ctc:
    if: github.event.label.name == 'onnx' || github.event.label.name == 'ready' || github.event_name == 'push' || github.event.label.name == 'swbd'
    runs-on: ${{ matrix.os }}
    strategy:
      matrix:
        os: [ubuntu-latest]
        python-version: [3.8]

      fail-fast: false

    steps:
      - uses: actions/checkout@v2
        with:
          fetch-depth: 0

      - name: Setup Python ${{ matrix.python-version }}
        uses: actions/setup-python@v2
        with:
          python-version: ${{ matrix.python-version }}
          cache: 'pip'
          cache-dependency-path: '**/requirements-ci.txt'

      - name: Install Python dependencies
        run: |
          grep -v '^#' ./requirements-ci.txt | xargs -n 1 -L 1 pip install
          pip uninstall -y protobuf
          pip install --no-binary protobuf protobuf==3.20.*

      - name: Cache kaldifeat
        id: my-cache
        uses: actions/cache@v2
        with:
          path: |
            ~/tmp/kaldifeat
          key: cache-tmp-${{ matrix.python-version }}-2023-05-22

      - name: Install kaldifeat
        if: steps.my-cache.outputs.cache-hit != 'true'
        shell: bash
        run: |
          .github/scripts/install-kaldifeat.sh

      - name: Inference with pre-trained model
        shell: bash
        env:
          GITHUB_EVENT_NAME: ${{ github.event_name }}
          GITHUB_EVENT_LABEL_NAME: ${{ github.event.label.name }}
        run: |
          sudo apt-get -qq install git-lfs tree
          export PYTHONPATH=$PWD:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/kaldifeat/python:$PYTHONPATH
          export PYTHONPATH=~/tmp/kaldifeat/build/lib:$PYTHONPATH

          .github/scripts/run-swbd-conformer-ctc-2023-08-26.sh
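
Note the `labeled` trigger above: besides pushes to master, the job runs whenever one of the listed labels is added to a pull request. For example, with the GitHub CLI (assuming `gh` is installed and authenticated):

```bash
# Re-trigger the workflow on this PR by adding a matching label.
gh pr edit 1146 --add-label swbd
```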
2 changes: 2 additions & 0 deletions egs/swbd/ASR/.gitignore
@@ -0,0 +1,2 @@
switchboard_word_alignments.tar.gz
./swb_ms98_transcriptions/
25 changes: 25 additions & 0 deletions egs/swbd/ASR/README.md
@@ -0,0 +1,25 @@
# Switchboard

The Switchboard-1 Telephone Speech Corpus (LDC97S62) consists of approximately 260 hours of speech and was originally collected by Texas Instruments in 1990-1, under DARPA sponsorship. The first release of the corpus was published by NIST and distributed by the LDC in 1992-3. Since that release, a number of corrections have been made to the data files as presented on the original CD-ROM set and all copies of the first pressing have been distributed.

Switchboard is a collection of about 2,400 two-sided telephone conversations among 543 speakers (302 male, 241 female) from all areas of the United States. A computer-driven robot operator system handled the calls, giving the caller appropriate recorded prompts, selecting and dialing another person (the callee) to take part in a conversation, introducing a topic for discussion and recording the speech from the two subjects into separate channels until the conversation was finished. About 70 topics were provided, of which about 50 were used frequently. Selection of topics and callees was constrained so that: (1) no two speakers would converse together more than once and (2) no one spoke more than once on a given topic.

(The above introduction is from the [LDC Switchboard-1 Release 2 webpage](https://catalog.ldc.upenn.edu/LDC97S62).)


## Performance Record (WER, %)

| Recipe          | eval2000 | rt03  |
|-----------------|----------|-------|
| `conformer_ctc` | 33.37    | 35.06 |

See [RESULTS](/egs/swbd/ASR/RESULTS.md) for details.

## Credit

The training script for `conformer_ctc` comes from the LibriSpeech `conformer_ctc` recipe in icefall.

Many of the data processing scripts come from first-generation Kaldi and the ESPnet project, adapted here to work with Lhotse and icefall.

Some of the text normalization scripts are taken from stale pull requests by [Piotr Żelasko](https://github.com/pzelasko) and [Nagendra Goel](https://github.com/ngoel17).

The `sclite_scoring.py` script comes from the GigaSpeech recipe and is used for post-processing and GLM-like scoring, which is admittedly not an elegant solution.
113 changes: 113 additions & 0 deletions egs/swbd/ASR/RESULTS.md
@@ -0,0 +1,113 @@
## Results
### Switchboard BPE training results (Conformer-CTC)

#### 2023-09-04

The best WER on Switchboard, as of 2023-09-04, is given below.

Results using the attention decoder:

| Recipe          | eval2000-swbd | eval2000-callhome | eval2000-avg |
|-----------------|---------------|-------------------|--------------|
| `conformer_ctc` | 9.48          | 17.73             | 13.67        |

Decoding results and models can be found here:
https://huggingface.co/zrjin/icefall-asr-swbd-conformer-ctc-2023-8-26
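
The released checkpoint can be fetched with git-lfs, the same way the CI script in this PR does:

```bash
# Clone the released model from Hugging Face (requires git-lfs).
git lfs install
git clone https://huggingface.co/zrjin/icefall-asr-swbd-conformer-ctc-2023-8-26
```
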
#### 2023-06-27

The best WER on Switchboard, as of 2023-06-27, is given below.

Results using HLG decoding + n-gram LM rescoring + attention decoder rescoring:

| Recipe          | eval2000 | rt03  |
|-----------------|----------|-------|
| `conformer_ctc` | 30.80    | 32.29 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

##### eval2000

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.9 | 1.1 |

##### rt03

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.9 | 1.9 |

To reproduce the above result, use the following commands for training:

```bash
cd egs/swbd/ASR
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1"
./conformer_ctc/train.py \
  --max-duration 120 \
  --num-workers 8 \
  --enable-musan False \
  --world-size 2 \
  --num-epochs 100
```

and the following command for decoding:

```bash
./conformer_ctc/decode.py \
  --epoch 99 \
  --avg 10 \
  --max-duration 50
```

#### 2023-06-26

The best WER on Switchboard, as of 2023-06-26, is given below.

Results using HLG decoding + n-gram LM rescoring + attention decoder rescoring:

| Recipe          | eval2000 | rt03  |
|-----------------|----------|-------|
| `conformer_ctc` | 33.37    | 35.06 |

Scale values used in n-gram LM rescoring and attention rescoring for the best WERs are:

##### eval2000

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.3 | 2.5 |

##### rt03

| ngram_lm_scale | attention_scale |
|----------------|-----------------|
| 0.7 | 1.3 |

To reproduce the above result, use the following commands for training:

```bash
cd egs/swbd/ASR
./prepare.sh
export CUDA_VISIBLE_DEVICES="0,1"
./conformer_ctc/train.py \
  --max-duration 120 \
  --num-workers 8 \
  --enable-musan False \
  --world-size 2
```

and the following command for decoding:

```bash
./conformer_ctc/decode.py \
  --epoch 55 \
  --avg 1 \
  --max-duration 50
```

For reference, the n-best oracle WERs are:

| Recipe          | eval2000 | rt03  |
|-----------------|----------|-------|
| `conformer_ctc` | 25.64    | 26.84 |