Skip to content

Commit

Permalink
deploy: 027302c
Browse files Browse the repository at this point in the history
  • Loading branch information
marcoyang1998 committed Feb 20, 2024
1 parent c128646 commit e5fed50
Show file tree
Hide file tree
Showing 5 changed files with 20 additions and 20 deletions.
6 changes: 3 additions & 3 deletions _sources/decoding-with-langugage-models/LODR.rst.txt
Original file line number Diff line number Diff line change
Expand Up @@ -30,7 +30,7 @@ of langugae model integration.
First, let's have a look at some background information. As the predecessor of LODR, Density Ratio (DR) is first proposed `here <https://arxiv.org/abs/2002.11268>`_
to address the language information mismatch between the training
corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the test domain
are acoustically similar, DR derives the following formular for decoding with Bayes' theorem:
are acoustically similar, DR derives the following formula for decoding with Bayes' theorem:

.. math::
Expand All @@ -41,7 +41,7 @@ are acoustically similar, DR derives the following formular for decoding with Ba
where :math:`\lambda_1` and :math:`\lambda_2` are the weights of LM scores for target domain and source domain respectively.
Here, the source domain LM is trained on the training corpus. The only difference in the above formular compared to
Here, the source domain LM is trained on the training corpus. The only difference in the above formula compared to
shallow fusion is the subtraction of the source domain LM.

Some works treat the predictor and the joiner of the neural transducer as its internal LM. However, the LM is
Expand All @@ -58,7 +58,7 @@ during decoding for transducer model:
In LODR, an additional bi-gram LM estimated on the source domain (e.g training corpus) is required. Compared to DR,
the only difference lies in the choice of source domain LM. According to the original `paper <https://arxiv.org/abs/2203.16776>`_,
LODR achieves similar performance compared DR in both intra-domain and cross-domain settings.
LODR achieves similar performance compared to DR in both intra-domain and cross-domain settings.
As a bi-gram is much faster to evaluate, LODR is usually much faster.

Now, we will show you how to use LODR in ``icefall``.
Expand Down
24 changes: 12 additions & 12 deletions _sources/decoding-with-langugage-models/shallow-fusion.rst.txt
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,9 @@ to improve the word-error-rate of a transducer model.

.. note::

This tutorial is based on the recipe
This tutorial is based on the recipe
`pruned_transducer_stateless7_streaming <https://github.com/k2-fsa/icefall/tree/master/egs/librispeech/ASR/pruned_transducer_stateless7_streaming>`_,
which is a streaming transducer model trained on `LibriSpeech`_.
which is a streaming transducer model trained on `LibriSpeech`_.
However, you can easily apply shallow fusion to other recipes.
If you encounter any problems, please open an issue here `icefall <https://github.com/k2-fsa/icefall/issues>`_.

Expand Down Expand Up @@ -69,11 +69,11 @@ Training a language model usually takes a long time, we can download a pre-train
.. code-block:: bash
$ # download the external LM
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
$ GIT_LFS_SKIP_SMUDGE=1 git clone https://huggingface.co/ezerhouni/icefall-librispeech-rnn-lm
$ # create a symbolic link so that the checkpoint can be loaded
$ pushd icefall-librispeech-rnn-lm/exp
$ git lfs pull --include "pretrained.pt"
$ ln -s pretrained.pt epoch-99.pt
$ ln -s pretrained.pt epoch-99.pt
$ popd
.. note::
Expand All @@ -85,7 +85,7 @@ Training a language model usually takes a long time, we can download a pre-train
To use shallow fusion for decoding, we can execute the following command:

.. code-block:: bash
$ exp_dir=./icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29/exp
$ lm_dir=./icefall-librispeech-rnn-lm/exp
$ lm_scale=0.29
Expand Down Expand Up @@ -133,16 +133,16 @@ The decoding result obtained with the above command are shown below.
$ For test-other, WER of different settings are:
$ beam_size_4 7.08 best for test-other
The improvement of shallow fusion is very obvious! The relative WER reduction on test-other is around 10.5%.
The improvement of shallow fusion is very obvious! The relative WER reduction on test-other is around 10.5%.
A few parameters can be tuned to further boost the performance of shallow fusion:

- ``--lm-scale``
- ``--lm-scale``

Controls the scale of the LM. If too small, the external language model may not be fully utilized; if too large,
the LM score might be dominant during decoding, leading to bad WER. A typical value of this is around 0.3.

Controls the scale of the LM. If too small, the external language model may not be fully utilized; if too large,
the LM score may dominant during decoding, leading to bad WER. A typical value of this is around 0.3.
- ``--beam-size``

- ``--beam-size``

The number of active paths in the search beam. It controls the trade-off between decoding efficiency and accuracy.

Here, we also show how `--beam-size` effect the WER and decoding time:
Expand Down Expand Up @@ -176,4 +176,4 @@ As we see, a larger beam size during shallow fusion improves the WER, but is als




6 changes: 3 additions & 3 deletions decoding-with-langugage-models/LODR.html
Original file line number Diff line number Diff line change
Expand Up @@ -120,14 +120,14 @@
<p>First, let’s have a look at some background information. As the predecessor of LODR, Density Ratio (DR) is first proposed <a class="reference external" href="https://arxiv.org/abs/2002.11268">here</a>
to address the language information mismatch between the training
corpus (source domain) and the testing corpus (target domain). Assuming that the source domain and the test domain
are acoustically similar, DR derives the following formular for decoding with Bayes’ theorem:</p>
are acoustically similar, DR derives the following formula for decoding with Bayes’ theorem:</p>
<div class="math notranslate nohighlight">
\[\text{score}\left(y_u|\mathit{x},y\right) =
\log p\left(y_u|\mathit{x},y_{1:u-1}\right) +
\lambda_1 \log p_{\text{Target LM}}\left(y_u|\mathit{x},y_{1:u-1}\right) -
\lambda_2 \log p_{\text{Source LM}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]</div>
<p>where <span class="math notranslate nohighlight">\(\lambda_1\)</span> and <span class="math notranslate nohighlight">\(\lambda_2\)</span> are the weights of LM scores for target domain and source domain respectively.
Here, the source domain LM is trained on the training corpus. The only difference in the above formular compared to
Here, the source domain LM is trained on the training corpus. The only difference in the above formula compared to
shallow fusion is the subtraction of the source domain LM.</p>
<p>Some works treat the predictor and the joiner of the neural transducer as its internal LM. However, the LM is
considered to be weak and can only capture low-level language information. Therefore, <a class="reference external" href="https://arxiv.org/abs/2203.16776">LODR</a> proposed to use
Expand All @@ -140,7 +140,7 @@
\lambda_2 \log p_{\text{bi-gram}}\left(y_u|\mathit{x},y_{1:u-1}\right)\]</div>
<p>In LODR, an additional bi-gram LM estimated on the source domain (e.g training corpus) is required. Compared to DR,
the only difference lies in the choice of source domain LM. According to the original <a class="reference external" href="https://arxiv.org/abs/2203.16776">paper</a>,
LODR achieves similar performance compared DR in both intra-domain and cross-domain settings.
LODR achieves similar performance compared to DR in both intra-domain and cross-domain settings.
As a bi-gram is much faster to evaluate, LODR is usually much faster.</p>
<p>Now, we will show you how to use LODR in <code class="docutils literal notranslate"><span class="pre">icefall</span></code>.
For illustration purpose, we will use a pre-trained ASR model from this <a class="reference external" href="https://huggingface.co/Zengwei/icefall-asr-librispeech-pruned-transducer-stateless7-streaming-2022-12-29">link</a>.
Expand Down
2 changes: 1 addition & 1 deletion decoding-with-langugage-models/shallow-fusion.html
Original file line number Diff line number Diff line change
Expand Up @@ -223,7 +223,7 @@
<li><p><code class="docutils literal notranslate"><span class="pre">--lm-scale</span></code></p>
<blockquote>
<div><p>Controls the scale of the LM. If too small, the external language model may not be fully utilized; if too large,
the LM score may dominant during decoding, leading to bad WER. A typical value of this is around 0.3.</p>
the LM score might be dominant during decoding, leading to bad WER. A typical value of this is around 0.3.</p>
</div></blockquote>
</li>
<li><p><code class="docutils literal notranslate"><span class="pre">--beam-size</span></code></p>
Expand Down
2 changes: 1 addition & 1 deletion searchindex.js

Large diffs are not rendered by default.

0 comments on commit e5fed50

Please sign in to comment.