Add doc about FST-based CTC forced alignment. (#1482)
csukuangfj authored Jun 12, 2024
1 parent 4d5c1f2 commit ec0389a
Showing 20 changed files with 787 additions and 8 deletions.
Binary file added docs/source/_static/kaldi-align/at.wav
Binary file added docs/source/_static/kaldi-align/beside.wav
Binary file added docs/source/_static/kaldi-align/curiosity.wav
Binary file added docs/source/_static/kaldi-align/had.wav
Binary file added docs/source/_static/kaldi-align/i.wav
Binary file added docs/source/_static/kaldi-align/me.wav
Binary file added docs/source/_static/kaldi-align/moment.wav
Binary file added docs/source/_static/kaldi-align/that.wav
Binary file added docs/source/_static/kaldi-align/this.wav
2 changes: 2 additions & 0 deletions docs/source/conf.py
@@ -98,4 +98,6 @@
.. _Next-gen Kaldi: https://github.com/k2-fsa
.. _Kaldi: https://github.com/kaldi-asr/kaldi
.. _lilcom: https://github.com/danpovey/lilcom
.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
"""
2 changes: 2 additions & 0 deletions docs/source/docker/intro.rst
@@ -34,6 +34,8 @@ which will give you something like below:

.. code-block:: bash

    "torch2.3.1-cuda12.1"
    "torch2.3.1-cuda11.8"
    "torch2.2.2-cuda12.1"
    "torch2.2.2-cuda11.8"
    "torch2.2.1-cuda12.1"
41 changes: 41 additions & 0 deletions docs/source/fst-based-forced-alignment/diff.rst
@@ -0,0 +1,41 @@
Two approaches
==============

This section describes two approaches to FST-based forced alignment:

- `Kaldi`_-based
- `k2`_-based

Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all;
you don't need to install `Kaldi`_ in order to use it. Instead,
it uses `kaldi-decoder`_, which ports the C++ decoding code from `Kaldi`_
without depending on it.

Differences between the two approaches
--------------------------------------

The following table compares the two approaches.

.. list-table::

 * - Features
   - `Kaldi`_-based
   - `k2`_-based
 * - Support CUDA
   - No
   - Yes
 * - Support CPU
   - Yes
   - Yes
 * - Support batch processing
   - No
   - Yes on CUDA; No on CPU
 * - Support streaming models
   - Yes
   - No
 * - Support C++ APIs
   - Yes
   - Yes
 * - Support Python APIs
   - Yes
   - Yes
18 changes: 18 additions & 0 deletions docs/source/fst-based-forced-alignment/index.rst
@@ -0,0 +1,18 @@
FST-based forced alignment
==========================

This section describes how to perform **FST-based** ``forced alignment`` with models
trained with `CTC`_ loss.

We use `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
from `torchaudio`_ as a reference in this section.

Unlike `torchaudio`_, we use an ``FST``-based approach.
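
As a rough illustration of what forced alignment computes, the sketch below
runs a Viterbi search over the linear CTC state sequence
``blank, t1, blank, t2, ..., blank`` built from the transcript. It is plain
Python written for this explanation only: the function name
``ctc_forced_align`` and its arguments are hypothetical, and it does not use
or reflect the `kaldi-decoder`_ or `k2`_ APIs. An FST-based implementation
would typically obtain the same kind of best path by searching a graph built
from the transcript.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return one label per frame (token id or blank) for the best CTC path.

    log_probs: list of T frames, each a list of V log-probabilities.
    tokens: non-empty list of target token ids (must not contain blank).
    Hypothetical helper for illustration; not a library API.
    """
    # Linear state sequence: blank, t1, blank, t2, ..., blank
    states = [blank]
    for tok in tokens:
        states += [tok, blank]
    num_states, num_frames = len(states), len(log_probs)

    neg_inf = -math.inf
    score = [[neg_inf] * num_states for _ in range(num_frames)]
    back = [[0] * num_states for _ in range(num_frames)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][states[0]]
    score[0][1] = log_probs[0][states[1]]

    for t in range(1, num_frames):
        for s in range(num_states):
            # Predecessors: stay, advance by one state, or skip the blank
            # between two different tokens.
            cands = [s]
            if s >= 1:
                cands.append(s - 1)
            if s >= 2 and states[s] != blank and states[s] != states[s - 2]:
                cands.append(s - 2)
            best = max(cands, key=lambda p: score[t - 1][p])
            score[t][s] = score[t - 1][best] + log_probs[t][states[s]]
            back[t][s] = best

    # A path must end in the trailing blank or in the last token.
    last = num_frames - 1
    s = num_states - 1
    if score[last][num_states - 2] > score[last][s]:
        s = num_states - 2
    if score[last][s] == neg_inf:
        raise ValueError("not enough frames to align the given tokens")

    # Follow back-pointers to recover the label of every frame.
    path = [blank] * num_frames
    for t in range(last, -1, -1):
        path[t] = states[s]
        if t > 0:
            s = back[t][s]
    return path
```

Consecutive frames that carry the same non-blank label can then be merged
into a single segment to read off start and end frames for each token.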

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   diff
   kaldi-based
   k2-based
4 changes: 4 additions & 0 deletions docs/source/fst-based-forced-alignment/k2-based.rst
@@ -0,0 +1,4 @@
k2-based forced alignment
=========================

TODO(fangjun)