Add doc about FST-based CTC forced alignment. (#1482)

k2-fsa · Jun 12, 2024 · ec0389a
1 parent 4d5c1f2
Showing 20 changed files with 787 additions and 8 deletions.
diff --git a/docs/source/_static/kaldi-align/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav b/docs/source/_static/kaldi-align/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav
diff --git a/docs/source/_static/kaldi-align/at.wav b/docs/source/_static/kaldi-align/at.wav
diff --git a/docs/source/_static/kaldi-align/beside.wav b/docs/source/_static/kaldi-align/beside.wav
diff --git a/docs/source/_static/kaldi-align/curiosity.wav b/docs/source/_static/kaldi-align/curiosity.wav
diff --git a/docs/source/_static/kaldi-align/had.wav b/docs/source/_static/kaldi-align/had.wav
diff --git a/docs/source/_static/kaldi-align/i.wav b/docs/source/_static/kaldi-align/i.wav
diff --git a/docs/source/_static/kaldi-align/me.wav b/docs/source/_static/kaldi-align/me.wav
diff --git a/docs/source/_static/kaldi-align/moment.wav b/docs/source/_static/kaldi-align/moment.wav
diff --git a/docs/source/_static/kaldi-align/that.wav b/docs/source/_static/kaldi-align/that.wav
diff --git a/docs/source/_static/kaldi-align/this.wav b/docs/source/_static/kaldi-align/this.wav
diff --git a/docs/source/conf.py b/docs/source/conf.py
@@ -98,4 +98,6 @@
 .. _Next-gen Kaldi: https://github.com/k2-fsa
 .. _Kaldi: https://github.com/kaldi-asr/kaldi
 .. _lilcom: https://github.com/danpovey/lilcom
+.. _CTC: https://www.cs.toronto.edu/~graves/icml_2006.pdf
+.. _kaldi-decoder: https://github.com/k2-fsa/kaldi-decoder
 """
diff --git a/docs/source/docker/intro.rst b/docs/source/docker/intro.rst
@@ -34,6 +34,8 @@ which will give you something like below:
 
 .. code-block:: bash
 
+  "torch2.3.1-cuda12.1"
+  "torch2.3.1-cuda11.8"
   "torch2.2.2-cuda12.1"
   "torch2.2.2-cuda11.8"
   "torch2.2.1-cuda12.1"

diff --git a/docs/source/fst-based-forced-alignment/diff.rst b/docs/source/fst-based-forced-alignment/diff.rst
@@ -0,0 +1,41 @@
+Two approaches
+==============
+
+This section describes two approaches to FST-based forced alignment:
+
+  - `Kaldi`_-based
+  - `k2`_-based
+
+Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all.
+That is, you don't need to install `Kaldi`_ in order to use it. Instead,
+we use `kaldi-decoder`_, which has ported the C++ decoding code from `Kaldi`_
+without depending on it.
+
+Differences between the two approaches
+--------------------------------------
+
+The following table compares the two approaches.
+
+.. list-table::
+
+ * - Features
+   - `Kaldi`_-based
+   - `k2`_-based
+ * - Support CUDA
+   - No
+   - Yes
+ * - Support CPU
+   - Yes
+   - Yes
+ * - Support batch processing
+   - No
+   - Yes on CUDA; No on CPU
+ * - Support streaming models
+   - Yes
+   - No
+ * - Support C++ APIs
+   - Yes
+   - Yes
+ * - Support Python APIs
+   - Yes
+   - Yes
diff --git a/docs/source/fst-based-forced-alignment/index.rst b/docs/source/fst-based-forced-alignment/index.rst
@@ -0,0 +1,18 @@
+FST-based forced alignment
+==========================
+
+This section describes how to perform **FST-based** ``forced alignment`` with models
+trained with the `CTC`_ loss.
+
+We use `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
+from `torchaudio`_ as a reference in this section.
+
+Unlike `torchaudio`_, we use an ``FST``-based approach.
+
+.. toctree::
+   :maxdepth: 2
+   :caption: Contents:
+
+   diff
+   kaldi-based
+   k2-based
diff --git a/docs/source/fst-based-forced-alignment/k2-based.rst b/docs/source/fst-based-forced-alignment/k2-based.rst
@@ -0,0 +1,4 @@
+k2-based forced alignment
+=========================
+
+TODO(fangjun)