Add doc about FST-based CTC forced alignment. (#1482)
1 parent 4d5c1f2, commit ec0389a
Showing 20 changed files with 787 additions and 8 deletions.
Binary file added (+106 KB): docs/source/_static/kaldi-align/Lab41-SRI-VOiCES-src-sp0307-ch127535-sg0042.wav
Binary file not shown.
@@ -0,0 +1,41 @@
Two approaches
==============

This section describes two approaches to FST-based forced alignment:

- `Kaldi`_-based
- `k2`_-based

Note that the `Kaldi`_-based approach does not depend on `Kaldi`_ at all.
That is, you don't need to install `Kaldi`_ in order to use it. Instead,
we use `kaldi-decoder`_, which has ported the C++ decoding code from `Kaldi`_
without depending on it.

Differences between the two approaches
--------------------------------------

The following table compares the two approaches.

.. list-table::

   * - Features
     - `Kaldi`_-based
     - `k2`_-based
   * - Support CUDA
     - No
     - Yes
   * - Support CPU
     - Yes
     - Yes
   * - Support batch processing
     - No
     - Yes on CUDA; No on CPU
   * - Support streaming models
     - Yes
     - No
   * - Support C++ APIs
     - Yes
     - Yes
   * - Support Python APIs
     - Yes
     - Yes
@@ -0,0 +1,18 @@
FST-based forced alignment
==========================

This section describes how to perform **FST-based** ``forced alignment`` with models
trained with `CTC`_ loss.

We use the `CTC FORCED ALIGNMENT API TUTORIAL <https://pytorch.org/audio/main/tutorials/ctc_forced_alignment_api_tutorial.html>`_
from `torchaudio`_ as a reference in this section.

Unlike `torchaudio`_, we use an ``FST``-based approach.
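
To make the task concrete, here is a minimal pure-Python sketch of CTC forced
alignment as a Viterbi search over the transcript interleaved with blanks.
It is for illustration only: it is not the `kaldi-decoder`_ or `k2`_
implementation, it uses neither library, and the function name
``ctc_forced_align`` and its arguments are hypothetical. Conceptually, an
FST-based aligner searches an equivalent linear graph built from the transcript.

```python
import math


def ctc_forced_align(log_probs, tokens, blank=0):
    """Return the best frame-level label sequence (one label per frame).

    log_probs: list of T frames, each a list of V log posteriors.
    tokens:    list of target token ids, without blanks.
    (Hypothetical helper, not part of any library.)
    """
    T = len(log_probs)
    # Interleave blanks: [blank, t1, blank, t2, ..., blank]
    ext = [blank]
    for tok in tokens:
        ext += [tok, blank]
    S = len(ext)

    NEG = -math.inf
    score = [[NEG] * S for _ in range(T)]
    back = [[0] * S for _ in range(T)]

    # A path may start in the leading blank or in the first token.
    score[0][0] = log_probs[0][ext[0]]
    if S > 1:
        score[0][1] = log_probs[0][ext[1]]

    for t in range(1, T):
        for s in range(S):
            # Allowed predecessors: stay, advance by one state, or skip
            # the blank between two different non-blank tokens.
            best, arg = score[t - 1][s], s
            if s >= 1 and score[t - 1][s - 1] > best:
                best, arg = score[t - 1][s - 1], s - 1
            if (s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]
                    and score[t - 1][s - 2] > best):
                best, arg = score[t - 1][s - 2], s - 2
            if best > NEG:
                score[t][s] = best + log_probs[t][ext[s]]
                back[t][s] = arg

    # A path may end in the trailing blank or in the last token.
    s = S - 1
    if S > 1 and score[T - 1][S - 2] > score[T - 1][S - 1]:
        s = S - 2
    if score[T - 1][s] == NEG:
        raise ValueError("too few frames to align the given tokens")

    # Follow back-pointers to recover one label per frame.
    path = [0] * T
    for t in range(T - 1, -1, -1):
        path[t] = ext[s]
        s = back[t][s]
    return path


# Toy input: 5 frames, vocabulary {0: blank, 1, 2}, made-up posteriors.
probs = [
    [0.8, 0.1, 0.1],
    [0.1, 0.8, 0.1],
    [0.1, 0.8, 0.1],
    [0.8, 0.1, 0.1],
    [0.1, 0.1, 0.8],
]
toy_log_probs = [[math.log(p) for p in row] for row in probs]
alignment = ctc_forced_align(toy_log_probs, tokens=[1, 2])  # -> [0, 1, 1, 0, 2]
```

Each entry of ``alignment`` is the label assigned to one frame, so token
boundaries in time follow from the frame indices where the label changes.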

.. toctree::
   :maxdepth: 2
   :caption: Contents:

   diff
   kaldi-based
   k2-based
@@ -0,0 +1,4 @@
k2-based forced alignment
=========================

TODO(fangjun)