Commit
Provides README.md for TTS recipes (#1491)
* Update README.md
Showing 2 changed files with 75 additions and 0 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,38 @@ | ||
# Introduction | ||
|
||
This is a public domain speech dataset consisting of 13,100 short audio clips of a single speaker reading passages from 7 non-fiction books. | ||
A transcription is provided for each clip. | ||
Clips vary in length from 1 to 10 seconds and have a total length of approximately 24 hours. | ||
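
As a rough worked example of the figures above, the sketch below derives the average clip length from the stated clip count and total duration. It also estimates how many average-length clips fit into a 500-second budget, the value passed as `--max-duration` in the training command further down. Treating `--max-duration` as seconds of audio per batch is an assumption here, and the variable names are illustrative only.

```shell
clips=13100          # number of clips, from the dataset description
total_hours=24       # approximate total duration in hours
budget_seconds=500   # assumed per-batch audio budget (seconds)

# Average clip length in seconds, rounded to one decimal place.
avg_seconds="$(awk -v n="$clips" -v h="$total_hours" \
  'BEGIN { printf "%.1f", h * 3600 / n }')"

# Whole number of average-length clips that fit into the budget.
clips_per_batch="$(awk -v n="$clips" -v h="$total_hours" -v b="$budget_seconds" \
  'BEGIN { printf "%d", int(b / (h * 3600 / n)) }')"

echo "average clip length: ${avg_seconds} s"
echo "approx. clips per ${budget_seconds} s batch: ${clips_per_batch}"
```

These are back-of-the-envelope numbers; actual batches depend on how the sampler groups clips of different lengths.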
|
||
The texts were published between 1884 and 1964, and are in the public domain. | ||
The audio was recorded in 2016-17 by the [LibriVox](https://librivox.org/) project and is also in the public domain. | ||
|
||
The above information is from the [LJSpeech website](https://keithito.com/LJ-Speech-Dataset/). | ||
|
||
# VITS | ||
|
||
This recipe provides a VITS model trained on the LJSpeech dataset. | ||
|
||
A pretrained model can be found [here](https://huggingface.co/Zengwei/icefall-tts-ljspeech-vits-2024-02-28). | ||
|
||
For a tutorial and more details, please refer to the [VITS documentation](https://k2-fsa.github.io/icefall/recipes/TTS/ljspeech/vits.html). | ||
|
||
The training command is given below: | ||
``` | ||
export CUDA_VISIBLE_DEVICES=0,1,2,3 | ||
./vits/train.py \ | ||
--world-size 4 \ | ||
--num-epochs 1000 \ | ||
--start-epoch 1 \ | ||
--use-fp16 1 \ | ||
--exp-dir vits/exp \ | ||
--max-duration 500 | ||
``` | ||
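
The command above lists four GPUs in `CUDA_VISIBLE_DEVICES` and passes `--world-size 4`. As a minimal sketch, assuming the two values are meant to agree, the count can be derived in the shell rather than typed twice. The helper name `count_visible_gpus` is hypothetical and not part of icefall.

```shell
# Count the comma-separated device IDs in the first argument.
count_visible_gpus() (
  # The function body runs in a subshell, so the IFS and
  # globbing changes below do not leak into the caller.
  IFS=','
  set -f
  set -- $1
  echo "$#"
)

export CUDA_VISIBLE_DEVICES="0,1,2,3"
world_size="$(count_visible_gpus "$CUDA_VISIBLE_DEVICES")"
echo "world size: ${world_size}"
```

The resulting value could then be passed as `--world-size "$world_size"` in the training command.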
|
||
To run inference, use: | ||
``` | ||
./vits/infer.py \ | ||
--exp-dir vits/exp \ | ||
--epoch 1000 \ | ||
--tokens data/tokens.txt | ||
``` |
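
The inference command is given `--exp-dir vits/exp` and `--epoch 1000`. The sketch below assumes the checkpoint for that epoch is stored as `epoch-1000.pt` inside the experiment directory and checks that the file exists before inference is started. The file naming is an assumption, and `checkpoint_path` is a hypothetical helper, not an icefall function.

```shell
# Build the assumed checkpoint path from an experiment directory
# and an epoch number (naming scheme is an assumption).
checkpoint_path() {
  printf '%s/epoch-%s.pt\n' "$1" "$2"
}

ckpt="$(checkpoint_path vits/exp 1000)"
if [ -f "$ckpt" ]; then
  echo "found ${ckpt}"
else
  echo "missing ${ckpt}: train first or download the pretrained model" >&2
fi
```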
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,37 @@ | ||
# Introduction | ||
|
||
This CSTR VCTK Corpus includes speech data uttered by 110 English speakers with various accents. Each speaker reads out about 400 sentences, which were selected from a newspaper, the rainbow passage and an elicitation paragraph used for the speech accent archive. | ||
The newspaper texts were taken from Herald Glasgow, with permission from Herald & Times Group. Each speaker has a different set of the newspaper texts, selected based on a greedy algorithm that increases the contextual and phonetic coverage. | ||
The details of the text selection algorithms are described in the following paper: [C. Veaux, J. Yamagishi and S. King, "The voice bank corpus: Design, collection and data analysis of a large regional accent speech database,"](https://doi.org/10.1109/ICSDA.2013.6709856). | ||
|
||
The above information is from the [CSTR VCTK website](https://datashare.ed.ac.uk/handle/10283/3443). | ||
|
||
# VITS | ||
|
||
This recipe provides a VITS model trained on the VCTK dataset. | ||
|
||
A pretrained model can be found [here](https://huggingface.co/zrjin/icefall-tts-vctk-vits-2023-12-05). Note that this model was trained on the Edinburgh DataShare VCTK dataset. | ||
|
||
For a tutorial and more details, please refer to the [VITS documentation](https://k2-fsa.github.io/icefall/recipes/TTS/vctk/vits.html). | ||
|
||
The training command is given below: | ||
``` | ||
export CUDA_VISIBLE_DEVICES="0,1,2,3" | ||
./vits/train.py \ | ||
--world-size 4 \ | ||
--num-epochs 1000 \ | ||
--start-epoch 1 \ | ||
--use-fp16 1 \ | ||
--exp-dir vits/exp \ | ||
--tokens data/tokens.txt \ | ||
--max-duration 350 | ||
``` | ||
|
||
To run inference, use: | ||
``` | ||
./vits/infer.py \ | ||
--epoch 1000 \ | ||
--exp-dir vits/exp \ | ||
--tokens data/tokens.txt \ | ||
--max-duration 500 | ||
``` |