This repository contains a copy of the US English HTS demo. The HTS demos are designed to demonstrate the capabilities of the HMM-based Speech Synthesis System (HTS) for statistical parametric speech synthesis.
Please note that this repository is an unofficial copy of the US English HTS demo and is not endorsed in any way by the HTS working group, which maintains the HTS demos. Accordingly, any questions or bug reports should be directed to the HTS working group using the contact details given below.
Starting from a corpus of text data and speech audio data from a human speaker, this software trains a statistical parametric speech synthesis system designed to imitate that speaker. This trained system can then be used to synthesize speech audio for new pieces of text data.
The text data may be provided in one of two forms:
- as simple text files
- as utterance files generated by Festival using the same phoneme set used by the CMU Pronouncing Dictionary
The speech data should be provided as raw 48 kHz 16-bit audio files, which are essentially WAV files with the header removed.
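Because a raw file carries no header, nothing inside it records its length or sample rate. The sketch below is one way to sanity-check such a file from its size alone. It assumes mono audio, which this README does not state, and `raw_info` is a hypothetical helper name rather than part of the HTS demo.

```shell
# Report the sample count and duration of a headerless 16-bit, 48 kHz file.
# Assumption: the audio is mono (one 2-byte sample per frame).
raw_info() {
    [ -f "$1" ] || { echo "$1: no such file" >&2; return 1; }
    bytes=$(($(wc -c < "$1")))
    # 16-bit samples occupy 2 bytes each, so the size must be even.
    if [ $((bytes % 2)) -ne 0 ]; then
        echo "$1: odd byte count, not whole 16-bit samples" >&2
        return 1
    fi
    samples=$((bytes / 2))
    ms=$((samples * 1000 / 48000))
    echo "$samples samples, $ms ms"
}
```

Under these assumptions a 96000-byte file is reported as `48000 samples, 1000 ms`. A duration that looks wrong usually suggests a different sample rate, a different channel count, or a header that was not removed.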
This repository differs from the official version of the US English HTS demo in a few minor ways:
- it does not include a corpus of text data and speech data for training the synthesis system. The corpus of data used by the official version is based on the CMU ARCTIC corpus for speaker SLT and may be obtained by downloading the official version of the US English HTS demo from the HTS demo website.
- it adds this README file
- it adds the `License` file
- it removes the `configure` script, since a suitable script can be automatically generated from the provided `configure.ac` file using `autoconf`. This follows standard version control practices.
Please see the file `License` for details of the license and warranty for hts-demo-en-US-cmudict.
The source code is hosted in the hts-demo-en-US-cmudict GitHub repository. To obtain the latest source code using git:
git clone git://github.com/MattShannon/hts-demo-en-US-cmudict.git
hts-demo-en-US-cmudict depends on the following software packages:
- Festival (for the `dumpfeats` script)
- Speech Processing Toolkit (SPTK)
- HMM-based Speech Synthesis System (HTS)
- hts_engine API (if synthesis using HTS engine rather than HTS's HMGenS tool is desired)
If STRAIGHT vocoding is used (recommended for better quality, though it is not available under a permissive license), then the following software packages are also required:
- STRAIGHT vocoder
- MATLAB (to run the STRAIGHT vocoder)
The required versions of these software packages are listed in `INSTALL`.
There are two high-level choices that need to be made when setting up this directory:
- whether to provide the text data for the corpus you wish to train on as simple text files (in which case the `configure` variable USEUTT should be set to 0) or as Festival utterance files (in which case USEUTT should be set to 1). Using Festival utterance files is the default.
- whether to use the STRAIGHT vocoder (in which case the `configure` variable USESTRAIGHT should be set to 1) or the non-STRAIGHT vocoder (in which case USESTRAIGHT should be set to 0). Using the non-STRAIGHT vocoder is the default, but using the STRAIGHT vocoder is recommended if possible for better quality.
To set up this directory:
- add appropriate text files in `data/txt` (if USEUTT is 0), Festival utterance files in `data/utts` (if USEUTT is 1) and raw audio files in `data/raw` for the corpus you wish to use during training (see above for details of the formats used and details of where to obtain the processed CMU ARCTIC corpus typically used with this HTS demo)
- generate the `configure` script from `configure.ac` using `autoconf`
- follow the instructions for the official version included in `INSTALL`, setting the USEUTT and USESTRAIGHT `configure` variables appropriately. If you are using a corpus other than the CMU ARCTIC corpus for speaker SLT, you may wish to also specify the LOWERF0, UPPERF0, DATASET and SPEAKER `configure` variables.
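The first set-up step can be checked before running `autoconf`. The sketch below is a minimal pre-flight check, not part of the HTS demo: `has_files` and `check_corpus` are hypothetical helper names, and the check only confirms that the expected directories are non-empty, not that the files in them are in the right format.

```shell
# True if directory $1 exists and contains at least one entry.
has_files() {
    [ -d "$1" ] && [ -n "$(ls -A "$1")" ]
}

# usage: check_corpus <top-level directory> <USEUTT value>
# USEUTT mirrors the configure variable described above:
# 1 = Festival utterance files (the default), 0 = simple text files.
check_corpus() {
    top=$1
    useutt=$2
    if [ "$useutt" = 1 ]; then
        text_dir=$top/data/utts
    else
        text_dir=$top/data/txt
    fi
    # Raw audio is needed whichever text format is chosen.
    for dir in "$text_dir" "$top/data/raw"; do
        if ! has_files "$dir"; then
            echo "missing or empty: $dir" >&2
            return 1
        fi
    done
    echo "corpus OK: $text_dir and $top/data/raw"
}
```

Once `check_corpus . 1` succeeds, the remaining steps are to run `autoconf` and then `./configure` as described in `INSTALL`. Assuming the usual autoconf convention of passing variables as `NAME=value` arguments, the choices above would be given as, for example, `USEUTT=1 USESTRAIGHT=0`, alongside whatever other options `INSTALL` calls for.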
Please use the HTS users mailing list to submit bugs related to the US English HTS demo, preferably after verifying that the bug still occurs with the most recent official version available from the HTS demo website. Bugs specifically about this copy of the HTS demo can be submitted to the issue tracker.
The author of the US English HTS demo is the HTS working group. More information is available on the HTS website and from the HTS users mailing list. The hts-demo-en-US-cmudict GitHub repository is hosted by Matt Shannon.