# srllctts

A simple utility for synthesizing English speech from the command line. It uses NVIDIA's Tacotron2 and WaveGlow models to do the work, both of which were trained on the LJ Speech dataset. Just a quick little thing that we thought was neat.

Some of the code is taken directly from NVIDIA's TorchHub example (see links).
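The pipeline described above can be sketched roughly as follows, after NVIDIA's TorchHub example: Tacotron2 turns text into a mel spectrogram, and WaveGlow turns the spectrogram into a waveform. The entry-point names below follow that example and may have changed upstream; treat this as a sketch, not this repo's exact code.

```python
def load_and_synthesize(text, out_path="audio.wav"):
    """Rough sketch of the Tacotron2 -> WaveGlow pipeline from NVIDIA's
    TorchHub example; hub entry-point names are assumptions that may have
    changed upstream."""
    import torch
    from scipy.io.wavfile import write

    hub_repo = "NVIDIA/DeepLearningExamples:torchhub"
    tacotron2 = torch.hub.load(hub_repo, "nvidia_tacotron2").eval()
    waveglow = torch.hub.load(hub_repo, "nvidia_waveglow").eval()
    utils = torch.hub.load(hub_repo, "nvidia_tts_utils")

    # Text -> padded integer sequences the model expects
    sequences, lengths = utils.prepare_input_sequence([text])
    with torch.no_grad():
        mel, _, _ = tacotron2.infer(sequences, lengths)  # text -> mel spectrogram
        audio = waveglow.infer(mel)                      # mel -> raw waveform
    # The LJ Speech models produce 22,050 Hz audio
    write(out_path, 22050, audio[0].cpu().numpy())
    return out_path
```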

Please note that this is not a maintained project. It also no longer represents the state of the art; while I haven't taken the time to investigate the following, it may be of interest:

MelGAN Vocoder code, and paper.

## Links

### Samples of the output

- Decent: knuth.wav
- Really bad: shakespeare.wav

## Dependencies

You can `pip install -r DEPENDENCIES` to get these:

- torch
- matplotlib
- numpy
- inflect
- librosa
- scipy
- unidecode
- plac

## Execution time

With a GTX 1080 Ti video card and an Intel Core i7-7700K (4.2 GHz), synthesis takes roughly one second for every word or two.

## Licenses

The LJ Speech dataset is in the public domain, and NVIDIA's models are covered by a BSD 3-Clause license. Imagine the court battles Hollywood is going to go through when we really get these things right.