German Tacotron 2 and Multi-band MelGAN in TensorFlow with TF Lite inference support
Google supported this work by providing Google Cloud credit. Thank you, Google, for supporting open source! 🎉
I am releasing pretrained German neural text-to-speech (TTS) models: Tacotron 2 and Multi-band MelGAN. They support inference with the `saved_model` and `TF Lite` formats, and all the models can be found on TensorFlow Hub.
💬 Say hello in Discussions if you find it useful for anything.
- See `inference.py` to infer with `saved_model`.
- See `inference_tflite.py` to infer with `TF Lite`.
- See `e2e-notebook.ipynb` to check how I exported to these model formats.
- See releases to download pretrained models.
I trained these models on the Thorsten dataset by Thorsten Müller. It is licensed under the terms of Creative Commons Zero V1 Universal (CC0), which is used to opt out of copyright entirely and ensure that the work has the widest reach. Thanks @thorstenMueller for such a great contribution to the community.
Some good guys are doing a great job at tensorspeech/TensorFlowTTS, which already supported TTS in English, Chinese and Korean. I wanted to contribute support for German, so I trained these models. TensorFlowTTS now supports both training and inference for German with proper processors. A detailed blog post will follow, but here are some quick notes for now:
- I made use of german_transliterate for text preprocessing. Basically, it normalizes numbers (e.g. converts digits to words), expands abbreviations and takes care of German umlauts and punctuation. For the inference examples released in this repo, it is the only dependency apart from TensorFlow.
- You need to convert input text to numerical IDs to feed into the model. I am sharing a reference implementation for this in the inference examples, and you need to reimplement this logic to use the models in non-Python environments (e.g., Android).
- `Tacotron 2` produces some noise at the end, and you need to cut it off. Again, the inference examples show how to do this.
- I exported `Multi-band MelGAN` to `TF Lite` without optimizations because it produced some background noise when I exported with the default ones. I used the default optimizations for `Tacotron 2`.
- The `saved_model` formats that I am releasing here are not suitable for finetuning. The architecture implementation uses the `Subclassing API` in TensorFlow 2.x and takes multiple inputs in the `call` method for teacher forcing during training. This caused some problems when exporting to `saved_model`, and I had to remove this logic before exporting. If you want to finetune the models, please see my fork of TensorFlowTTS.
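Two of the notes above describe glue logic around the models: converting text to numerical IDs before inference, and cutting off the noisy tail afterwards. The following is a minimal sketch of what that logic can look like in plain Python, not the code from the inference examples. The symbol table, `text_to_ids` and `trim_tail` are hypothetical names made up for illustration, and cutting a fixed number of trailing samples is an assumption about one possible way to trim. The released models expect the exact symbol table from this repo's inference examples, so take the real table from there.

```python
# Hypothetical symbol table, for illustration only. The released models
# expect the exact table used in this repo's inference examples, not this one.
PAD = "pad"
EOS = "eos"
PUNCTUATION = list("!'(),.:;? -")
LETTERS = list("abcdefghijklmnopqrstuvwxyzäöüß")
SYMBOLS = [PAD] + PUNCTUATION + LETTERS + [EOS]
SYMBOL_TO_ID = {symbol: index for index, symbol in enumerate(SYMBOLS)}


def text_to_ids(text):
    """Map already-normalized text to integer IDs and append the EOS ID.

    Characters missing from the table are silently skipped in this sketch.
    """
    ids = [SYMBOL_TO_ID[ch] for ch in text.lower() if ch in SYMBOL_TO_ID]
    ids.append(SYMBOL_TO_ID[EOS])
    return ids


def trim_tail(samples, num_samples):
    """Return the waveform without its last num_samples items.

    Assumption: a fixed-length cut is enough to remove the trailing noise.
    """
    if num_samples <= 0:
        return samples
    return samples[:-num_samples]


if __name__ == "__main__":
    print(text_to_ids("Hallo!"))
    print(trim_tail([0.1, 0.2, 0.3, 0.4, 0.5], 2))
```

Because both helpers only use a lookup table and slicing, they should be straightforward to port to Kotlin, Java or Swift. The slice in `trim_tail` works the same way on a one-dimensional NumPy array, and the number of samples to cut is something to tune by ear.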
You can use these pretrained model artifacts and code examples under the terms of the Apache 2.0 license. You can also contact me for paid consulting and/or collaboration on speech and/or NLP projects at the email address shown on my profile.