VALL-E

Inference code for the VALL-E models released at Wenetspeech4TTS/Audiodec-Valle-Wenetspeech4TTS.

Installation

git clone https://github.com/dukGuo/valle-audiodec.git
cd valle-audiodec
pip install -r requirements.txt

Download pretrained models

AudioDec

We use AudioDec instead of EnCodec as the speech tokenizer to further improve audio quality.

Download the complete exp archive, unzip it, and move the resulting exp folder to AudioDec/exp:

cd valle-audiodec
wget https://github.com/facebookresearch/AudioDec/releases/download/pretrain_models_v02/exp.zip
unzip exp.zip
mv exp AudioDec/exp
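On machines without wget or unzip, the same three steps can be scripted in Python. The sketch below is an assumption-laden equivalent, not part of this repository: it assumes the archive contains a single top-level exp/ folder (as the mv step implies), and the scratch directory name `_exp_unzip` and both function names are hypothetical.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path

# Release URL copied from the wget command above.
EXP_URL = ("https://github.com/facebookresearch/AudioDec/releases/download/"
           "pretrain_models_v02/exp.zip")


def install_audiodec_exp(zip_path: Path, repo_root: Path) -> Path:
    """Unzip exp.zip and move its top-level exp/ folder to <repo_root>/AudioDec/exp.

    Assumes, as `mv exp AudioDec/exp` implies, that the archive contains
    a single top-level folder named exp/.
    """
    target = repo_root / "AudioDec" / "exp"
    if target.exists():
        # Unlike `mv`, refuse to nest exp/ inside an existing AudioDec/exp.
        raise FileExistsError(f"{target} already exists; remove it first")
    scratch = repo_root / "_exp_unzip"  # hypothetical temporary directory name
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(scratch)
    target.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(scratch / "exp"), str(target))
    shutil.rmtree(scratch)  # remove the scratch directory and any leftovers
    return target


def download_and_install(repo_root: Path) -> Path:
    """Download exp.zip into repo_root, then install it (needs network access)."""
    zip_path = repo_root / "exp.zip"
    urllib.request.urlretrieve(EXP_URL, str(zip_path))
    return install_audiodec_exp(zip_path, repo_root)
```

For example, `download_and_install(Path("valle-audiodec"))` run from the directory that contains the clone should leave the pretrained models under valle-audiodec/AudioDec/exp.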

VALL-E

Checkpoints are available at Wenetspeech4TTS/Audiodec-Valle-Wenetspeech4TTS:

VALL-E Basic: VALL-E trained on the WenetSpeech4TTS Basic subset.
VALL-E Standard: VALL-E Basic fine-tuned on the WenetSpeech4TTS Standard subset.
VALL-E Premium: VALL-E Standard fine-tuned on the WenetSpeech4TTS Premium subset.

Speech Samples

https://wenetspeech4tts.github.io/wenetspeech4tts

https://rxy-j.github.io/HPMD-TTS

Inference

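  cd valle-audiodec
  python infer_tts.py \
    --config config/hparams.yaml \
    --ar_ckpt ckpt/basic/ar.pt \
    --nar_ckpt ckpt/basic/nar.pt \
    --prompt_wav test/prompt_wavs/test_1.wav \
    --prompt_text "在夏日阴凉的树荫下，鸭妈妈孵着鸭宝宝。" \
    --text "负责指挥的将军在一旁交代着注意事项，每个人在上面最多只能待九十秒。"

Here --prompt_text is the transcript of the prompt audio ("In the cool shade of the trees on a summer day, the mother duck is hatching her ducklings.") and --text is the sentence to synthesize ("The general in charge was explaining the precautions off to the side: each person can stay up there for ninety seconds at most.").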
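To synthesize several sentences in one go, the command above can be driven from Python. This is a minimal sketch, not a script shipped with the repository: `build_infer_cmd` and `synthesize_all` are hypothetical helpers, and they pass only the flags shown above. Passing arguments as a list to `subprocess.run` bypasses the shell, so the Chinese text needs no quoting.

```python
import subprocess
import sys


def build_infer_cmd(prompt_wav, prompt_text, text,
                    ar_ckpt="ckpt/basic/ar.pt",
                    nar_ckpt="ckpt/basic/nar.pt",
                    config="config/hparams.yaml"):
    """Return the infer_tts.py argument list used in the command above."""
    return [
        sys.executable, "infer_tts.py",
        "--config", str(config),
        "--ar_ckpt", str(ar_ckpt),
        "--nar_ckpt", str(nar_ckpt),
        "--prompt_wav", str(prompt_wav),
        "--prompt_text", prompt_text,
        "--text", text,
    ]


def synthesize_all(jobs, repo_dir=".", **ckpt_kwargs):
    """Run infer_tts.py once per (prompt_wav, prompt_text, text) job.

    repo_dir should be the valle-audiodec checkout, because the default
    paths above are relative to it.
    """
    for prompt_wav, prompt_text, text in jobs:
        cmd = build_infer_cmd(prompt_wav, prompt_text, text, **ckpt_kwargs)
        subprocess.run(cmd, cwd=repo_dir, check=True)  # stop on the first failure
```

To use the Standard or Premium model, pass `ar_ckpt=` and `nar_ckpt=` pointing at wherever you saved those checkpoint files; their paths are not specified here.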

To improve audio quality and keep volume consistent across different prompts, we recommend normalizing the level of the prompt waveform before inference. For example, the following sox command resamples the prompt to $sample_rate, writes 16-bit samples, and peak-normalizes it to -6 dBFS:

  sox $in_wave -r $sample_rate -b 16 --norm=-6 $out_wave
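If you prepare prompts in Python instead, the level-normalization part of `--norm=-6` amounts to peak normalization, sketched below with NumPy. This is an illustrative equivalent, not the repository's preprocessing: it assumes floating-point samples in [-1, 1], does not resample (use a resampler of your choice for the `-r` step), and the function names are hypothetical.

```python
import numpy as np


def peak_normalize(wav: np.ndarray, target_db: float = -6.0) -> np.ndarray:
    """Scale wav so its absolute peak sits at target_db dBFS (full scale = 1.0)."""
    peak = np.max(np.abs(wav))
    if peak == 0:
        return wav.copy()  # silent input: nothing to scale
    target_peak = 10.0 ** (target_db / 20.0)  # -6 dBFS is about 0.501 of full scale
    return wav * (target_peak / peak)


def to_pcm16(wav: np.ndarray) -> np.ndarray:
    """Convert float samples in [-1, 1] to 16-bit integers (like sox -b 16)."""
    return np.round(np.clip(wav, -1.0, 1.0) * 32767).astype(np.int16)
```

Because the whole signal is multiplied by one gain factor, the waveform's shape is unchanged; only its overall level moves so that the loudest sample lands at -6 dBFS.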

References

This repository builds on the following repositories.
