Welcome to GSVI, an inference-specialized plugin built on top of GPT-SoVITS that enhances your text-to-speech (TTS) experience with a user-friendly API. This plugin enriches the original GPT-SoVITS project, making voice synthesis more accessible and versatile.
Please note that we do not recommend using GSVI for training. It exists to make GPT-SoVITS simpler and more comfortable to use, and to make model sharing easier.
This fork is mainly based on the fast_inference_ branch, using a lot of PR code contributed by ChasonJiang. Thanks to this great developer: "Dalao NB!" (Chinese slang for "the big shot is awesome!").
The Inference folder used by this branch is the main submodule and comes from https://github.com/X-T-E-R/TTS-for-GPT-soVITS.
- High-level abstract interface for easy character and emotion selection
- Comprehensive TTS engine support (speaker selection, speed adjustment, volume control)
- User-friendly design for everyone
- Simply drop in a shared character model folder and start using it right away
- High compatibility and extensibility for various platforms and applications (for example: SillyTavern)
- Install manually or use the prezip for Windows
- Put your character model folders in trained/
- Run the bat files, or run the Python files manually
- If you encounter issues, join our community or consult the FAQ. QQ Group: 863760614, Discord (AI Hub):
We look forward to seeing how you use GSVI to bring your creative projects to life!
Prezip : https://huggingface.co/XTer123/GSVI_prezip/tree/main
You will find a number of bat files in 0 Bat Files/:
- To update, run bat 0 and 1 (or 999, 0, 1)
- To start with a single Gradio file, run bat 3
- To start the backend and frontend separately, run bat 5 and 6
- To manage your models, run bat 10
- Gradio Application: app.py
- Gradio Model Management Interface: webui/webui.py
For API documentation, visit our Yuque documentation page or API Doc.md.
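As a quick illustration, here is a minimal Python sketch of calling the backend over HTTP. The /tts endpoint name and the parameter names (text, character, emotion) are assumptions for illustration only; check API Doc.md or the Yuque page for the actual schema. Port 5000 matches the Docker setup later in this document.

```python
# Hedged sketch: endpoint and parameter names are assumptions,
# not the documented schema -- see API Doc.md for the real one.
import requests

resp = requests.get(
    "http://127.0.0.1:5000/tts",      # assumed default host/port (see the Docker section)
    params={
        "text": "Hello from GSVI!",   # text to synthesize
        "character": "hutao",         # a character folder under trained/
        "emotion": "default",         # an emotion configured in the model manager
    },
    timeout=60,
)
resp.raise_for_status()
with open("output.wav", "wb") as f:   # save the returned audio
    f.write(resp.content)
```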
In a character model folder, like trained/Character1/, put the pth / ckpt / wav files. The wav file should be named after its prompt text. For example:
trained
--hutao
----hutao-e75.ckpt
----hutao_e60_s3360.pth
----hutao said something.wav
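To show how such a folder is laid out in code, here is a small illustrative Python sketch (the helper name is made up) that lists the files a character folder is expected to contain and derives the prompt text from the wav filename:

```python
# Illustrative only: lists the weight files and reference audio in a
# character folder; by convention the wav filename (minus extension)
# is the prompt text spoken in that clip.
from pathlib import Path

def inspect_character(name: str, root: str = "trained") -> None:
    folder = Path(root) / name
    for weights in sorted(folder.glob("*.pth")) + sorted(folder.glob("*.ckpt")):
        print(f"model weights:   {weights.name}")
    for wav in folder.glob("*.wav"):
        print(f"reference audio: {wav.name!r} -> prompt text: {wav.stem!r}")

inspect_character("hutao")
```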
To set this up, open the Model Manage Tool (10.bat / webuis/character_manager/webui.py). It can assign a reference audio to each emotion, which is how the emotion options are implemented; a conceptual sketch follows.
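Conceptually, the tool ends up storing an emotion-to-reference-audio mapping alongside the model files. The sketch below shows the idea in Python; the file name infer_config.json and the field names are assumptions for illustration, not the tool's guaranteed schema.

```python
# Conceptual sketch of the emotion mapping the Model Manage Tool maintains.
# File name and keys are assumptions for illustration only.
import json
from pathlib import Path

emotions = {
    "default": {"ref_wav": "hutao said something.wav",
                "prompt_text": "hutao said something"},
    # one entry per emotion, each pointing at a reference clip
}
Path("trained/hutao/infer_config.json").write_text(
    json.dumps({"emotion_list": emotions}, ensure_ascii=False, indent=2),
    encoding="utf-8",
)
```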
You can install this with the guide below, then download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models, and put your character model folder in trained.
Or just download the pre-packaged distribution for Windows (then put your character model folder in trained).
About the character model folder, see the section above.
- Python 3.9, PyTorch 2.0.1, CUDA 11
- Python 3.10.13, PyTorch 2.1.2, CUDA 12.3
- Python 3.9, PyTorch 2.3.0.dev20240122, macOS 14.3 (Apple silicon)
Note: numba==0.56.4 requires py<3.11
If you are a Windows user (tested with win>=10), you can directly download the pre-packaged distribution and double-click on go-webui.bat to start GPT-SoVITS-WebUI.
Or run pip install -r requirements.txt, and then double-click install.bat.
conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
bash install.sh
Note: The models trained with GPUs on Macs result in significantly lower quality compared to those trained on other devices, so we are temporarily using CPUs instead.
First make sure you have installed FFmpeg by running brew install ffmpeg
or conda install ffmpeg
, then install by using the following commands:
conda create -n GPTSoVits python=3.10
conda activate GPTSoVits
pip install -r requirements.txt
git submodule update --init --recursive
Conda users:
conda install ffmpeg
Ubuntu/Debian users:
sudo apt install ffmpeg
sudo apt install libsox-dev
conda install -c conda-forge 'ffmpeg<7'
Windows users:
Download and place ffmpeg.exe and ffprobe.exe in the GPT-SoVITS root.
Download pretrained models from GPT-SoVITS Models and place them in GPT_SoVITS/pretrained_models.
Please prepare the local paths and models before running the following commands.
- output: the output directory for wav files
- logs: for recording logs
- SoVITS_weights: SoVITS weights
- GPT_SoVITS: all pretrained models live in GPT_SoVITS/pretrained_models, which is quite large
- nltk_data: nltk data; download it with the following command:
python -m nltk.downloader -d ./nltk_data averaged_perceptron_tagger cmudict
- trained: trained models (ones you trained yourself or borrowed from others)
docker build -t gpt-sovits-inference:latest -f Dockerfile .
docker run --rm -it -d --gpus="device=0" --env=is_half=False \
--volume=<Replace with the path of your project>/GPT-SoVITS-Inference/output:/workspace/output \
--volume=<Replace with the path of your project>/GPT-SoVITS-Inference/logs:/workspace/logs \
--volume=<Replace with the path of your project>/GPT-SoVITS-Inference/SoVITS_weights:/workspace/SoVITS_weights \
--volume=<Replace with the path of your project>/GPT-SoVITS-Inference/GPT_SoVITS/:/workspace/GPT_SoVITS \
--volume=<Replace with the path of your project>/GPT-SoVITS-Inference/nltk_data:/usr/local/nltk_data \
--volume=<Replace with the path of your project>/GPT-SoVITS-Inference/trained:/workspace/trained \
--workdir=/workspace -p 5000:5000 --shm-size="16G" gpt-sovits-inference:latest
Note: remove pyaudio from requirements.txt before building the Docker image!
This fork is mainly based on the fast_inference_ branch of the GPT-SoVITS project, using a lot of PR code contributed by ChasonJiang.
Special thanks to the following projects and contributors: