[Bug] Unable to run distributed training using TTS recipe for yourtts #113
Describe the bug
I've been trying to train YourTTS on a Google Compute Engine instance, but it doesn't seem to work through trainer.distribute.
Previously I could run it, but it would always get to the same point in initialization and then one of the training workers would crash while the others froze.
I am running largely unchanged code from the provided recipe; I have only reduced the worker count to fit the cloud instance and added my own dataset.
Without distributed training it previously trained fine until it ran out of VRAM, and training locally on a 3090 works fine, if slowly.
Also, TTS is installed at the latest version; I'm not sure why collect_env_info.py didn't catch that.
To Reproduce
CUDA_VISIBLE_DEVICES="0,1,2,3" python -m trainer.distribute --script train_yourtts.py
on a Google Compute Engine instance.
Expected behavior
Runs the training script with processing split between the GPUs.
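For reference, a quick sanity check along these lines (a minimal sketch assuming PyTorch with the NCCL backend, which trainer.distribute builds on; gpu_check.py is a hypothetical name, not part of the recipe) can confirm the instance actually exposes all four GPUs before launching:

# gpu_check.py: sanity-check GPU visibility before launching trainer.distribute
# (hypothetical helper script, not part of the TTS recipe)
import os

import torch
import torch.distributed as dist

# The distributed launcher starts one worker per listed GPU, so all four
# devices must be visible to an ordinary Python process first.
print("CUDA_VISIBLE_DEVICES:", os.environ.get("CUDA_VISIBLE_DEVICES"))
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())
for i in range(torch.cuda.device_count()):
    print(f"  GPU {i}: {torch.cuda.get_device_name(i)}")

# Multi-GPU training needs an inter-process backend; NCCL is the usual one.
print("NCCL available:", dist.is_nccl_available())

If the GPU count comes back as fewer than four, or NCCL is unavailable, workers would be expected to fail or hang at initialization much like the behavior described above.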
Logs
Environment
Additional context
No response
Comments
Hello, did you find a way to deal with it?