Multi-GPU training and expected epochs #9
@bieltura Hi! Thank you for your interest in the Grad-TTS work.
Hi @ivanvovk, thanks for answering the questions. Here's an update that may be helpful for future development: DataParallel cannot be implemented in the current setup. Apart from that, I have found that with multiple GPUs the code breaks when, for some item in a batch, the audio sample is shorter than the 2-second speech fragment. The solution is to force the shape to always be these 2 seconds (in frames).
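A minimal sketch of such a fix, assuming PyTorch mel tensors of shape `(n_mels, T)`; `fix_segment` and `out_size` are hypothetical names, not part of the Grad-TTS codebase:

```python
import torch
import torch.nn.functional as F

def fix_segment(mel: torch.Tensor, out_size: int) -> torch.Tensor:
    """Force a mel spectrogram of shape (n_mels, T) to exactly
    `out_size` frames (the 2-second fragment length in frames)."""
    n_frames = mel.shape[-1]
    if n_frames < out_size:
        # Zero-pad short utterances on the right of the time axis
        return F.pad(mel, (0, out_size - n_frames))
    # Otherwise cut a random window of `out_size` frames
    start = torch.randint(0, n_frames - out_size + 1, (1,)).item()
    return mel[:, start:start + out_size]
```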
I still find that 2300 epochs on a single GPU is a very large amount of training. Did you follow any procedure to check when the model converged to the best checkpoint? Thanks!
@bieltura it is usually preferred to use DistributedDataParallel rather than DataParallel. As for checking the convergence of the model, we just checked the quality at 10 iterations, and when it became good, we stopped training. Nothing special.
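For reference, a minimal DistributedDataParallel sketch launched with `torchrun`; the linear model and the single training step are stand-ins, not the Grad-TTS training loop:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun --nproc_per_node=<num_gpus> train_ddp.py sets LOCAL_RANK
    local_rank = int(os.environ["LOCAL_RANK"])
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(local_rank)

    # Stand-in model; in practice this would be the GradTTS module
    model = torch.nn.Linear(80, 80).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

    # One dummy training step, just to show the DDP flow
    x = torch.randn(16, 80).cuda(local_rank)
    loss = model(x).pow(2).mean()
    loss.backward()
    optimizer.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```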
Thanks! As a side note, we have been using an energy metric (predicted-target difference) to check whether samples are "good enough" for evaluation. As you mentioned in your paper, the diffusion loss is not informative about model convergence, since it is computed at timesteps sampled at random from 0 to T. Here are some plots that may be useful to you as well. Feel free to close the issue once you have read it :) And again, thanks for everything.
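The comment does not give the exact definition of the metric; one common choice, sketched here under that assumption, takes the per-frame energy as the L2 norm over mel bins and compares the predicted and target contours:

```python
import torch

def energy_difference(pred_mel: torch.Tensor, target_mel: torch.Tensor) -> torch.Tensor:
    """Mean absolute difference between predicted and target energy
    contours; both inputs have shape (n_mels, T) with matching T."""
    pred_energy = pred_mel.norm(dim=0)      # per-frame L2 energy, (T,)
    target_energy = target_mel.norm(dim=0)  # (T,)
    return (pred_energy - target_energy).abs().mean()
```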
In my case, I found Accelerate very useful: https://github.com/huggingface/accelerate. It only takes a few extra lines of code.
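A minimal sketch of what those lines look like with Accelerate (the model, optimizer, and data are placeholders); run it with `accelerate launch script.py`:

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()  # picks up the available devices/processes

# Placeholder model, optimizer, and data; substitute the real ones
model = torch.nn.Linear(80, 80)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
loader = torch.utils.data.DataLoader(torch.randn(256, 80), batch_size=16)

# Accelerate wraps everything for the current (multi-)GPU setup
model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for batch in loader:
    optimizer.zero_grad()
    loss = model(batch).pow(2).mean()
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```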
Hi,
First of all, thanks for the nice paper and the released code. I am testing your model on a different dataset, and two questions came up:

1. Can the model be trained on multiple GPUs with the current code?
2. How many epochs should I expect until the model converges?

Thanks!