Support Distributed Training for Fine-tuning Stable Diffusion Example #1802
Comments
Adding the author of the notebook, @sayakpaul.
Won't have the bandwidth to work on this.
I have already extended the code in the Trainer class to support multi-GPU training and written the text that introduces the reader to distributed training, so it shouldn't take much time; it just needs review. @sayakpaul
@clintg6, please create a PR; the Keras team would be happy to review it.
Thank you, will do.
Issue Type: Documentation Feature Request
Source: source
Keras Version: Keras 2.13.1
Custom Code: Yes
OS Platform and Distribution: Linux Ubuntu 22.04
Python version: 3.9
GPU model and memory: No response
Current Behavior?
The current Fine-tuning Stable Diffusion example only demonstrates how to fine-tune on a single GPU. At the end of the example, Sayak names the next step toward improving the quality of the fine-tuned Stable Diffusion model's generations: "To enable that, having support for gradient accumulation and distributed training is crucial. This can be thought of as the next step in this tutorial."
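To make the gradient accumulation half of that next step concrete, here is a minimal sketch assuming tf.keras and a plain `tf.GradientTape` loop rather than the example's Trainer. The helper `accumulated_gradients`, the one-layer Dense model and the toy data are hypothetical stand-ins. Splitting a batch of 32 into four micro-batches of 8 and dividing each micro-batch loss by 4 gives the same gradient as the full batch, so a single optimizer update can stand in for a batch too large to process at once.

```python
import tensorflow as tf


def accumulated_gradients(model, micro_batches):
    """Hypothetical helper (not part of the Keras example): sums gradients
    over equally sized micro-batches, each loss divided by the number of
    micro-batches, so the result matches one large batch."""
    num_micro = len(micro_batches)
    accum = [tf.zeros_like(v) for v in model.trainable_variables]
    for x, y in micro_batches:
        with tf.GradientTape() as tape:
            pred = model(x, training=True)
            # Mean-squared error on this micro-batch, scaled by 1 / num_micro.
            loss = tf.reduce_mean(tf.square(y - pred)) / num_micro
        grads = tape.gradient(loss, model.trainable_variables)
        accum = [a + g for a, g in zip(accum, grads)]
    return accum


tf.random.set_seed(0)
# Toy model and data standing in for the diffusion model and its batches.
model = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
optimizer = tf.keras.optimizers.SGD(learning_rate=0.1)

x = tf.random.normal((32, 4))
y = tf.reduce_sum(x, axis=1, keepdims=True)

# Four micro-batches of 8 examples stand in for one batch of 32.
micro_batches = [(x[i:i + 8], y[i:i + 8]) for i in range(0, 32, 8)]
grads = accumulated_gradients(model, micro_batches)

# Reference: the gradient of the same loss computed on the full batch of 32.
with tf.GradientTape() as tape:
    full_loss = tf.reduce_mean(tf.square(y - model(x, training=True)))
full_grads = tape.gradient(full_loss, model.trainable_variables)

# One optimizer update using the accumulated gradients.
optimizer.apply_gradients(zip(grads, model.trainable_variables))
```

Accumulating in plain tensors keeps the sketch short; inside a compiled `train_step`, the accumulators would typically live in non-trainable variables so they persist across steps.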
The current TensorFlow docs do not make it obvious how to update a custom Trainer class for distributed training, because they mostly cover the workflow for standard compiled models. It would help to have a section that explains in more detail how to integrate a custom Trainer class with distributed training. This example could then also serve as an additional reference for performing distributed training in Keras with custom Trainer classes.
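As a rough illustration of what integrating a custom Trainer with distributed training involves, here is a minimal sketch assuming tf.keras (the Keras 2.x API pinned in this issue) and `tf.distribute.MirroredStrategy`. `ToyTrainer`, its Dense network, `PER_REPLICA_BATCH_SIZE` and the regression data are hypothetical stand-ins, not the example's actual Trainer or Stable Diffusion models. The two changes relative to single-GPU code are creating the model, optimizer and metrics inside `strategy.scope()`, and averaging the hand-written loss over the global batch size, because gradients are summed across replicas when they are applied.

```python
import tensorflow as tf

# Replicates the model on every visible GPU; falls back to a single CPU replica.
strategy = tf.distribute.MirroredStrategy()
PER_REPLICA_BATCH_SIZE = 8  # illustrative value, not taken from the example
GLOBAL_BATCH_SIZE = PER_REPLICA_BATCH_SIZE * strategy.num_replicas_in_sync


class ToyTrainer(tf.keras.Model):
    """Hypothetical stand-in for a custom Trainer: wraps a network and
    overrides train_step with a hand-written loss."""

    def __init__(self, network, **kwargs):
        super().__init__(**kwargs)
        self.network = network
        self.loss_tracker = tf.keras.metrics.Mean(name="loss")

    @property
    def metrics(self):
        # Lets fit() reset the tracker at the start of each epoch.
        return [self.loss_tracker]

    def call(self, inputs, training=False):
        return self.network(inputs, training=training)

    def train_step(self, data):
        x, y = data  # each replica receives its own slice of the global batch
        with tf.GradientTape() as tape:
            pred = self.network(x, training=True)
            per_example_loss = tf.reduce_mean(tf.square(y - pred), axis=-1)
            # Divide by the GLOBAL batch size: the optimizer sums gradients
            # across replicas, so a per-replica mean would be scaled up by
            # the number of replicas.
            loss = tf.nn.compute_average_loss(
                per_example_loss, global_batch_size=GLOBAL_BATCH_SIZE
            )
        grads = tape.gradient(loss, self.network.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.network.trainable_variables))
        self.loss_tracker.update_state(per_example_loss)
        return {"loss": self.loss_tracker.result()}


# Toy regression data batched by the GLOBAL batch size; fit() splits each
# batch across the replicas.
x = tf.random.normal((8 * GLOBAL_BATCH_SIZE, 4))
y = tf.reduce_sum(x, axis=1, keepdims=True)
dataset = tf.data.Dataset.from_tensor_slices((x, y)).batch(
    GLOBAL_BATCH_SIZE, drop_remainder=True
)

# Model, optimizer and metrics must be created inside the strategy scope.
with strategy.scope():
    network = tf.keras.Sequential([tf.keras.Input(shape=(4,)), tf.keras.layers.Dense(1)])
    trainer = ToyTrainer(network)
    trainer.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-2))

history = trainer.fit(dataset, epochs=2, verbose=0)
```

Tracking the unscaled per-example loss in a `Mean` metric keeps the logged value comparable in scale to a single-GPU run, while the scaled loss is used only to compute gradients.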
I would like to update the documentation and associated files for this example with a new section that demonstrates how to fine-tune a Stable Diffusion model in Keras/TensorFlow with multi-GPU distributed training when using a custom Trainer class. This would involve:
Standalone code to reproduce the issue or tutorial link
Relevant log output
No response