Describe the bug
As in PR #5259, the ZeRO optimizer also needs to be fixed.
To Reproduce
Steps to reproduce the behavior:
Train an LLM with expert parallelism ep=4 and the AdamW optimizer.
Expected behavior
Expert gradients should be equal under ep=4 and ep=1, but currently they are 4 times larger than under ep=1.
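A factor of exactly ep (here 4) is what you would see if the summed expert gradients were divided by the size of the expert-data-parallel group (world_size / ep) instead of the full data-parallel world size. The toy model below illustrates that arithmetic. It is a hypothetical sketch, not DeepSpeed code: scalar per-token gradients stand in for tensors, a single expert stands in for an MoE layer, every token is assumed to route to that expert, and the names `reference_grad` and `expert_grad` are made up for illustration. It assumes each rank's loss is the mean over its own tokens and that an expert's owning rank accumulates gradients from all ep ranks in its EP group.

```python
"""Toy model of expert-gradient scaling under expert parallelism (EP).

Hypothetical sketch: scalar per-token gradients stand in for real tensors,
and one expert (receiving every token) stands in for a full MoE layer.
"""
import math
import random


def reference_grad(token_grads):
    # ep=1 baseline: every rank holds a replica of the expert, computes the
    # mean gradient over its own tokens, and the ranks then average.
    per_rank = [sum(ts) / len(ts) for ts in token_grads]
    return sum(per_rank) / len(per_rank)


def expert_grad(token_grads, ep, divide_by_full_dp):
    world = len(token_grads)
    assert world % ep == 0, "world size must be divisible by ep"
    expert_dp = world // ep  # number of replicas of this expert
    replica_grads = []
    for g in range(expert_dp):
        # The owning rank receives tokens from the ep ranks of its EP group;
        # each source rank's contribution carries that rank's 1/B factor.
        group = token_grads[g * ep:(g + 1) * ep]
        replica_grads.append(sum(sum(ts) / len(ts) for ts in group))
    # All-reduce (sum) over the expert-data-parallel group.
    summed = sum(replica_grads)
    # Assumed fix: normalize by the full data-parallel world size.
    divisor = world if divide_by_full_dp else expert_dp
    return summed / divisor


if __name__ == "__main__":
    random.seed(0)
    grads = [[random.random() for _ in range(5)] for _ in range(8)]
    ref = reference_grad(grads)
    buggy = expert_grad(grads, ep=4, divide_by_full_dp=False)
    fixed = expert_grad(grads, ep=4, divide_by_full_dp=True)
    print("ratio (expert-dp divisor):", buggy / ref)
    print("matches ep=1 (full-dp divisor):", math.isclose(fixed, ref))
```

Under these assumptions the summed expert gradient already covers all world_size ranks' tokens, so only dividing by the full world size reproduces the ep=1 value; dividing by world_size / ep leaves it ep times too large.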
@Jack47 Can you make a PR for this? Thanks!
@Jack47 #5681 has solved this.