Fix assert on Lamb optimizers with BF16 #4451
deepspeed/runtime/engine.py
@@ -1014,8 +1014,8 @@ def _do_sanity_check(self):
                 self.optimizer_name()), "{} is not a supported DeepSpeed Optimizer".format(self.optimizer_name())

         if (self.optimizer_name() == LAMB_OPTIMIZER or self.optimizer_name() == ONEBIT_LAMB_OPTIMIZER):
-            assert (self.dynamic_loss_scale()), "DeepSpeed {} optimizer requires dynamic loss scaling".format(
-                self.optimizer_name())
+            assert (self.dynamic_loss_scale() and not self.bfloat16_enabled()
Will try switching to checking whether the optimizer wrapper is BFLOAT16, rather than just checking whether bfloat16 is enabled.
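For readers following along, here is a minimal standalone sketch of the check being discussed. It is not the code in this PR: the function name `check_lamb_loss_scaling`, its parameters, and the string values of the constants are assumptions made for illustration. The idea is to require dynamic loss scaling for the LAMB optimizers only when the optimizer wrapper is not BF16.

```python
# Stand-in constants; the string values here are assumptions for this sketch.
LAMB_OPTIMIZER = "lamb"
ONEBIT_LAMB_OPTIMIZER = "onebitlamb"
BFLOAT16 = "bf16"
FP16 = "fp16"


def check_lamb_loss_scaling(optimizer_name, optimizer_wrapper, dynamic_loss_scale):
    """Hypothetical helper: assert dynamic loss scaling for LAMB unless wrapped as BF16."""
    is_lamb = optimizer_name in (LAMB_OPTIMIZER, ONEBIT_LAMB_OPTIMIZER)
    # The requirement is skipped when the wrapper is BF16.
    if is_lamb and optimizer_wrapper != BFLOAT16:
        assert dynamic_loss_scale, "DeepSpeed {} optimizer requires dynamic loss scaling".format(
            optimizer_name)


# LAMB with a BF16 wrapper and no dynamic loss scaling passes the check.
check_lamb_loss_scaling(LAMB_OPTIMIZER, BFLOAT16, dynamic_loss_scale=False)
```

Under the same assumptions, calling the helper with `FP16` as the wrapper and `dynamic_loss_scale=False` raises an `AssertionError`, which is the case a unit test would want to cover alongside the BF16 one.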
@loadams, let's add a unit test as well.
@loadams I just hit this assert failure when using LAMB with BF16. May I ask whether this PR will keep moving forward?
Hi @Liangliang-Ma - apologies, I lost track of this PR. I'll work on getting this PR updated and merged.
@Liangliang-Ma - does this branch resolve your issue? Or do you have any other feedback on the PR?
Yes, this one works.
Failing HPU tests are a transformers issue that should be fixed in transformers soon.