Unit tests for MiCS #4792
@zarzen, thanks!
I think it would be clearer if we parameterized a `mics_enabled` variable rather than using `zero_stage` to switch MiCS on and off. Please make similar changes to the other test. Thanks!
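The reviewer's suggestion can be sketched as a small config builder driven by a boolean flag, so that ZeRO stage 3 stays fixed and only the MiCS switch varies between cases. This is a minimal sketch, not code from this PR: `build_zero_config` is a hypothetical helper, and it assumes MiCS is enabled by setting `mics_shard_size` under `zero_optimization` in the DeepSpeed config. Check that key against the DeepSpeed version under test.

```python
from typing import Any, Dict


def build_zero_config(mics_enabled: bool, mics_shard_size: int = 2) -> Dict[str, Any]:
    """Build a minimal DeepSpeed-style config dict for one test case.

    ZeRO stage 3 is used in both cases; only the MiCS switch changes.
    Assumption (hypothetical helper): MiCS is turned on by adding
    `mics_shard_size` under `zero_optimization`.
    """
    zero_cfg: Dict[str, Any] = {"stage": 3}
    if mics_enabled:
        # Shard size is an illustrative default, not a value from the PR.
        zero_cfg["mics_shard_size"] = mics_shard_size
    return {
        "train_micro_batch_size_per_gpu": 1,
        "zero_optimization": zero_cfg,
    }


# One test body, two cases: the boolean is the only thing that varies.
for mics_enabled in (False, True):
    cfg = build_zero_config(mics_enabled)
    print(mics_enabled, cfg["zero_optimization"])
```

In a pytest suite, the same flag would typically come from `@pytest.mark.parametrize("mics_enabled", [True, False])`. This keeps `zero_stage` meaning only "ZeRO stage", instead of also overloading it as a MiCS toggle.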
@mrwyattii thanks for the suggestions, I updated the implementation accordingly. Also, because the MiCS implementation is currently not compatible with offloading, I removed the unit test for
In response to the ask from microsoft#2964 (comment), I added three more unit tests related to MiCS. There are two known issues:

- Testing on Torch 2.1.0 triggers `_IllegalWorker` in the coalesced all-gather. I made changes to ignore this condition; currently, I don't know the reason.
- The MiCS implementation does not work with offloading, so the failure in `TestZeroPartialOffloadConfigSweep` is expected.

---------

Co-authored-by: Logan Adams <[email protected]>
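Because MiCS and offloading are known not to work together, one option is to filter that combination out of a parameter sweep instead of letting it fail. The sketch below is illustrative only. `skip_reason` and `runnable_cases` are hypothetical helpers, and `"cpu"` stands in for whatever offload device a sweep such as `TestZeroPartialOffloadConfigSweep` actually uses.

```python
from itertools import product
from typing import List, Optional, Tuple


def skip_reason(mics_enabled: bool, offload_device: Optional[str]) -> Optional[str]:
    """Return why a (MiCS, offload) combination should be skipped, or None."""
    # Per the PR notes, MiCS does not currently work with offloading.
    if mics_enabled and offload_device is not None:
        return "MiCS is not compatible with offloading"
    return None


def runnable_cases(offload_devices: List[Optional[str]]) -> List[Tuple[bool, Optional[str]]]:
    """Enumerate every (mics_enabled, offload_device) pair and keep supported ones."""
    return [
        (mics, dev)
        for mics, dev in product((False, True), offload_devices)
        if skip_reason(mics, dev) is None
    ]


cases = runnable_cases([None, "cpu"])
print(cases)  # [(False, None), (False, 'cpu'), (True, None)]
```

Inside a pytest test, calling `pytest.skip(reason)` when `skip_reason(...)` returns a string keeps the unsupported case visible in the report as "skipped" rather than as an expected failure.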