remove weight parallelism #137
So this looks great but I think you can remove a lot more code! `weight_parallel.py` can be completely removed, I think. And then all uses of it, including the [...] Deleting that file will help you track down every use of weight parallelism as well, since it all gets routed into that one.
LGTM! The last thing I might do is to just [...]
Already grep-ed! Will merge this later today.
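The grep step mentioned above can also be scripted. The following is a minimal sketch, not part of the PR: the helper name `find_references`, the default search string `weight_parallel`, and the `megablocks` directory in the usage line are assumptions chosen for illustration.

```python
from pathlib import Path
from typing import List, Tuple


def find_references(root: str, needle: str = "weight_parallel") -> List[Tuple[str, int, str]]:
    """Return (path, line number, line text) for every line under `root` containing `needle`.

    Hypothetical helper for illustration; it performs a plain substring
    search over *.py files, much like `grep -rn` would.
    """
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8")
        except (OSError, UnicodeDecodeError):
            # Skip files that cannot be read as UTF-8 text.
            continue
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits


if __name__ == "__main__":
    # Assumed package directory name; adjust to the local checkout.
    for path, lineno, line in find_references("megablocks"):
        print(f"{path}:{lineno}: {line}")
```

An empty result after the file is deleted suggests that no Python source still mentions the module by name, although a substring search cannot catch dynamically constructed imports.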
LGTM
What does this PR do?
This PR removes weight parallelism as we never use it. Tagging @tgale96.
Since we use FSDP's weight parallelism and not our own custom weight parallelism in MegaBlocks, I wanted to remove the weight parallelism implementation.
Specifically, we:
- Remove `test_parallelism.py`, because this file tests that weight parallelism and expert parallelism have the same results.
- Remove `moe_weight_parallelism` and `weight_parallel_group` from the args.

Because `moe_weight_parallelism` is `False` by default and `weight_parallel_group` is `None` by default, `mpu.get_weight_parallel_world_size(args)` always returned 1 and `mpu.get_weight_parallel_rank(args)` always returned 0. This allowed us to drastically simplify things in `mlp.create_dmoe_expert_weights()`.

Also, can I get an extra close review of my changes to the `MemoryOptimizedMLP.parallel_forward()` method? I noticed that the group would always be `None` there, but I am hesitant to hard-code this in. Not sure if this is the right thing to do.

Also, I ran all tests locally and they pass.
(Also, enjoy this nice PR template that I added!)
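The argument about the default args can be illustrated with a minimal sketch. Everything below is an assumption written for illustration rather than the MegaBlocks source: the `Args` dataclass models only the two fields named above, the getter bodies are a guess at the behavior described in this PR, and `rows_per_shard` is a hypothetical stand-in for the kind of sharding arithmetic that becomes a no-op.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Args:
    # Hypothetical stand-in for the arguments object; only the two fields
    # discussed in this PR are modeled, with the defaults it describes.
    moe_weight_parallelism: bool = False
    weight_parallel_group: Optional[Any] = None


def get_weight_parallel_world_size(args: Args) -> int:
    # Assumed behavior: with weight parallelism off (or no group set),
    # the weights live in a single shard.
    if not args.moe_weight_parallelism or args.weight_parallel_group is None:
        return 1
    return args.weight_parallel_group.size()  # assumed process-group method


def get_weight_parallel_rank(args: Args) -> int:
    # Assumed behavior: the only shard is shard 0.
    if not args.moe_weight_parallelism or args.weight_parallel_group is None:
        return 0
    return args.weight_parallel_group.rank()  # assumed process-group method


def rows_per_shard(total_rows: int, args: Args) -> int:
    # Hypothetical sharding arithmetic: an even split across the group.
    return total_rows // get_weight_parallel_world_size(args)


default_args = Args()
world_size = get_weight_parallel_world_size(default_args)
rank = get_weight_parallel_rank(default_args)
local_rows = rows_per_shard(4096, default_args)
```

Under the defaults, the world size is 1 and the rank is 0, so the per-shard row count equals the full row count. Any code that only divides by the world size or offsets by the rank can then be replaced by its unsharded form.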
What issue(s) does this change relate to?
Before submitting
- Did you run `pre-commit` on your change? (see the `pre-commit` section of prerequisites)