
remove weight parallelism #137

Merged: 6 commits merged into databricks:main on Aug 12, 2024

Conversation

@eitanturok (Contributor) commented Aug 9, 2024

What does this PR do?

This PR removes weight parallelism as we never use it. Tagging @tgale96.

Since we use FSDP's weight parallelism rather than MegaBlocks' own custom implementation, I wanted to remove the custom weight parallelism code.

Specifically, we

  1. Remove test_parallelism.py, since it only tests that weight parallelism and expert parallelism produce the same results, which is moot once weight parallelism is gone.
  2. Remove moe_weight_parallelism and weight_parallel_group from the args.
  3. Remove weight parallelism from all the layers.

Because moe_weight_parallelism defaults to False and weight_parallel_group defaults to None, mpu.get_weight_parallel_world_size(args) always returned 1 and mpu.get_weight_parallel_rank(args) always returned 0. This allowed us to drastically simplify mlp.create_dmoe_expert_weights(); see the sketch below.
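
To make that concrete, here is a minimal, hypothetical sketch (not the actual MegaBlocks source; function names and shapes are illustrative) of why a fixed world size of 1 and rank of 0 lets the weight-creation path drop its sharding logic entirely:

```python
import torch

def create_expert_weights_before(num_experts, rows, columns, init_method):
    # Original-style logic: allocate the full weight tensor, then slice out
    # the shard owned by this weight-parallel rank.
    world_size = 1  # mpu.get_weight_parallel_world_size(args) always returned 1
    rank = 0        # mpu.get_weight_parallel_rank(args) always returned 0
    master = torch.empty(num_experts, rows, columns)
    init_method(master)
    rows_per_rank = rows // world_size
    start = rank * rows_per_rank
    return master[:, start:start + rows_per_rank, :].clone()

def create_expert_weights_after(num_experts, rows, columns, init_method):
    # With world_size fixed at 1 and rank at 0, the slice above is always the
    # whole tensor, so the sharding code can simply be deleted.
    weights = torch.empty(num_experts, rows, columns)
    init_method(weights)
    return weights
```

In this sketch, both functions produce the same weights under the same RNG state, which is the sense in which the simplification is behavior-preserving.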

Also, can I get an extra close review of my changes to the MemoryOptimizedMLP.parallel_forward() method? I noticed that the group would always be None there, but I am hesitant to hard-code that and am not sure it is the right thing to do (see the illustration below).
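
For context on the group=None concern, here is a small hedged illustration (a hypothetical helper, not MegaBlocks code): in torch.distributed, passing group=None to a collective means "use the default (world) process group", so hard-coding None only changes behavior if a non-default weight-parallel group could ever be passed in.

```python
import torch
import torch.distributed as dist

def gather_shards(shard: torch.Tensor, group=None) -> torch.Tensor:
    # group=None falls back to the default (world) process group in
    # torch.distributed, for both get_world_size and all_gather.
    world_size = dist.get_world_size(group)
    gathered = [torch.empty_like(shard) for _ in range(world_size)]
    dist.all_gather(gathered, shard, group=group)
    return torch.cat(gathered, dim=0)
```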

Also, I ran all tests locally and they pass.

(Also, enjoy this nice PR template that I added!)

What issue(s) does this change relate to?

Before submitting

  • Have you read the contributor guidelines?
  • Is this change a documentation change or typo fix? If so, skip the rest of this checklist.
  • Was this change discussed/approved in a GitHub issue first? It is much more likely to be merged if so.
  • Did you update any related docs and document your change?
  • Did you update any related tests and add any new tests related to your change? (see testing)
  • Did you run the tests locally to make sure they pass?
  • Did you run pre-commit on your change? (see the pre-commit section of prerequisites)

@tgale96 (Contributor) commented Aug 9, 2024

So this looks great but I think you can remove a lot more code!

weight_parallel.py can be completely removed, I think, along with all uses of it, including the parallel_forward function you called out, which I don't think is used anywhere anyway.

Deleting that file will help you track down every use of weight parallelism as well, since it all gets routed into that one.

@eitanturok (Contributor, Author) commented Aug 9, 2024

  • removed parallel_forward
  • deleted weight_parallel.py
  • all tests still pass
  • After discussing with the repo maintainers, we decided to delete weight parallelism rather than deprecate it

megablocks/layers/mlp.py: review thread resolved (outdated)
@tgale96 (Contributor) commented Aug 12, 2024

LGTM! The last thing I might do is to just grep weight_parallel, if you haven't already. But I think you got everything.

@eitanturok (Contributor, Author) commented Aug 12, 2024

Already grepped! Will merge this later today.

@mvpatel2000 (Contributor) left a comment

LGTM

@mihir-db merged commit 27d3d2c into databricks:main on Aug 12, 2024 (3 checks passed).
@eitanturok deleted the weight-parallelism branch on August 20, 2024.