
Add zloss #133

Closed
wants to merge 7 commits into from

Conversation

Muennighoff

No description provided.

@Muennighoff Muennighoff mentioned this pull request Aug 3, 2024
@josejg josejg self-requested a review August 10, 2024 17:35
Two review comments on megablocks/layers/moe.py (outdated, resolved)
@dirkgr

dirkgr commented Sep 6, 2024

Anything we can do to get this merged? We want to polish the OLMoE release, and it would be nice if we could just take a dependency on your version instead of @Muennighoff's.

@mvpatel2000
Contributor

@Muennighoff can you resolve merge conflicts and make sure tests are passing?

CC: @josejg for review

@josejg
Collaborator

josejg commented Sep 8, 2024

  1. I believe the current implementation will consume extra memory even when the z-loss weight is zero. Even though the tensors are small, I always worry about keeping things around that interfere with the memory allocator. When we tested this, we kept the z-loss tensors separate from the LBL loss, using a separate global variable and helper methods so that the logits are not saved at all when z-loss is disabled (see the sketch after this list).

  2. I'm also not a fan of changing the Router and Experts API (return values and arguments), since that is not backwards compatible. In our implementation the z-loss was handled entirely in router.py, which avoided this.

  3. The same applies to changing the API of batched_load_balancing_loss, which now returns a tuple instead of a single value and is therefore not backwards compatible. In our implementation the separate function avoids this.

  4. Lastly, we did not observe any meaningful improvements when enabling router z-loss, so I would keep the default at 0 instead of 1e-3, both to skip the computation in the default case and for backwards compatibility. I see that Figure 11 in the OLMoE paper reports stability improvements, but I suspect the training settings are different.
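As a rough illustration of point 1, here is a minimal sketch of the separate-buffer approach; this is not the code in this repo or in the linked branch, and the helper names and signatures are assumptions:

```python
# Sketch only: keep z-loss state in its own global buffer with helper methods,
# and skip saving the router logits entirely when the z-loss weight is zero.
import torch

_ROUTER_ZLOSS_LOGITS = []  # hypothetical global buffer of saved router logits


def save_router_zloss_logits(logits, zloss_weight):
    """Called from the router; a no-op when z-loss is disabled."""
    if zloss_weight > 0:
        _ROUTER_ZLOSS_LOGITS.append(logits)


def batched_router_zloss(zloss_weight):
    """Router z-loss in the standard ST-MoE form: mean of logsumexp(logits)^2 over tokens."""
    assert _ROUTER_ZLOSS_LOGITS, 'no router logits were saved'
    losses = [torch.logsumexp(x, dim=-1).square().mean() for x in _ROUTER_ZLOSS_LOGITS]
    return zloss_weight * torch.stack(losses).mean()


def clear_router_zloss():
    """Drop the saved logits so nothing lingers in the memory allocator."""
    _ROUTER_ZLOSS_LOGITS.clear()
```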

Here's our implementation for reference: main...josejg/zloss. The main difference is that the modeling code is responsible for calling clear_router_zloss and batched_router_zloss in the corresponding places.
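Continuing the sketch above, a hypothetical example of how the modeling code would handle this; batched_load_balancing_loss and clear_load_balancing_loss are assumed to come from megablocks.layers.moe, and the config field is made up:

```python
# Sketch only: the training step clears stale router state before the forward
# pass and adds the z-loss term alongside the load-balancing loss afterwards.
from megablocks.layers.moe import batched_load_balancing_loss, clear_load_balancing_loss


def training_step(model, batch, args):
    clear_load_balancing_loss()
    clear_router_zloss()  # drop logits saved during the previous step

    loss = model(batch)  # assumed to return the LM loss
    loss = loss + batched_load_balancing_loss(args)
    if args.moe_zloss_weight > 0:  # hypothetical config field
        loss = loss + batched_router_zloss(args.moe_zloss_weight)
    return loss
```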

@mvpatel2000 What do you think?

@Muennighoff could you refactor the PR to address these comments? We can also merge our implementation, crediting you in the PR, if that's easier.

@Muennighoff
Author

oh just merge yours 👍

@josejg josejg mentioned this pull request Sep 9, 2024
@josejg
Collaborator

josejg commented Sep 9, 2024

Sounds good, tracking it here: #151

@mihir-db
Collaborator

mihir-db commented Sep 9, 2024

Closing in favor of #151.
@Muennighoff @dirkgr please feel free to comment on the other PR if you need any related changes, or open a GitHub issue :)

@mihir-db mihir-db closed this Sep 9, 2024