Releases · databricks/megablocks

20 Nov 00:44

j316chuck

v0.7.0

e113487

v0.7.0 Latest


What's Changed

Bump _version.py to 0.7.0.dev0 by @eitanturok in #148
Remove deprecated torch.cuda.amp custom fwd and bwd by @snarayan21 in #150
Implement Router Z-loss by @josejg in #151
Initialize default device lazily by @janEbert in #152
Update router lint by @mihir-db in #158
Bump torch 2.5.1 and upgrade to 0.8.0.dev0 by @j316chuck in #162
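
For readers unfamiliar with the router z-loss added in #151, here is a minimal sketch of the usual formulation from the ST-MoE paper: the squared log-sum-exp of each token's router logits, averaged over tokens and scaled by a small coefficient. The function names and the default coefficient are illustrative assumptions, and the MegaBlocks implementation may differ in detail.

```python
import math


def logsumexp(row):
    """Numerically stable log(sum(exp(x))) over one token's router logits."""
    m = max(row)
    return m + math.log(sum(math.exp(x - m) for x in row))


def router_z_loss(logits, coeff=1e-3):
    """Sketch of a router z-loss (hypothetical helper, not the MegaBlocks API).

    logits: list of per-token lists, one logit per expert.
    coeff:  illustrative scaling coefficient (an assumption).
    """
    if not logits:
        return 0.0
    # Penalize large log-partition values so router logits stay small.
    total = sum(logsumexp(row) ** 2 for row in logits)
    return coeff * total / len(logits)
```

The penalty grows with the magnitude of the logits, so adding it to the training loss discourages the router from producing very large values.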

New Contributors

@josejg made their first contribution in #151
@janEbert made their first contribution in #152
@mihir-db made their first contribution in #158

Full Changelog: v0.6.1...v0.7.0

Contributors

janEbert, j316chuck, and 4 other contributors


31 Aug 14:49

snarayan21

v0.6.1

342c297

v0.6.1

What's New

Patch release that replaces dependencies specified via GitHub with released versions from PyPI (specifically, stanford-stk and grouped-gemm). This allows MegaBlocks itself to be released on PyPI.

What's Changed

Remove direct dependencies, allowing for megablocks pypi release by @snarayan21 in #149

Full Changelog: v0.6.0...v0.6.1

Contributors

snarayan21


30 Aug 18:55

eitanturok

v0.6.0

9243f70

v0.6.0

What's New

1. Torch 2.4 Compatibility (#145)

MegaBlocks now supports Torch 2.4!

2. New CI/CD

MegaBlocks has new GitHub Actions for better CI/CD! On every PR, MegaBlocks now automatically lints and formats the code (#131) and runs the tests on a GPU (#127).

3. Remove Weight Parallelism (#137)

Weight parallelism was not in use and so we removed it.

4. Shared Experts (#109)

Implement shared experts, based on the DeepSeekMoE paper.
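
As a rough illustration of the shared-expert idea, the sketch below runs one shared expert on every token and adds the gate-weighted outputs of the top-k routed experts. It is plain Python over lists, not the MegaBlocks API: the function names, the softmax gating, and the unweighted sum of the shared output are all assumptions made for illustration.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def moe_with_shared_expert(x, router_logits, routed_experts, shared_expert, top_k=1):
    """Hypothetical single-token MoE forward pass with one shared expert.

    x:              token features as a list of floats.
    router_logits:  one logit per routed expert.
    routed_experts: list of callables mapping a feature list to a feature list.
    shared_expert:  callable applied to every token regardless of routing.
    """
    probs = softmax(router_logits)
    # Indices of the top_k routed experts by gate probability.
    top = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    # The shared expert always contributes (assumed unweighted here).
    out = list(shared_expert(x))
    for i in top:
        y = routed_experts[i](x)
        out = [o + probs[i] * v for o, v in zip(out, y)]
    return out
```

Because the shared expert sees every token, it can capture common knowledge while the routed experts specialize, which is the motivation given in the DeepSeekMoE paper.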

Bug Fixes

Better handle incompatible ffn sizes (#108)
Fix AMP for memory optimized options (#111)
Don't save MoE load-balancing loss tensors when args.moe_loss_weight=0 (#119)

What's Changed

Remove turbo by @dblalock in #96
Update README.md by @dakinggg in #98
Fix for ffn_hidden_size of 128, and better error message for incompatible ffn sizes. by @snarayan21 in #108
Add Shared Expert by @vchiley in #109
Fix AMP for memory optimized options by @mvpatel2000 in #111
bump and pin versions by @vchiley in #112
dont save moe lb-loss tensors if args.moe_loss_weight=0 by @michael-go in #119
bump by @vchiley in #116
Minor changes to batched_load_balancing_loss function by @ShashankMosaicML in #121
Migrate tests to pytest + add GA by @eitanturok in #127
Change Runner in GA by @eitanturok in #129
Clean up setup.py by @eitanturok in #128
only run GA if repo owner is Databricks by @eitanturok in #135
GA to Lint + Format MegaBlocks by @eitanturok in #131
bump ci-testing to v0.1.2 by @eitanturok in #138
remove weight parallelism by @eitanturok in #137
refactor testing by @eitanturok in #140
Type Checking by @eitanturok in #141
Bump torch to <2.4.1 by @eitanturok in #145

New Contributors

@dakinggg made their first contribution in #98
@michael-go made their first contribution in #119
@ShashankMosaicML made their first contribution in #121

Full Changelog: v0.5.1...v0.6.0

Contributors

dblalock, michael-go, and 6 other contributors


11 Jan 22:14

tgale96

v0.5.1

f05609c

v0.5.1

What's Changed

Update dependencies and package organization. by @tgale96 in #52
Remove errant "*" in README by @tgale96 in #54
Update Megatron-LM scripts and integration for latest Docker container. by @tgale96 in #55
Update setup.py to support multiple device capabilities by @simon-mo in #56
enable arg enabled normalization of routing weights by @vchiley in #58
More customizable norm for expert weights by @snarayan21 in #60
Update README.md by @eltociear in #63
enable custom activation functions by @vchiley in #65
Skip updating load balancing loss on eval by @sedrick-keh-tri in #69
Change router weight norm from in-place by @sashaDoubov in #70
add mem optimized grouped glu by @vchiley in #66
Add cast to tensor for DTensor inputs for groupedmlp by @eracah in #71
Dtensor to all paths by @mvpatel2000 in #73
Refactor DTensor by @mvpatel2000 in #74
Mem opt glu bkwd by @mvpatel2000 in #72
Add dmlp registry args by @j316chuck in #75
Fix default to be sparse by @mvpatel2000 in #76
Fix moe_normalize_expert_weights when top_k=1 by @152334H in #87
Updt triton pin by @vchiley in #89
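
Several entries above (#58, #60, #87) concern normalizing the routing weights of the selected experts. The sketch below shows one common way to do this, dividing a token's top-k weights by their p-norm. It assumes that moe_normalize_expert_weights acts as the norm order p, which the release notes do not state. The helper name is hypothetical.

```python
def normalize_topk_weights(weights, p=1.0):
    """Divide a token's top-k routing weights by their p-norm.

    weights: gate weights of the selected experts for one token.
    p:       norm order (assumed to correspond to moe_normalize_expert_weights).
    """
    norm = sum(abs(w) ** p for w in weights) ** (1.0 / p)
    if norm == 0.0:
        # Avoid division by zero; leave all-zero weights unchanged.
        return list(weights)
    return [w / norm for w in weights]
```

Note that with top_k=1 any positive weight normalizes to 1, which makes that case a natural edge case to handle separately.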

New Contributors

@simon-mo made their first contribution in #56
@snarayan21 made their first contribution in #60
@eltociear made their first contribution in #63
@sedrick-keh-tri made their first contribution in #69
@eracah made their first contribution in #71
@j316chuck made their first contribution in #75
@152334H made their first contribution in #87

Full Changelog: v0.5.0...v0.5.1

Contributors

sashaDoubov, eracah, and 9 other contributors


08 Dec 16:51

mvpatel2000

v0.5.0

0460181

v0.5.0

What's New

Several improvements to avoid CPU <> GPU device synchronizations, GLU support, and support for some new models 👀

What's Changed

Update version by @mvpatel2000 in #36
Avoid duplicate .cpu() call by @mvpatel2000 in #37
Have megablocks rely on torch default precision by @mvpatel2000 in #39
Add GLU support by @sashaDoubov in #38
Enable generic dimensionality for input by @vchiley in #41
Removing an extra size call by @bcui19 in #43
Fix bug in topology kernel for ffn_hidden_size>4096. by @tgale96 in #47

New Contributors

@sashaDoubov made their first contribution in #38
@bcui19 made their first contribution in #43

Full Changelog: v0.4.0...v0.5.0

Contributors

sashaDoubov, vchiley, and 3 other contributors


24 Oct 22:44

mvpatel2000

v0.4.0

6a71b18

v0.4.0

What's Changed

Unpack saved context once by @mvpatel2000 in #33
Refactoring class hierarchy for FSDP wrapping by @tgale96 in #34

Full Changelog: v0.3.3...v0.4.0

Contributors

tgale96 and mvpatel2000


17 Oct 21:58

mvpatel2000

v0.3.3

52aa1b2

v0.3.3

What's Changed

Enable running MegaBlocks MoE without bias by @vchiley in #31

Full Changelog: v0.3.2...v0.3.3

Contributors

vchiley


10 Oct 22:32

mvpatel2000

v0.3.2

6640ebd

v0.3.2

What's Changed

Support for bfloat16
Optimizations for top_k > 1
Support for fully-sharded data parallelism
Support tensor model parallelism when expert_parallel_world_size > num_experts
Optimizations for activation memory
Support activation quantization (thanks @dblalock!)
Optimizations for SM90 (Hopper)
Lots of bug fixes, cleanup and small optimizations

New Contributors

@vchiley made their first contribution in #9
@deepakn94 made their first contribution in #16
@b-chu made their first contribution in #19

Full Changelog: v0.1...v0.3.2

Contributors

dblalock, deepakn94, and 2 other contributors


01 May 15:14

tgale96

v0.1

7c5a9f3

Version 0.1 Pre-release


Initial release documenting repository state prior to MLSys'23 camera-ready publication.
