Releases · laekov/fastmoe
v1.1.0
Performance
- The smart schedule of FasterMoE now uses correct stream management and runs faster.
Testing
- All unit tests have been checked and now run correctly.
Adaptation
- Megatron-LM 3.2 supported.
Documentation
- README is updated and several errors in it are fixed.
- A detailed document on process groups is added.
v1.0.1
Compatibility
- PyTorch 2.0 supported.
- Megatron-LM 2.5 supported.
Documentation
- A detailed [installation guide](installation-guide.md), thanks to @santurini.
Performance
- Generalize FasterMoE's schedule to `n_expert > 1`, along with more bug fixes (see the sketch after this list).
- Synchronization reduction, thanks to @Fragile-azalea.
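A minimal sketch of a layer that exercises the generalized schedule, following the README's `FMoETransformerMLP` example; the keyword names and single-process setup here are assumptions.

```python
import torch
from fmoe.transformer import FMoETransformerMLP

# Two local experts per worker, so the generalized FasterMoE schedule
# (n_expert > 1) is exercised. Keyword names are assumed from the README.
moe_layer = FMoETransformerMLP(
    num_expert=2,
    d_model=1024,
    d_hidden=4096,
    world_size=1,   # single process for this sketch
    top_k=2,
).cuda()            # FastMoE's kernels require a CUDA device

x = torch.randn(8, 1024, device="cuda")  # (tokens, d_model)
y = moe_layer(x)
print(y.shape)  # torch.Size([8, 1024])
```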
v1.0.0
FasterMoE
- The new performance-boosting features from the PPoPP'22 paper FasterMoE, detailed in the document (a usage sketch follows this list):
- Expert Shadowing.
- Smart Scheduling.
- Topology-aware gate.
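These features are toggled through environment variables rather than new APIs. The variable names below are assumptions recalled from the FasterMoE document, which remains the authoritative reference.

```python
import os

# Assumed switches from the FasterMoE document; treat the exact
# names as assumptions and consult doc/fastermoe for the real list.
os.environ["FMOE_FASTER_SCHEDULE_ENABLE"] = "1"  # smart scheduling
os.environ["FMOE_FASTER_SHADOW_ENABLE"] = "1"    # expert shadowing

# Import fastmoe after setting the switches so the FasterMoE code
# paths are picked up.
from fmoe.transformer import FMoETransformerMLP
```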
Bug fixes
- Transformer-XL examples.
- Compatibility with various PyTorch versions.
- Megatron-LM documents.
- GShardGate.
v0.3.0
FMoE core
- The former `mp_group` is renamed to `slice_group`, indicating that all workers in the group receive the same input batch and each processes a slice of it (see the sketch after this list). `mp_group` will be deprecated in the next release.
- ROCm is supported.
- `FMoELinear` is moved to a stand-alone file.
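A sketch of the renamed argument, assuming `fmoe.layers.FMoE` accepts `slice_group` where `mp_group` used to go; the process-group setup here is illustrative.

```python
import torch.distributed as dist
from fmoe.layers import FMoE

dist.init_process_group(backend="nccl")

# All workers in `slice_group` receive the same input batch and
# each processes a slice of it.
slice_group = dist.new_group(ranks=[0, 1])  # illustrative ranks

moe = FMoE(
    num_expert=4,
    d_model=1024,
    world_size=dist.get_world_size(),
    slice_group=slice_group,  # formerly `mp_group`, to be deprecated
)
```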
Grouped data parallel
- Support any group by its relative tag name.
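A sketch of the wrapper, assuming `fmoe.distributed.DistributedGroupedDataParallel` and the per-parameter `dp_comm` tag that FastMoE uses to pick the reduction group; the tag value and submodule here are hypothetical.

```python
from fmoe.distributed import DistributedGroupedDataParallel

model = build_model()  # hypothetical: a module containing FMoE layers

# Assumption: each parameter's `dp_comm` attribute names the process
# group in which its gradients are all-reduced. FastMoE tags expert
# parameters itself; custom tags can be attached the same way.
for p in model.head.parameters():  # hypothetical non-expert submodule
    p.dp_comm = "world"            # reduce across all workers

model = DistributedGroupedDataParallel(model)
```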
Load balancing
- A brand-new balancing strategy, SWIPE, contributed by the authors of a (currently unpublished) paper.
- A property `has_loss` is added to each gate to indicate whether its balance loss should be collected.
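A sketch of how a training loop might use the flag, assuming gates expose `get_loss()` as in fmoe's `BaseGate`.

```python
def collect_balance_loss(model):
    """Sum balance losses from every gate that reports one."""
    total = 0.0  # stays a float if no gate contributes a loss
    for module in model.modules():
        # `has_loss` marks gates whose balance loss should be collected;
        # `get_loss()` is assumed from fmoe's BaseGate.
        if getattr(module, "has_loss", False):
            total = total + module.get_loss()
    return total

# loss = task_loss + alpha * collect_balance_loss(model)  # hypothetical
```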
Megatron-LM support
- Experts are partitioned by tensor model parallelism in `mp_group`, instead of expert parallelism.
- Support arbitrary customized gates in `MegatronMLP` (see the sketch after this list).
- Move the patches to a stand-alone file.
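A sketch of plugging in a custom gate, assuming the `gate` keyword is forwarded to `MegatronMLP` via `fmoe.megatron.fmoefy`; the keyword names here are assumptions.

```python
from fmoe.gates import NaiveGate
from fmoe.megatron import fmoefy

class MyGate(NaiveGate):
    """Hypothetical gate: NaiveGate's routing with top-1 selection."""
    def __init__(self, d_model, num_expert, world_size, top_k=1):
        super().__init__(d_model, num_expert, world_size, top_k=top_k)

# `model` is a Megatron-LM model; the keyword names are assumptions.
model = fmoefy(model, num_experts=4, gate=MyGate)
```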
Tests
- Move util functions into `test_ddp.py`.
v0.2.1
Load balancing
- Fix gradient for balance loss.
Misc
- Fix typos.
- Update the benchmark interface.
- Remove redundant code to improve performance.
- Enable `USE_NCCL` by default.
- Compatibility with PyTorch `<1.8.0` and `>=1.8.0`.
Megatron adaption
- Patch for numerical correctness of gradient clipping.
- Support for pipeline parallelism.
v0.2.0
Load balancing
- A brand-new gate module with capacity-related utilities.
- GShard's and Switch Transformer's balance strategies are implemented as integrated gates (see the sketch after this list).
- Balance loss is enabled.
- Balance monitor is provided.
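A sketch of selecting one of the integrated gates, assuming the `gate` keyword of `FMoETransformerMLP`; `GShardGate` (and `SwitchGate`) live under `fmoe.gates`.

```python
from fmoe.gates import GShardGate
from fmoe.transformer import FMoETransformerMLP

# Route tokens with GShard's capacity-based balance strategy; the
# balance loss it produces can be read back from the gate module.
moe_layer = FMoETransformerMLP(
    num_expert=4,
    d_model=1024,
    d_hidden=4096,
    gate=GShardGate,  # or SwitchGate
)
```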
Checkpointing
- MoE models can be loaded and saved by fmoe's checkpointing module.
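The release note does not name the functions; assuming the module mirrors Megatron-LM's checkpoint interface, usage might look like the sketch below, with every name treated as an assumption.

```python
# Hypothetical names, assumed to mirror Megatron-LM's interface;
# consult fmoe's checkpointing module for the actual functions.
from fmoe.megatron import save_checkpoint, load_checkpoint

save_checkpoint(iteration, model, optimizer, lr_scheduler)
iteration = load_checkpoint(model, optimizer, lr_scheduler)
```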
Performance
- FP16 training performance is improved.
Misc
- The CUDA code directory is restructured.
- More tests are added.