v0.2.0

@laekov laekov released this 31 May 08:27
· 191 commits to master since this release
c96f886

Load balancing

  • A brand-new gate module with capacity-related utilities is added.
  • GShard's and Switch Transformer's balance strategies are implemented as integrated gates.
  • Balance loss is enabled.
  • Balance monitor is provided.
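
As a rough illustration of what capacity and a balance loss mean for a top-1 gate, the sketch below computes a per-expert capacity and an auxiliary loss in the style of Switch Transformer, in plain Python. It is not fmoe's implementation: the function names (`expert_capacity`, `switch_balance_loss`), the `capacity_factor` parameter, and its default value are assumptions made for this example, and the scaling coefficient usually applied to the loss is omitted.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one token's router logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def expert_capacity(num_tokens, num_experts, capacity_factor=1.25):
    """Hypothetical helper: the most tokens one expert accepts per batch.

    An even split would give num_tokens / num_experts tokens per expert;
    capacity_factor adds headroom for imperfect balance.
    """
    return math.ceil(capacity_factor * num_tokens / num_experts)


def switch_balance_loss(router_logits, num_experts):
    """Switch-Transformer-style auxiliary loss for top-1 routing (a sketch).

    loss = num_experts * sum_i f_i * P_i, where f_i is the fraction of
    tokens whose top-1 expert is i and P_i is the mean router probability
    assigned to expert i. Perfectly balanced routing gives a value of 1.
    """
    num_tokens = len(router_logits)
    probs = [softmax(row) for row in router_logits]

    # f_i: fraction of tokens dispatched to each expert (argmax routing).
    counts = [0] * num_experts
    for p in probs:
        counts[p.index(max(p))] += 1
    frac_tokens = [c / num_tokens for c in counts]

    # P_i: router probability for each expert, averaged over tokens.
    mean_prob = [
        sum(p[i] for p in probs) / num_tokens for i in range(num_experts)
    ]

    return num_experts * sum(f * q for f, q in zip(frac_tokens, mean_prob))


balanced = switch_balance_loss([[10.0, 0.0], [0.0, 10.0]], num_experts=2)
skewed = switch_balance_loss([[10.0, 0.0], [10.0, 0.0]], num_experts=2)
```

Here `skewed` comes out larger than `balanced`, which is the signal a balance loss adds to the training objective: routing that concentrates tokens on one expert is penalized relative to an even split.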

Checkpointing

  • MoE models can be saved and loaded with fmoe's checkpointing module.

Performance

  • FP16 training performance is improved.

Misc

  • The CUDA code directory is restructured.
  • More tests are added.