v0.3.2

@mvpatel2000 mvpatel2000 released this 10 Oct 22:32
· 98 commits to main since this release

What's Changed

  • Support for bfloat16
  • Optimizations for top_k > 1
  • Support for fully-sharded data parallelism
  • Support tensor model parallelism when expert_parallel_world_size > num_experts
  • Optimizations for activation memory
  • Support activation quantization (thanks @dblalock!)
  • Optimizations for SM90 (Hopper)
  • Lots of bug fixes, cleanup and small optimizations

Full Changelog: v0.1...v0.3.2