v0.3.2
What's Changed
- Support for bfloat16 (a usage sketch follows this list)
- Optimizations for top_k > 1
- Support for fully-sharded data parallelism
- Support for tensor model parallelism when expert_parallel_world_size > num_experts
- Optimizations for activation memory
- Support for activation quantization (thanks @dblalock!)
- Optimizations for SM90 (Hopper)
- Lots of bug fixes, cleanup and small optimizations
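
As a quick illustration of the bfloat16 and top_k items above, here is a minimal sketch of constructing a dropless MoE layer. The Arguments fields and the dMoE constructor shown here (moe_num_experts, moe_top_k, bf16) are assumptions about the megablocks.layers API, and the handling of the forward return value is likewise assumed rather than confirmed.

```python
# Minimal sketch: a bfloat16 dMoE layer with top-2 routing.
# The Arguments fields and dMoE constructor named here are assumptions;
# exact names and the forward return signature may differ from the release.
import torch

from megablocks.layers.arguments import Arguments
from megablocks.layers.dmoe import dMoE

args = Arguments(
    hidden_size=1024,
    ffn_hidden_size=4096,
    moe_num_experts=8,
    moe_top_k=2,   # route each token to its top-2 experts
    bf16=True,     # run expert computation in bfloat16
    fp16=False,
)

layer = dMoE(args).cuda()
x = torch.randn(1, 2048, 1024, dtype=torch.bfloat16, device="cuda")
out = layer(x)  # return-value handling is an assumption; see the MoE layer source
```

Once a process group is initialized, the resulting module can be wrapped with PyTorch FSDP like any other torch.nn.Module, which is the path the fully-sharded data parallelism item refers to.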
New Contributors
- @vchiley made their first contribution in #9
- @deepakn94 made their first contribution in #16
- @b-chu made their first contribution in #19
Full Changelog: v0.1...v0.3.2