v0.14.1 Patch release
What's Changed
- Update version.txt after 0.14.0 release by @mrwyattii in #5238
- Fp6 blog chinese by @xiaoxiawu-microsoft in #5239
- Add contributed HW support into README by @delock in #5240
- Set tp world size to 1 in ckpt load, if MPU is not provided by @samadejacobs in #5243
- Make op builder detection adapt to accelerator change by @delock in #5206
- Replace HIP_PLATFORM_HCC with HIP_PLATFORM_AMD by @rraminen in #5264
- Add CI for Habana Labs HPU/Gaudi2 by @loadams in #5244
- Fix attention mask handling in the Hybrid Engine Bloom flow by @deepcharm in #5101
- Skip 1Bit Compression and sparsegrad tests for HPU. by @vshekhawat-hlab in #5270
- Enabled LMCorrectness inference tests on HPU. by @vshekhawat-hlab in #5271
- Added HPU backend support for torch.compile tests. by @vshekhawat-hlab in #5269
- Average only valid part of the ipg buffer. by @BacharL in #5268
- Add HPU accelerator support in unit tests. by @vshekhawat-hlab in #5162
- Fix loading a universal checkpoint by @tohtana in #5263
- Add Habana Gaudi2 CI badge to the README by @loadams in #5286
- Add intel gaudi to contributed HW in README by @BacharL in #5300
- Fixed Accelerate Link by @wkaisertexas in #5314
- Enable mixtral 8x7b autotp by @Yejing-Lai in #5257
- support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix by @inkcherry in #5259
- fix comms dtype by @mayank31398 in #5297
- Modified regular expression by @igeni in #5306
- Docs typos fix and grammar suggestions by @Gr0g0 in #5322
- Added Gaudi2 CI tests. by @vshekhawat-hlab in #5275
- Improve universal checkpoint by @tohtana in #5289
- Increase coverage for HPU by @loadams in #5324
- Add NFS path check for default deepspeed triton cache directory by @HeyangQin in #5323
- Correct typo in checking on bf16 unit test support by @loadams in #5317
- Make NFS warning print only once by @HeyangQin in #5345
- resolve KeyError: 'PDSH_SSH_ARGS_APPEND' by @Lzhang-hub in #5318
- BF16 optimizer: Clear lp grads after updating hp grads in hook by @YangQun1 in #5328
- Fix sort of zero checkpoint files by @tohtana in #5342
- Add distributed_port for deepspeed.initialize by @LZHgrla in #5260
- [fix] fix typo s/simultanenously /simultaneously by @digger-yu in #5359
- Update container version for Gaudi2 CI by @raza-sikander in #5360
- compute global norm on device by @BacharL in #5125
- logger update with torch master changes by @rogerxfeng8 in #5346
- Ensure capacity does not exceed number of tokens by @jeffra in #5353
- Update workflows that use cu116 to cu117 by @loadams in #5361
- FP [6,8,12] quantizer op by @jeffra in #5336
- CPU SHM based inference_all_reduce improve by @delock in #5320
- Auto convert moe param groups by @jeffra in #5354
- Support MoE for pipeline models by @mosheisland in #5338
- Update pytest and transformers with fixes for pytest>= 8.0.0 by @loadams in #5164
- Increase CI coverage for Gaudi2 accelerator. by @vshekhawat-hlab in #5358
- Add CI for Intel XPU/Max1100 by @Liangliang-Ma in #5376
- Update path name on xpu-max1100.yml, add badge in README by @loadams in #5386
- Update checkout action on workflows on ubuntu 20.04 by @loadams in #5387
- Cleanup required_torch_version code and references. by @loadams in #5370
- Update README.md for intel XPU support by @Liangliang-Ma in #5389
- Optimize the fp-dequantizer to get high memory-BW utilization by @RezaYazdaniAminabadi in #5373
- Removal of cuda hardcoded string with get_device function by @raza-sikander in #5351
- Add custom reshaping for universal checkpoint by @tohtana in #5390
- fix pageable h2d memcpy by @GuanhuaWang in #5301
- stage3: efficient compute of scaled_global_grad_norm by @nelyahu in #5256
- Fix the FP6 kernels compilation problem on non-Ampere GPUs. by @JamesTheZ in #5333
New Contributors
- @vshekhawat-hlab made their first contribution in #5270
- @wkaisertexas made their first contribution in #5314
- @igeni made their first contribution in #5306
- @Gr0g0 made their first contribution in #5322
- @Lzhang-hub made their first contribution in #5318
- @YangQun1 made their first contribution in #5328
- @raza-sikander made their first contribution in #5360
- @rogerxfeng8 made their first contribution in #5346
- @JamesTheZ made their first contribution in #5333
Full Changelog: v0.14.0...v0.14.1