Release v0.14.1 Patch release · microsoft/DeepSpeed

What's Changed

Update version.txt after 0.14.0 release by @mrwyattii in #5238
Fp6 blog chinese by @xiaoxiawu-microsoft in #5239
Add contributed HW support into README by @delock in #5240
Set tp world size to 1 in ckpt load, if MPU is not provided by @samadejacobs in #5243
Make op builder detection adapt to accelerator change by @delock in #5206
Replace HIP_PLATFORM_HCC with HIP_PLATFORM_AMD by @rraminen in #5264
Add CI for Habana Labs HPU/Gaudi2 by @loadams in #5244
Fix attention mask handling in the Hybrid Engine Bloom flow by @deepcharm in #5101
Skip 1Bit Compression and sparsegrad tests for HPU. by @vshekhawat-hlab in #5270
Enabled LMCorrectness inference tests on HPU. by @vshekhawat-hlab in #5271
Added HPU backend support for torch.compile tests. by @vshekhawat-hlab in #5269
Average only valid part of the ipg buffer. by @BacharL in #5268
Add HPU accelerator support in unit tests. by @vshekhawat-hlab in #5162
Fix loading a universal checkpoint by @tohtana in #5263
Add Habana Gaudi2 CI badge to the README by @loadams in #5286
Add intel gaudi to contributed HW in README by @BacharL in #5300
Fixed Accelerate Link by @wkaisertexas in #5314
Enable mixtral 8x7b autotp by @Yejing-Lai in #5257
support bf16_optimizer moe expert parallel training and moe EP grad_scale/grad_norm fix by @inkcherry in #5259
fix comms dtype by @mayank31398 in #5297
Modified regular expression by @igeni in #5306
Docs typos fix and grammar suggestions by @Gr0g0 in #5322
Added Gaudi2 CI tests. by @vshekhawat-hlab in #5275
Improve universal checkpoint by @tohtana in #5289
Increase coverage for HPU by @loadams in #5324
Add NFS path check for default deepspeed triton cache directory by @HeyangQin in #5323
Correct typo in checking on bf16 unit test support by @loadams in #5317
Make NFS warning print only once by @HeyangQin in #5345
resolve KeyError: 'PDSH_SSH_ARGS_APPEND' by @Lzhang-hub in #5318
BF16 optimizer: Clear lp grads after updating hp grads in hook by @YangQun1 in #5328
Fix sort of zero checkpoint files by @tohtana in #5342
Add distributed_port for deepspeed.initialize by @LZHgrla in #5260
[fix] fix typo s/simultanenously /simultaneously by @digger-yu in #5359
Update container version for Gaudi2 CI by @raza-sikander in #5360
compute global norm on device by @BacharL in #5125
logger update with torch master changes by @rogerxfeng8 in #5346
Ensure capacity does not exceed number of tokens by @jeffra in #5353
Update workflows that use cu116 to cu117 by @loadams in #5361
FP [6,8,12] quantizer op by @jeffra in #5336
CPU SHM based inference_all_reduce improve by @delock in #5320
Auto convert moe param groups by @jeffra in #5354
Support MoE for pipeline models by @mosheisland in #5338
Update pytest and transformers with fixes for pytest>= 8.0.0 by @loadams in #5164
Increase CI coverage for Gaudi2 accelerator. by @vshekhawat-hlab in #5358
Add CI for Intel XPU/Max1100 by @Liangliang-Ma in #5376
Update path name on xpu-max1100.yml, add badge in README by @loadams in #5386
Update checkout action on workflows on ubuntu 20.04 by @loadams in #5387
Cleanup required_torch_version code and references. by @loadams in #5370
Update README.md for intel XPU support by @Liangliang-Ma in #5389
Optimize the fp-dequantizer to get high memory-BW utilization by @RezaYazdaniAminabadi in #5373
Removal of cuda hardcoded string with get_device function by @raza-sikander in #5351
Add custom reshaping for universal checkpoint by @tohtana in #5390
fix pageable h2d memcpy by @GuanhuaWang in #5301
stage3: efficient compute of scaled_global_grad_norm by @nelyahu in #5256
Fix the FP6 kernels compilation problem on non-Ampere GPUs. by @JamesTheZ in #5333
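
As an illustration of the entry "Ensure capacity does not exceed number of tokens" (#5353), here is a minimal sketch of what clamping a MoE expert capacity to the token count can look like. The function name `expert_capacity`, its parameters, and the capacity formula are assumptions made for this example; this is not the code from the PR.

```python
import math


def expert_capacity(num_tokens, num_experts, capacity_factor=1.0, min_capacity=4):
    """Hypothetical helper: per-expert token capacity for a MoE gate.

    The formula below is an assumed, commonly used form and is not taken
    from the DeepSpeed source.
    """
    # Even share of tokens per expert, scaled by the capacity factor.
    capacity = math.ceil(num_tokens / num_experts * capacity_factor)
    # Apply the configured lower bound.
    capacity = max(capacity, min_capacity)
    # Clamp: an expert can never receive more tokens than exist in the batch.
    return min(capacity, num_tokens)


if __name__ == "__main__":
    # With only 2 tokens, the lower bound of 4 would overshoot the batch,
    # so the final clamp brings the capacity back down to the token count.
    print(expert_capacity(num_tokens=2, num_experts=8))
    print(expert_capacity(num_tokens=64, num_experts=8))
```

Without the final `min`, a small batch combined with a larger `min_capacity` would yield a capacity greater than the number of tokens available.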

New Contributors

@vshekhawat-hlab made their first contribution in #5270
@wkaisertexas made their first contribution in #5314
@igeni made their first contribution in #5306
@Gr0g0 made their first contribution in #5322
@Lzhang-hub made their first contribution in #5318
@YangQun1 made their first contribution in #5328
@raza-sikander made their first contribution in #5360
@rogerxfeng8 made their first contribution in #5346
@JamesTheZ made their first contribution in #5333

Full Changelog: v0.14.0...v0.14.1
