v0.14.3 Patch release
What's Changed
- Update version.txt after 0.14.2 release by @mrwyattii in #5458
- Add getter and setter methods for compile_backend across accelerators. by @vshekhawat-hlab in #5299
- Fix torch.compile error for PyTorch v2.3 by @tohtana in #5463
- Revert "stage3: efficient compute of scaled_global_grad_norm (#5256)" by @lekurile in #5461
- Update ds-chat CI workflow paths to include zero stage 1-3 files by @lekurile in #5462
- Update with ops not supported on Windows by @loadams in #5468
- fix: swapping order of parameters in create_dir_symlink method. by @alvieirajr in #5465
- Un-pin torch version in nv-torch-latest back to latest and skip test_compile_zero tests on v100 by @loadams in #5459
- re-introduce: stage3: efficient compute of scaled_global_grad_norm by @nelyahu in #5493
- Fix crash when creating Torch tensor on NPU with device=get_accelerator().current_device() by @harygo2 in #5464
- Fix compile wrapper by @BacharL in #5455
- enable phi3_mini autotp by @Yejing-Lai in #5501
- Fused adam for HPU by @BacharL in #5500
- [manifest] update manifest to add hpp file in csrc. by @ys950902 in #5522
- enable phi2 autotp by @Yejing-Lai in #5436
- Switch pynvml to nvidia-ml-py by @loadams in #5529
- Switch from double quotes to match single quotes by @loadams in #5530
- [manifest] update manifest to add hpp file in deepspeed. by @ys950902 in #5533
- New integration - CometMonitor by @alexkuzmik in #5466
- Improve _configure_optimizer() final optimizer log by @nelyahu in #5528
- Enhance testing: Skip fused_optimizer tests if not supported. by @vshekhawat-hlab in #5159
- Skip the UT cases that use unimplemented op builders. by @foin6 in #5372
- rocblas -> hipblas changes for ROCm by @rraminen in #5401
- Rocm warp size fix by @rraminen in #5402
- CPUAdam fp16 and bf16 support by @BacharL in #5409
- Optimize zero3 fetch params using all_reduce by @deepcharm in #5420
- Fix the TypeError for XPU Accelerator by @shiyang-weng in #5531
- Fix RuntimeError for moe on XPU: tensors found at least two devices by @shiyang-weng in #5519
- Remove synchronize calls from allgather params by @BacharL in #5516
- Avoid overwrite of compiled module wrapper attributes by @deepcharm in #5549
- Small typos in functions set_none_gradients_to_zero by @TravelLeraLone in #5557
- Adapt doc for #4405 by @oraluben in #5552
- Update to HF_HOME from TRANSFORMERS_CACHE by @loadams in #4816
- [INF] DSAttention allow input_mask to have false as value by @oelayan7 in #5546
- Add throughput timer configuration by @deepcharm in #5363
- Add Ulysses DistributedAttention compatibility by @Kwen-Chen in #5525
- Add hybrid_engine.py as path to trigger the DS-Chat GH workflow by @lekurile in #5562
- Update HPU docker version by @loadams in #5566
- Rename files in fp_quantize op from quantize.* to fp_quantize.* by @loadams in #5577
- [MiCS] Remove the handle print on DeepSpeed side by @ys950902 in #5574
- Update to fix sidebar over text by @loadams in #5567
- DeepSpeedCheckpoint: support custom final ln idx by @nelyahu in #5506
- Update minor CUDA version compatibility by @adk9 in #5591
- Add slide deck for meetup in Japan by @tohtana in #5598
- Fixed the Windows build. by @costin-eseanu in #5596
- estimate_zero2_model_states_mem_needs: fixing memory estimation by @nelyahu in #5099
- Fix cuda hardcode for inference woq by @Liangliang-Ma in #5565
- fix sequence parallel(Ulysses) grad scale for zero0 by @inkcherry in #5555
- Add Compressedbackend for Onebit optimizers by @Liangliang-Ma in #5473
- Updated hpu-gaudi2 tests content. by @vshekhawat-hlab in #5622
- Pin transformers version for MII tests by @loadams in #5629
- WA for Torch-compile-Z3-act-apt accuracy issue from the Pytorch repo by @NirSonnenschein in #5590
- stage_1_and_2: optimize clip calculation to use clamp by @nelyahu in #5632
- Fix overlap communication of ZeRO stage 1 and 2 by @penn513 in #5606
- fixes in _partition_param_sec function by @mmhab in #5613
- assumption of torch.initial_seed function accepting seed arg in DeepSpeedAccelerator abstract class is incorrect by @polisettyvarma in #5569
- pipe/_exec_backward_pass: fix immediate grad update by @nelyahu in #5605
- Monitor was always enabled causing performance degradation by @deepcharm in #5633
New Contributors
- @alvieirajr made their first contribution in #5465
- @harygo2 made their first contribution in #5464
- @alexkuzmik made their first contribution in #5466
- @foin6 made their first contribution in #5372
- @shiyang-weng made their first contribution in #5531
- @TravelLeraLone made their first contribution in #5557
- @oraluben made their first contribution in #5552
- @Kwen-Chen made their first contribution in #5525
- @adk9 made their first contribution in #5591
- @costin-eseanu made their first contribution in #5596
- @NirSonnenschein made their first contribution in #5590
- @penn513 made their first contribution in #5606
Full Changelog: v0.14.2...v0.14.3