Release v0.14.3 Patch release · microsoft/DeepSpeed

What's Changed

Update version.txt after 0.14.2 release by @mrwyattii in #5458
Add getter and setter methods for compile_backend across accelerators. by @vshekhawat-hlab in #5299
Fix torch.compile error for PyTorch v2.3 by @tohtana in #5463
Revert "stage3: efficient compute of scaled_global_grad_norm (#5256)" by @lekurile in #5461
Update ds-chat CI workflow paths to include zero stage 1-3 files by @lekurile in #5462
Update with ops not supported on Windows by @loadams in #5468
fix: swapping order of parameters in create_dir_symlink method. by @alvieirajr in #5465
Un-pin torch version in nv-torch-latest back to latest and skip test_compile_zero tests on v100 by @loadams in #5459
re-introduce: stage3: efficient compute of scaled_global_grad_norm by @nelyahu in #5493
Fix crash when creating Torch tensor on NPU with device=get_accelerator().current_device() by @harygo2 in #5464
Fix compile wrapper by @BacharL in #5455
enable phi3_mini autotp by @Yejing-Lai in #5501
Fused adam for HPU by @BacharL in #5500
[manifest] update manifest to add hpp file in csrc. by @ys950902 in #5522
enable phi2 autotp by @Yejing-Lai in #5436
Switch pynvml to nvidia-ml-py by @loadams in #5529
Switch from double quotes to match single quotes by @loadams in #5530
[manifest] update manifest to add hpp file in deepspeed. by @ys950902 in #5533
New integration - CometMonitor by @alexkuzmik in #5466
Improve _configure_optimizer() final optimizer log by @nelyahu in #5528
Enhance testing: Skip fused_optimizer tests if not supported. by @vshekhawat-hlab in #5159
Skip the UT cases that use unimplemented op builders. by @foin6 in #5372
rocblas -> hipblas changes for ROCm by @rraminen in #5401
Rocm warp size fix by @rraminen in #5402
CPUAdam fp16 and bf16 support by @BacharL in #5409
Optimize zero3 fetch params using all_reduce by @deepcharm in #5420
Fix the TypeError for XPU Accelerator by @shiyang-weng in #5531
Fix RuntimeError for moe on XPU: tensors found at least two devices by @shiyang-weng in #5519
Remove synchronize calls from allgather params by @BacharL in #5516
Avoid overwrite of compiled module wrapper attributes by @deepcharm in #5549
Small typos in functions set_none_gradients_to_zero by @TravelLeraLone in #5557
Adapt doc for #4405 by @oraluben in #5552
Update to HF_HOME from TRANSFORMERS_CACHE by @loadams in #4816
[INF] DSAttention allow input_mask to have false as value by @oelayan7 in #5546
Add throughput timer configuration by @deepcharm in #5363
Add Ulysses DistributedAttention compatibility by @Kwen-Chen in #5525
Add hybrid_engine.py as path to trigger the DS-Chat GH workflow by @lekurile in #5562
Update HPU docker version by @loadams in #5566
Rename files in fp_quantize op from quantize.* to fp_quantize.* by @loadams in #5577
[MiCS] Remove the handle print on DeepSpeed side by @ys950902 in #5574
Update to fix sidebar over text by @loadams in #5567
DeepSpeedCheckpoint: support custom final ln idx by @nelyahu in #5506
Update minor CUDA version compatibility by @adk9 in #5591
Add slide deck for meetup in Japan by @tohtana in #5598
Fixed the Windows build. by @costin-eseanu in #5596
estimate_zero2_model_states_mem_needs: fixing memory estimation by @nelyahu in #5099
Fix cuda hardcode for inference woq by @Liangliang-Ma in #5565
fix sequence parallel(Ulysses) grad scale for zero0 by @inkcherry in #5555
Add Compressedbackend for Onebit optimizers by @Liangliang-Ma in #5473
Updated hpu-gaudi2 tests content. by @vshekhawat-hlab in #5622
Pin transformers version for MII tests by @loadams in #5629
WA for Torch-compile-Z3-act-apt accuracy issue from the PyTorch repo by @NirSonnenschein in #5590
stage_1_and_2: optimize clip calculation to use clamp by @nelyahu in #5632
Fix overlap communication of ZeRO stage 1 and 2 by @penn513 in #5606
fixes in _partition_param_sec function by @mmhab in #5613
assumption of torch.initial_seed function accepting seed arg in DeepSpeedAccelerator abstract class is incorrect by @polisettyvarma in #5569
pipe/_exec_backward_pass: fix immediate grad update by @nelyahu in #5605
Monitor was always enabled causing performance degradation by @deepcharm in #5633

New Contributors

@alvieirajr made their first contribution in #5465
@harygo2 made their first contribution in #5464
@alexkuzmik made their first contribution in #5466
@foin6 made their first contribution in #5372
@shiyang-weng made their first contribution in #5531
@TravelLeraLone made their first contribution in #5557
@oraluben made their first contribution in #5552
@Kwen-Chen made their first contribution in #5525
@adk9 made their first contribution in #5591
@costin-eseanu made their first contribution in #5596
@NirSonnenschein made their first contribution in #5590
@penn513 made their first contribution in #5606

Full Changelog: v0.14.2...v0.14.3
