v0.9.3: Patch release
What's Changed
- Enable auto TP policy for llama model by @jianan-gu in #3170
- Allow users to use mis-matched CUDA versions by @mrwyattii in #3436
- Hybrid Engine Refactor and Llama Inference Support by @cmikeh2 in #3425
- add sharded checkpoint loading for AutoTP path to reduce the peak mem… by @sywangyi in #3102
- launcher/multinode_runner.py: mapping env variables by @YizhouZ in #3372
- Update automatic-tensor-parallelism.md by @sywangyi in #3198
- Build: Update license in setup by @PabloEmidio in #3484
- Doc corrections by @goodship1 in #3435
- Fix spelling errors in comments and documents by @digger-yu in #3486
- Fix spelling error in function GetMaxTokenLength() by @luliyucoordinate in #3482
- Fix a type error on bf16+Pipeline Parallelism by @ys950902 in #3441
- Fix spelling errors in DeepSpeed codebase by @digger-yu in #3494
- fix spelling error with docs/index.md by @digger-yu in #3443
- delete the line to keep user_zero_stages by @MrZhengXin in #3473
- Update Inference Engine checkpoint loading + meta tensor assertions by @lekurile in #2940
- Fix regression in sharded checkpoint loading in AutoTP path caused by the removal of qkv_copy(), and add a UT case for sharded checkpoint loading in AutoTP by @sywangyi in #3457
- Add snip_momentum structured pruning which supports higher sparse ratio by @ftian1 in #3300
- Update README.md by @goodship1 in #3504
- Hybrid Engine Fix Llama by @lekurile in #3505
- fix spelling error with deepspeed/runtime/ by @digger-yu in #3509
- Skip autoTP if tp_size is 1 by @molly-smith in #3449
- Changing monitor loss to aggregate loss over gradient accumulation steps by @jomayeri in #3428
- change actions/checkout@v2 to v3 by @digger-yu in #3526
- fix typo with docs/ by @digger-yu in #3523
- Doc updates by @goodship1 in #3520
- Fix bug in Hybrid Engine by @mrwyattii in #3497
- Fix wrong passing of offload_optimizer_config to DeepSpeedZeRoOffload by @mmhab in #3420
- Fix broadcast error on multi-node training with ZeroStage3 and TensorParallel=2 by @YizhouZ in #2999
- share inflight registry between PartitionedParameterCoordinators by @HeyangQin in #3462
- Syncing FusedAdam with new Apex features by @jomayeri in #3434
- fix typo in comments with deepspeed/ by @digger-yu in #3537
- [ROCm] Hip headers fix by @rraminen in #3532
- [CPU] Support Intel CPU inference by @delock in #3041
- Clone tensors to avoid torch.save bloat by @tjruwase in #3348
- Fix attribute error when loading FusedAdamBuilder() by @rraminen in #3527
- fix typo by @inkcherry in #3559
- Fixing bf16 test by @jomayeri in #3551
- Fix Hybrid Engine for BLOOM by @lekurile in #3580
- Fix op_builder against PyTorch nightly by @malfet in #3596
- data efficiency bug fix, avoid invalid range step size by @conglongli in #3609
- DS init should not broadcast or move zero.Init models by @tjruwase in #3611
- Expose Consecutive Hysteresis to Users by @Quentin-Anthony in #3553
- Align InferenceEngine to store ms in _model_times by @HolyFalafel in #3501
- AISC launcher fixes by @jeffra in #3637
- stage3.py: do not scale if gradient_predivide_factor is 1.0 by @guoyejun in #3630
- Add Ascend NPU accelerator support by @CurryRice233 in #3595
- Skip tests on docs-only changes by @mrwyattii in #3651
- Update megatron.md by @wjessup in #3641
- Typo Correction by @MicahZoltu in #3621
- deepspeed/comm/comm.py: fix typo of warning message by @guoyejun in #3636
- Fix RuntimeError when using ZeRO Stage3 with mpu: #3564 by @eggiter in #3565
- Allow dict datatype for checkpoints (inference) by @mrwyattii in #3007
- fix typo with deepspeed/ by @digger-yu in #3547
- flops_profiler: add option recompute_fwd_factor for the case of activation c… by @guoyejun in #3362
- fix typo deepspeed/runtime by @digger-yu in #3663
- Refactor check_enabled root validator in DeepSpeedMonitorConfig by @bgr8 in #3616
New Contributors
- @jianan-gu made their first contribution in #3170
- @YizhouZ made their first contribution in #3372
- @PabloEmidio made their first contribution in #3484
- @luliyucoordinate made their first contribution in #3482
- @ys950902 made their first contribution in #3441
- @MrZhengXin made their first contribution in #3473
- @ftian1 made their first contribution in #3300
- @mmhab made their first contribution in #3420
- @malfet made their first contribution in #3596
- @HolyFalafel made their first contribution in #3501
- @CurryRice233 made their first contribution in #3595
- @wjessup made their first contribution in #3641
- @MicahZoltu made their first contribution in #3621
- @eggiter made their first contribution in #3565
- @bgr8 made their first contribution in #3616
Full Changelog: v0.9.2...v0.9.3