Fix allreduce for BF16 and ZeRO0 #5170

tohtana · 2024-02-21T16:56:39Z

This PR fixes a gradient allreduce issue that occurs when ZeRO stage 0 (ZeRO0) is combined with BF16. (This replaces #5154.)

DeepSpeed uses `BF16_Optimizer` when ZeRO0 and BF16 are enabled. The optimizer accumulates gradients in an FP32 buffer soon after a backward pass completes. However, the DeepSpeed engine performs the allreduce on the BF16 gradients rather than on that buffer.

This PR fixes the issue by performing the allreduce on the FP32 buffer. It also removes an assertion that prohibited BF16+PP+Z1, a combination that actually runs.
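The mismatch can be illustrated with a toy simulation. This is only a sketch and not DeepSpeed code: `to_bf16`, `allreduce_mean`, the two simulated ranks, and the gradient values are all hypothetical, and Python floats stand in for the FP32 buffer. It assumes the optimizer reads the FP32 buffer, so a reduction applied only to the BF16 gradients leaves each rank's buffer rank-local.

```python
import struct

def to_bf16(x: float) -> float:
    """Round a finite float to the nearest bfloat16 value (ties to even)."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Add the rounding bias, then keep only the upper 16 bits of the float32.
    bits = (bits + 0x7FFF + ((bits >> 16) & 1)) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]

def allreduce_mean(values):
    """Toy stand-in for an averaging allreduce: every rank gets the mean."""
    mean = sum(values) / len(values)
    return [mean] * len(values)

WORLD_SIZE = 2          # hypothetical number of ranks
GRAD_ACCUM_STEPS = 4    # matches the accumulation setting in the PR

# Hypothetical per-rank gradients, one value per micro-batch.
per_rank_grads = [
    [0.101, 0.102, 0.103, 0.104],
    [0.201, 0.202, 0.203, 0.204],
]

fp32_buffers = [0.0] * WORLD_SIZE   # accumulation buffers read by the optimizer
bf16_grads = [0.0] * WORLD_SIZE     # most recent BF16 gradient on each rank

for step in range(GRAD_ACCUM_STEPS):
    for rank in range(WORLD_SIZE):
        bf16_grads[rank] = to_bf16(per_rank_grads[rank][step])
        # Accumulate into the higher-precision buffer right after backward.
        fp32_buffers[rank] += bf16_grads[rank]

# Reducing only the BF16 gradients leaves the accumulation buffers untouched.
reduced_bf16 = allreduce_mean(bf16_grads)
buffers_without_fix = list(fp32_buffers)

# Reducing the accumulation buffers makes every rank see the same gradient.
buffers_with_fix = allreduce_mean(fp32_buffers)

print("buffers when only BF16 grads are reduced:", buffers_without_fix)
print("buffers when the FP32 buffer is reduced: ", buffers_with_fix)
```

In this sketch the buffers differ across ranks in the first case and agree in the second, which is the property data-parallel training relies on.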

The loss curves cover the following configurations:

- BF16/Z0,Z1,Z2,Z3/NoPP
- BF16/Z0,Z1/PP (2 stages)

(All runs used 8 GPUs with 4 gradient accumulation steps.)
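For readers who want to set up comparable runs, the non-pipeline configurations could be expressed roughly as below. The PR does not show its configs, so this is an assumption-laden sketch: `make_ds_config` is a hypothetical helper, the micro-batch size is a placeholder, and only the `bf16`, `zero_optimization`, and `gradient_accumulation_steps` settings reflect what the description states. Pipeline parallelism is configured in model code and is not covered here.

```python
def make_ds_config(zero_stage: int) -> dict:
    """Build a minimal DeepSpeed-style config dict for one ZeRO stage."""
    return {
        "train_micro_batch_size_per_gpu": 1,   # placeholder, not from the PR
        "gradient_accumulation_steps": 4,      # stated in the PR description
        "bf16": {"enabled": True},
        "zero_optimization": {"stage": zero_stage},
    }

# One config per ZeRO stage compared in the NoPP runs (Z0 through Z3).
configs = {f"Z{stage}": make_ds_config(stage) for stage in range(4)}

for name, cfg in configs.items():
    print(name, cfg)
```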

![image](https://github.com/microsoft/DeepSpeed/assets/81312776/0dc1e9ef-43bc-4b47-8b9e-d6aca137a217)

Co-authored-by: Logan Adams

tohtana and others added 4 commits February 19, 2024 02:46

fix gradient clipping

794e992

Merge branch 'master' into tohtana/fix_fp32_clipping

9871421

perform allreduce on FP32 when BF16 optimizer is enabled

84ced5b

Merge branch 'master' into tohtana/fix_bf16_z0_reduce

fde1df0

tohtana mentioned this pull request Feb 21, 2024

ZeRO0 does not handle BF16 gradients properly #5154

Closed

tjruwase approved these changes Feb 21, 2024


tohtana changed the title ~~Tohtana/fix bf16 z0 reduce~~ Fix allreduce for BF16 and ZeRO0 Feb 21, 2024

tohtana marked this pull request as ready for review February 21, 2024 17:30

tohtana requested a review from mrwyattii as a code owner February 21, 2024 17:30

Merge branch 'master' into tohtana/fix_bf16_z0_reduce

23df2cf

tohtana enabled auto-merge February 21, 2024 18:10

tohtana added this pull request to the merge queue Feb 21, 2024

Merged via the queue into master with commit dd3690c Feb 21, 2024
12 checks passed

tohtana deleted the tohtana/fix_bf16_z0_reduce branch February 21, 2024 20:08

ys950902 mentioned this pull request Apr 17, 2024

[zero1+pp]Remove the config which is not needed intel/intel-extension-for-deepspeed#69

Merged

tohtana mentioned this pull request Apr 17, 2024

Comparison of Deepspeed Stage 1,2 and 3 vs DDP #4815

Closed
