Commit
[bugfix] promote state in bf16_optimizer (#5767)
This patch promotes `state` in `bf16_optimizer` so it is accessible in downstream DeepSpeed use cases. For example, without the patch, we hit the following issue in the Megatron-DeepSpeed llama showcase:

```
[rank3]: Traceback (most recent call last):
[rank3]:   File "/yahao/Megatron-DeepSpeed/pretrain_gpt.py", line 356, in <module>
[rank3]:     pretrain(train_valid_test_datasets_provider,
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 222, in pretrain
[rank3]:     iteration = train(forward_step_func,
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 1264, in train
[rank3]:     report_memory_flag = training_log(loss_dict, total_loss_dict,
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 999, in training_log
[rank3]:     opt_stats[0] += (torch.norm(optimizer.state[param]['exp_avg_sq']).item())**2
[rank3]: AttributeError: 'BF16_Optimizer' object has no attribute 'state'
```

With the patch, the invocation passes smoothly.

Co-authored-by: Logan Adams <[email protected]>
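As context for why promoting `state` matters, below is a minimal sketch, not the actual DeepSpeed patch: it assumes the bf16 wrapper holds an inner torch optimizer, and the names `BF16OptimizerSketch` and `inner_optimizer` are hypothetical. It shows one way a wrapper can expose the wrapped optimizer's `state` and `param_groups` so that Megatron-style code like `optimizer.state[param]['exp_avg_sq']` keeps working.

```python
# Minimal sketch only (hypothetical names, not the DeepSpeed implementation):
# forward the wrapped optimizer's bookkeeping attributes through the wrapper.
import torch


class BF16OptimizerSketch:
    """Thin bf16-style wrapper that forwards optimizer state to its inner optimizer."""

    def __init__(self, inner_optimizer: torch.optim.Optimizer):
        self.inner_optimizer = inner_optimizer

    @property
    def state(self):
        # Per-parameter state of the wrapped optimizer
        # (e.g. Adam's 'exp_avg' / 'exp_avg_sq' buffers).
        return self.inner_optimizer.state

    @property
    def param_groups(self):
        return self.inner_optimizer.param_groups

    def step(self, *args, **kwargs):
        return self.inner_optimizer.step(*args, **kwargs)


# Downstream access pattern similar to the failing line in training_log():
model = torch.nn.Linear(4, 4)
optimizer = BF16OptimizerSketch(torch.optim.Adam(model.parameters()))
model(torch.randn(2, 4)).sum().backward()
optimizer.step()

opt_stats = 0.0
for group in optimizer.param_groups:
    for param in group['params']:
        if 'exp_avg_sq' in optimizer.state[param]:
            opt_stats += torch.norm(optimizer.state[param]['exp_avg_sq']).item() ** 2
```

With such forwarding in place, callers that only know the generic `torch.optim.Optimizer` interface (`state`, `param_groups`) do not need to special-case the bf16 wrapper.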