[bugfix] promote state in bf16_optimizer (#5767)
This patch promotes `state` in `bf16_optimizer` so it is accessible
in downstream DeepSpeed use cases.

For example, without the patch we hit the following issue in the
Megatron-DeepSpeed llama showcase:
```
[rank3]: Traceback (most recent call last):                                                                                                                             
[rank3]:   File "/yahao/Megatron-DeepSpeed/pretrain_gpt.py", line 356, in <module>                                                                                      
[rank3]:     pretrain(train_valid_test_datasets_provider,                                                                                                               
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 222, in pretrain                                                                                 
[rank3]:     iteration = train(forward_step_func,                                                                                                                       
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 1264, in train                                                                                   
[rank3]:     report_memory_flag = training_log(loss_dict, total_loss_dict,                                                                                              
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 999, in training_log                                                                             
[rank3]:     opt_stats[0] += (torch.norm(optimizer.state[param]['exp_avg_sq']).item())**2                                                                               
[rank3]: AttributeError: 'BF16_Optimizer' object has no attribute 'state'
```

With the patch, the invocation passes smoothly.
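
As a rough illustration of the downstream usage (hypothetical helper; the real call site is Megatron-DeepSpeed's `training_log`), code holding the `BF16_Optimizer` wrapper can now read the wrapped optimizer's per-parameter state through the forwarded `state` property:

```python
import torch

def sum_squared_exp_avg_sq_norms(optimizer, params):
    """Illustrative sketch: sum the squared L2 norms of Adam's second-moment
    buffers, mirroring the opt_stats accumulation from the traceback above."""
    total = 0.0
    for param in params:
        # optimizer.state now forwards to the wrapped optimizer's state dict
        state = optimizer.state.get(param, {})
        if 'exp_avg_sq' in state:
            total += torch.norm(state['exp_avg_sq']).item() ** 2
    return total
```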

Co-authored-by: Logan Adams <[email protected]>
billishyahao and loadams authored Jul 16, 2024
1 parent 61e0778 commit 98272d1
Showing 1 changed file with 5 additions and 0 deletions.

deepspeed/runtime/bf16_optimizer.py:

```diff
@@ -540,6 +540,11 @@ def param_groups(self):
         """Forward the wrapped optimizer's parameters."""
         return self.optimizer.param_groups
 
+    @property
+    def state(self):
+        """Forward the wrapped optimizer's states."""
+        return self.optimizer.state
+
     def accumulate_hp_grads_and_remove_lp(self, lp_param, group_idx, param_idx):
         assert self.immediate_grad_update
         self._update_hp_grad(lp_param, group_idx, param_idx, clear_lp_grads=True)
```
