Commit
[bugfix] promote state in bf16_optimizer (#5767)
This patch promotes `state` in `bf16_optimizer` so it is accessible in downstream DeepSpeed use cases. For example, without the patch, we hit the following issue in the Megatron-DeepSpeed llama showcase:

```
[rank3]: Traceback (most recent call last):
[rank3]:   File "/yahao/Megatron-DeepSpeed/pretrain_gpt.py", line 356, in <module>
[rank3]:     pretrain(train_valid_test_datasets_provider,
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 222, in pretrain
[rank3]:     iteration = train(forward_step_func,
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 1264, in train
[rank3]:     report_memory_flag = training_log(loss_dict, total_loss_dict,
[rank3]:   File "/yahao/Megatron-DeepSpeed/megatron/training.py", line 999, in training_log
[rank3]:     opt_stats[0] += (torch.norm(optimizer.state[param]['exp_avg_sq']).item())**2
[rank3]: AttributeError: 'BF16_Optimizer' object has no attribute 'state'
```

With the patch, the invocation passes smoothly.

Co-authored-by: Logan Adams <[email protected]>
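As context for why promoting `state` matters, below is a minimal sketch, not the actual DeepSpeed patch: it assumes the bf16 wrapper holds an inner torch optimizer, and the names `BF16OptimizerSketch` and `inner_optimizer` are hypothetical. It shows one way a wrapper can expose the wrapped optimizer's `state` and `param_groups` so that Megatron-style code like `optimizer.state[param]['exp_avg_sq']` keeps working.

```python
# Minimal sketch only (hypothetical names, not the DeepSpeed implementation):
# forward the wrapped optimizer's bookkeeping attributes through the wrapper.
import torch


class BF16OptimizerSketch:
    """Thin bf16-style wrapper that forwards optimizer state to its inner optimizer."""

    def __init__(self, inner_optimizer: torch.optim.Optimizer):
        self.inner_optimizer = inner_optimizer

    @property
    def state(self):
        # Per-parameter state of the wrapped optimizer
        # (e.g. Adam's 'exp_avg' / 'exp_avg_sq' buffers).
        return self.inner_optimizer.state

    @property
    def param_groups(self):
        return self.inner_optimizer.param_groups

    def step(self, *args, **kwargs):
        return self.inner_optimizer.step(*args, **kwargs)


# Downstream access pattern similar to the failing line in training_log():
model = torch.nn.Linear(4, 4)
optimizer = BF16OptimizerSketch(torch.optim.Adam(model.parameters()))
model(torch.randn(2, 4)).sum().backward()
optimizer.step()

opt_stats = 0.0
for group in optimizer.param_groups:
    for param in group['params']:
        if 'exp_avg_sq' in optimizer.state[param]:
            opt_stats += torch.norm(optimizer.state[param]['exp_avg_sq']).item() ** 2
```

With such forwarding in place, callers that only know the generic `torch.optim.Optimizer` interface (`state`, `param_groups`) do not need to special-case the bf16 wrapper.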