[BUG]: Parameter `model.norm.weight` failed at the gradient reduction. #4984
Comments
I cannot see the exact issue from your description, but the error should be raised at this line. Would you please let me know your config?
@JThh I don't use any config file. I just start it with:
Could anyone help? I searched here and found many similar problems without answers. I also tried the llama2 example. Because of this issue, I have to use transformers 4.33.3, and with that version I can run script/gemini_auto.sh with the 70B model correctly. So the difference between examples/language/llama2 and applications/Colossal-LLaMA-2 may be related to the bug.
Hi @fancyerii, I found that this bug was triggered by the `replace_with_flash_attention` function. If you comment out the `LlamaRMSNorm` replacement as below, the bug might be fixed:

```python
from types import MethodType

from transformers.models.llama.modeling_llama import LlamaAttention, LlamaForCausalLM, LlamaModel, LlamaRMSNorm

# attention_forward, _prepare_decoder_attention_mask and rms_norm_forward are the
# patched implementations defined alongside this helper in Colossal-LLaMA-2.

def replace_with_flash_attention(model: LlamaForCausalLM) -> None:
    for name, module in model.named_modules():
        if isinstance(module, LlamaAttention):
            module.forward = MethodType(attention_forward, module)
        if isinstance(module, LlamaModel):
            module._prepare_decoder_attention_mask = MethodType(_prepare_decoder_attention_mask, module)
        # Workaround: leave LlamaRMSNorm unpatched when training with Gemini.
        # if isinstance(module, LlamaRMSNorm):
        #     module.forward = MethodType(rms_norm_forward, module)
```
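For context, here is a minimal sketch (my own, not the example's exact code) of how the patched helper would typically be applied before training; `pretrained_path` and the surrounding setup are placeholders:

```python
from transformers import LlamaForCausalLM

# Load the model, then apply the patch (with the LlamaRMSNorm lines commented out)
# before handing the model to the booster/plugin for training.
model = LlamaForCausalLM.from_pretrained(pretrained_path)  # pretrained_path is a placeholder
replace_with_flash_attention(model)
```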
Thanks. I also found the problem and tested `replace_xformers`. It runs correctly now, but I have to switch back to transformers==4.33.3.
Currently it's just a workaround; the mechanism behind it is complex and we are still looking for a better solution. Thank you for your issue!
By the way, what's the difference between `replace_xformers` and `replace_with_flash_attention`?
Thanks. |
@Fridge003 If the `LlamaRMSNorm` replacement is commented out, does that mean no RMSNorm is used at all? Will the training still be correct?
`LlamaRMSNorm` is still used, just not replaced with the fused flash-attention version. The training will be correct, but a little slower.
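For reference, when the patch is skipped, `LlamaRMSNorm` keeps computing the standard RMSNorm in plain PyTorch; a sketch of that computation (from the usual definition, not copied from transformers):

```python
import torch

def rms_norm_reference(hidden_states: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # Scale activations by their reciprocal root-mean-square, then apply the learned weight.
    variance = hidden_states.pow(2).mean(-1, keepdim=True)
    return weight * (hidden_states * torch.rsqrt(variance + eps))
```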
I got it. So when I use another plugin such as zero2_cpu, I can still use the original code, and if I want to use gemini or gemini_auto, I need to comment those lines out.
Currently yes, we will try to fix this later. |
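To summarize the workaround, here is a hedged sketch of how the plugin choice and the RMSNorm patch could be tied together; the plugin constructor arguments and the `patch_rmsnorm` flag are illustrative assumptions, not the training script's actual options:

```python
from colossalai.booster import Booster
from colossalai.booster.plugin import GeminiPlugin, LowLevelZeroPlugin

plugin_name = "gemini_auto"  # or "gemini" / "zero2_cpu"

if plugin_name == "zero2_cpu":
    plugin = LowLevelZeroPlugin(stage=2, cpu_offload=True)  # assumed arguments
elif plugin_name == "gemini":
    plugin = GeminiPlugin()
else:  # "gemini_auto"
    plugin = GeminiPlugin(placement_policy="auto")  # assumed argument

booster = Booster(plugin=plugin)

# Workaround from this thread: only keep the fused RMSNorm patch when NOT using Gemini.
# `patch_rmsnorm` is a hypothetical flag for illustration; in practice the LlamaRMSNorm
# lines in replace_with_flash_attention are simply commented out for gemini/gemini_auto.
patch_rmsnorm = plugin_name not in ("gemini", "gemini_auto")
```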
🐛 Describe the bug
I am running the Colossal-LLaMA-2 example. I can run it with zero2_cpu, but when I switch to gemini or gemini_auto, it crashes with:
My launch script:
Environment
cuda 11.7
cudnn 8.9.4.25_cuda11
python 3.9
pytorch 1.13.1+cu117
colossal 0.3.3