llama_self_extend_patch_4_36 does not work #23
Comments
We found that after 4.36, the default attention class of Llama changed from "LlamaAttention" to "LlamaSdpaAttention", so the replacement function will not work. Instead, you may try: modify_method_of_instance(base_model, "LlamaAttention", "forward", self_extend_forward). This might be the reason for the failure.
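For context, a minimal sketch of how the suggested instance-wise call could be wired into a script. The modify_utils import path and the group_size_1/group_size_2 keyword names are assumptions based on the example script mentioned in this thread, not verified against the repo:

```python
# Hedged sketch: applying the instance-wise patch suggested above.
# Assumed (not verified): the helper lives in modify_utils, and
# self_extend_forward accepts group_size_1/group_size_2 keyword arguments.
from functools import partial

import torch
from transformers import AutoModelForCausalLM

import llama_self_extend_patch_4_36 as LlamaSE        # patch module named in this thread
from modify_utils import modify_method_of_instance    # assumed helper location

model_name = "meta-llama/Llama-2-7b-chat-hf"           # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Bind the group sizes once, then swap the forward of every matching
# attention instance inside the loaded model.
self_extend_forward = partial(LlamaSE.self_extend_forward,
                              group_size_1=4, group_size_2=1024)
modify_method_of_instance(model, "LlamaAttention", "forward", self_extend_forward)
```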
It works, thank you.
Hi YL-9! Could you please test whether self-extend works with instance-wise modification, like the example we provide? Sometimes a direct modification to the transformers class does not take effect, and the cause of failure is case by case. That is why we choose to modify the forward function of a model instance rather than its class. (Of course, this also avoids any unexpected behavior, since the modification only happens to the specific model instance.)
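To make the instance-wise idea concrete, here is an illustrative sketch of the general technique (this is not the repo's modify_method_of_instance, just how such a helper can work): walk the module tree, match on the class name, and rebind the forward method on each matching instance.

```python
# Illustrative sketch of instance-wise method replacement.
import types
import torch.nn as nn

def patch_instances(model: nn.Module, class_name: str, new_forward) -> int:
    """Bind new_forward as the forward method of every submodule whose
    class name matches class_name; return how many were patched."""
    count = 0
    for module in model.modules():
        if type(module).__name__ == class_name:
            module.forward = types.MethodType(new_forward, module)
            count += 1
    return count

# e.g. patch_instances(model, "LlamaSdpaAttention", self_extend_forward)
# A return value of 0 means the class name did not match anything, which is
# exactly the silent failure mode described above for 4.36's default SDPA attention.
```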
OK, thank you!
Hi, thanks for the nice work! I see the current implementation in
Hi, thank you for your interest. The main difference between transformers==4.36 and transformers==4.38.2 is how RoPE is applied to the KV; you may check that. The computation of self-attention is nearly the same, which means you can follow our 4.38.2 implementation to get a flash-attention implementation for 4.36 with minor modification. One possible issue is the flash_attn version used by 4.36; in that case, you may use our Triton flash-attention implementation instead of flash_attn. It is 10-20% slower than flash_attn.
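Since much of the trouble in this thread comes down to version mismatches, a small sketch of how one might select the right patch module and surface the installed flash_attn version at startup. The 4.38 module name here is a guess for illustration; only llama_self_extend_patch and llama_self_extend_patch_4_36 are named in this thread.

```python
# Hedged sketch: pick a patch module based on the installed transformers
# version and report the flash_attn version flagged above as a possible issue.
from importlib.metadata import version, PackageNotFoundError
from packaging.version import parse

tf_ver = parse(version("transformers"))

if tf_ver >= parse("4.38.0"):
    import llama_self_extend_patch_4_38 as LlamaSE   # assumed name for a 4.38.2 patch
elif tf_ver >= parse("4.36.0"):
    import llama_self_extend_patch_4_36 as LlamaSE   # module named in this thread
else:
    import llama_self_extend_patch as LlamaSE        # pre-4.36 module from llama_example.py

try:
    print("flash_attn:", version("flash-attn"))
except PackageNotFoundError:
    print("flash_attn not installed; a Triton-based attention kernel could be used instead.")
```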
When I use 4.36.2, it doesn't work, but with 4.32.0 it works.
I only changed "import llama_self_extend_patch as LlamaSE" in "llama_example.py" to "import llama_self_extend_patch_4_36 as LlamaSE".
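A quick diagnostic sketch for this situation: print which attention class the loaded model actually instantiates, since the patch only takes effect when the class name matches. The checkpoint name is a placeholder; the attn_implementation keyword is, to my knowledge, available from transformers 4.36 onward.

```python
# Check which attention class the loaded Llama model uses.
import torch
from transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

print(type(model.model.layers[0].self_attn).__name__)
# 4.32.x typically prints "LlamaAttention"; 4.36.x defaults to
# "LlamaSdpaAttention", which is why the patch appears to do nothing.

# On 4.36+ you can force the eager attention path so the original class is used:
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, attn_implementation="eager"
)
```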