llama_self_extend_patch_4_36 does not work #23
Comments
We found that after 4.36, the default attention class of Llama changed from "LlamaAttention" to "LlamaSdpaAttention", so the replacement function will not work. Instead, you may try: modify_method_of_instance(base_model, "LlamaAttention", "forward", self_extend_forward). This might be the reason for the failure.
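For context, a minimal sketch of how the suggested instance-wise call could be wired into a script. The modify_utils import path and the group_size_1/group_size_2 keyword names are assumptions based on the example script mentioned in this thread, not verified against the repo:

```python
# Hedged sketch: applying the instance-wise patch suggested above.
# Assumed (not verified): the helper lives in modify_utils, and
# self_extend_forward accepts group_size_1/group_size_2 keyword arguments.
from functools import partial

import torch
from transformers import AutoModelForCausalLM

import llama_self_extend_patch_4_36 as LlamaSE        # patch module named in this thread
from modify_utils import modify_method_of_instance    # assumed helper location

model_name = "meta-llama/Llama-2-7b-chat-hf"           # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Bind the group sizes once, then swap the forward of every matching
# attention instance inside the loaded model.
self_extend_forward = partial(LlamaSE.self_extend_forward,
                              group_size_1=4, group_size_2=1024)
modify_method_of_instance(model, "LlamaAttention", "forward", self_extend_forward)
```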
It works, thank you.
Hi YL-9! Could you please test whether self-extend works with instance-wise modification, like the example we provide? Sometimes a direct modification to the transformers class does not take effect, and the cause of failure is case by case. That is why we choose to modify the forward function of a model instance rather than its class. (Of course, this also avoids any unexpected behavior, since the modification only happens to the specific model instance.)
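To make the instance-wise idea concrete, here is an illustrative sketch of the general technique (this is not the repo's modify_method_of_instance, just how such a helper can work): walk the module tree, match on the class name, and rebind the forward method on each matching instance.

```python
# Illustrative sketch of instance-wise method replacement.
import types
import torch.nn as nn

def patch_instances(model: nn.Module, class_name: str, new_forward) -> int:
    """Bind new_forward as the forward method of every submodule whose
    class name matches class_name; return how many were patched."""
    count = 0
    for module in model.modules():
        if type(module).__name__ == class_name:
            module.forward = types.MethodType(new_forward, module)
            count += 1
    return count

# e.g. patch_instances(model, "LlamaSdpaAttention", self_extend_forward)
# A return value of 0 means the class name did not match anything, which is
# exactly the silent failure mode described above for 4.36's default SDPA attention.
```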
OK, thank you!
Hi, thanks for the nice work! I see the current implementation in
Hi, thank you for your interest. The main difference between transformers==4.36 and transformers==4.38.2 is how RoPE is applied to the KV; you may check that. The computation of self-attention is nearly the same, which means you can follow our 4.38.2 implementation to get a flash-attention implementation for 4.36 with minor modification. One possible issue is the flash_attn version used by 4.36; in that case, you may use our Triton flash-attention implementation instead of flash_attn. It is 10-20% slower than flash_attn.
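Since much of the trouble in this thread comes down to version mismatches, a small sketch of how one might select the right patch module and surface the installed flash_attn version at startup. The 4.38 module name here is a guess for illustration; only llama_self_extend_patch and llama_self_extend_patch_4_36 are named in this thread.

```python
# Hedged sketch: pick a patch module based on the installed transformers
# version and report the flash_attn version flagged above as a possible issue.
from importlib.metadata import version, PackageNotFoundError
from packaging.version import parse

tf_ver = parse(version("transformers"))

if tf_ver >= parse("4.38.0"):
    import llama_self_extend_patch_4_38 as LlamaSE   # assumed name for a 4.38.2 patch
elif tf_ver >= parse("4.36.0"):
    import llama_self_extend_patch_4_36 as LlamaSE   # module named in this thread
else:
    import llama_self_extend_patch as LlamaSE        # pre-4.36 module from llama_example.py

try:
    print("flash_attn:", version("flash-attn"))
except PackageNotFoundError:
    print("flash_attn not installed; a Triton-based attention kernel could be used instead.")
```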
When I use 4.36.2, it doesn't work, but with 4.32.0 it works.
I only changed "import llama_self_extend_patch as LlamaSE" in "llama_example.py" to "import llama_self_extend_patch_4_36 as LlamaSE".
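A quick diagnostic sketch for this situation: print which attention class the loaded model actually instantiates, since the patch only takes effect when the class name matches. The checkpoint name is a placeholder; the attn_implementation keyword is, to my knowledge, available from transformers 4.36 onward.

```python
# Check which attention class the loaded Llama model uses.
import torch
from transformers import AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"  # placeholder checkpoint
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

print(type(model.model.layers[0].self_attn).__name__)
# 4.32.x typically prints "LlamaAttention"; 4.36.x defaults to
# "LlamaSdpaAttention", which is why the patch appears to do nothing.

# On 4.36+ you can force the eager attention path so the original class is used:
model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, attn_implementation="eager"
)
```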