
RuntimeError: shape '[1, 13, 8, 128]' is invalid for input of size 26624 #172

Open
zhuojun1024 opened this issue Aug 14, 2024 · 6 comments


@zhuojun1024

I encountered an error while trying to run Llama 3.1 405B according to the documentation. Can you help me identify the problem?

error message

(airllm) E:\Documents\VSCodeProjects\test-01>python main.py
>>>> bitsandbytes installed
>>>> cache_utils installed
found index file...
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.layers.32.': True, 'model.layers.33.': True, 'model.layers.34.': True, 'model.layers.35.': True, 'model.layers.36.': True, 'model.layers.37.': True, 'model.layers.38.': True, 'model.layers.39.': True, 'model.layers.40.': True, 'model.layers.41.': True, 'model.layers.42.': True, 'model.layers.43.': True, 'model.layers.44.': True, 'model.layers.45.': True, 'model.layers.46.': True, 'model.layers.47.': True, 'model.layers.48.': True, 'model.layers.49.': True, 'model.layers.50.': True, 'model.layers.51.': True, 'model.layers.52.': True, 'model.layers.53.': True, 'model.layers.54.': True, 'model.layers.55.': True, 'model.layers.56.': True, 'model.layers.57.': True, 'model.layers.58.': True, 'model.layers.59.': True, 'model.layers.60.': True, 'model.layers.61.': True, 'model.layers.62.': True, 'model.layers.63.': True, 'model.layers.64.': True, 'model.layers.65.': True, 'model.layers.66.': True, 'model.layers.67.': True, 'model.layers.68.': True, 'model.layers.69.': True, 'model.layers.70.': True, 'model.layers.71.': True, 'model.layers.72.': True, 'model.layers.73.': True, 'model.layers.74.': True, 'model.layers.75.': True, 'model.layers.76.': True, 'model.layers.77.': True, 'model.layers.78.': True, 'model.layers.79.': True, 'model.layers.80.': True, 'model.layers.81.': True, 'model.layers.82.': True, 'model.layers.83.': True, 'model.layers.84.': True, 'model.layers.85.': True, 'model.layers.86.': True, 'model.layers.87.': True, 'model.layers.88.': True, 'model.layers.89.': True, 'model.layers.90.': True, 'model.layers.91.': True, 'model.layers.92.': True, 'model.layers.93.': True, 'model.layers.94.': True, 'model.layers.95.': True, 'model.layers.96.': True, 'model.layers.97.': True, 'model.layers.98.': True, 'model.layers.99.': True, 'model.layers.100.': True, 'model.layers.101.': True, 'model.layers.102.': True, 'model.layers.103.': True, 'model.layers.104.': True, 'model.layers.105.': True, 'model.layers.106.': True, 'model.layers.107.': True, 'model.layers.108.': True, 'model.layers.109.': True, 'model.layers.110.': True, 'model.layers.111.': True, 'model.layers.112.': True, 'model.layers.113.': True, 'model.layers.114.': True, 'model.layers.115.': True, 'model.layers.116.': True, 'model.layers.117.': True, 'model.layers.118.': True, 'model.layers.119.': True, 'model.layers.120.': True, 'model.layers.121.': True, 'model.layers.122.': True, 'model.layers.123.': True, 'model.layers.124.': True, 'model.layers.125.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in C:\Models\airllm\llama3.1\405B-Instruct-bnb-4bit\splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
running layers(cuda:0):   1%|▍                                                         | 1/129 [00:04<10:39,  5.00s/it]
Traceback (most recent call last):
  File "E:\Documents\VSCodeProjects\test-01\main.py", line 19, in <module>
    generation_output = model.generate(
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\transformers\generation\utils.py", line 1989, in generate
    result = self._sample(
  File "D:\MiniConda\envs\airllm\lib\site-packages\transformers\generation\utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "D:\MiniConda\envs\airllm\lib\site-packages\airllm\airllm_base.py", line 364, in __call__
    return self.forward(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\airllm\airllm_base.py", line 564, in forward
    new_seq = layer(seq, **kwargs)[0]
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\transformers\models\llama\modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\transformers\models\llama\modeling_llama.py", line 565, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 13, 8, 128]' is invalid for input of size 26624

(airllm) E:\Documents\VSCodeProjects\test-01>

code

from airllm import AutoModel

MAX_LENGTH = 128
# could use hugging face model repo id:
model = AutoModel.from_pretrained(r"E:\unsloth\Meta-Llama-3.1-405B-Instruct-bnb-4bit",
    layer_shards_saving_path=r"C:\Models\airllm\llama3.1\405B-Instruct-bnb-4bit")

input_text = [
    'hello, can you provide a detailed self introduction in Chinese?',
  ]

input_tokens = model.tokenizer(input_text,
    return_tensors="pt", 
    return_attention_mask=False, 
    truncation=True, 
    max_length=MAX_LENGTH, 
    padding=True)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(), 
    max_new_tokens=10,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

env

Windows 11
Python 3.10 env in Miniconda
CUDA 12.1

pip list

Package            Version
------------------ -----------
accelerate         0.33.0
aiohappyeyeballs   2.3.5
aiohttp            3.10.3
aiosignal          1.3.1
airllm             2.9.1
async-timeout      4.0.3
attrs              24.2.0
bitsandbytes       0.43.3
Brotli             1.0.9
certifi            2024.7.4
charset-normalizer 3.3.2
colorama           0.4.6
coloredlogs        15.0.1
datasets           2.21.0
dill               0.3.8
filelock           3.13.1
frozenlist         1.4.1
fsspec             2024.6.1
gmpy2              2.1.2
huggingface-hub    0.24.5
humanfriendly      10.0
idna               3.7
intel-openmp       2021.4.0
Jinja2             3.1.4
MarkupSafe         2.1.3
mkl                2021.4.0
mkl-fft            1.3.1
mkl-random         1.2.2
mkl-service        2.4.0
mpmath             1.3.0
multidict          6.0.5
multiprocess       0.70.16
networkx           3.3
numpy              1.24.3
optimum            1.21.3
packaging          24.1
pandas             2.2.2
pillow             10.4.0
pip                24.2
protobuf           5.27.3
psutil             6.0.0
pyarrow            17.0.0
pyreadline3        3.4.1
PySocks            1.7.1
python-dateutil    2.9.0.post0
pytz               2024.1
PyYAML             6.0.1
regex              2024.7.24
requests           2.32.3
safetensors        0.4.4
scipy              1.14.0
sentencepiece      0.2.0
setuptools         72.1.0
six                1.16.0
sympy              1.12
tbb                2021.13.1
tokenizers         0.19.1
torch              2.3.1
torchaudio         2.3.1
torchvision        0.18.1
tqdm               4.66.5
transformers       4.43.3
typing_extensions  4.11.0
tzdata             2024.1
urllib3            2.2.2
wheel              0.43.0
win-inet-pton      1.1.0
xxhash             3.4.1
yarl               1.9.4
@sgjohnson1981

I'm getting something similar.
Windows 10
Conda python version: 3.12.4
pip list:

accelerate               0.33.0
aiohappyeyeballs         2.3.5
aiohttp                  3.10.3
aiosignal                1.3.1
airllm                   2.9.1
attrs                    24.2.0
bitsandbytes             0.43.3
certifi                  2024.7.4
charset-normalizer       3.3.2
coloredlogs              15.0.1
datasets                 2.21.0
dill                     0.3.8
filelock                 3.15.4
frozenlist               1.4.1
fsspec                   2024.6.1
huggingface-hub          0.24.5
humanfriendly            10.0
idna                     3.7
inquirerpy               0.3.4
Jinja2                   3.1.4
MarkupSafe               2.1.5
mpmath                   1.3.0
multidict                6.0.5
multiprocess             0.70.16
networkx                 3.3
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.6.20
nvidia-nvtx-cu12         12.1.105
optimum                  1.21.3
packaging                24.1
pandas                   2.2.2
pfzy                     0.3.4
pip                      24.2
prompt_toolkit           3.0.47
protobuf                 5.27.3
psutil                   6.0.0
pyarrow                  17.0.0
python-dateutil          2.9.0.post0
pytz                     2024.1
PyYAML                   6.0.2
regex                    2024.7.24
requests                 2.32.3
safetensors              0.4.4
scipy                    1.14.0
sentencepiece            0.2.0
setuptools               72.1.0
six                      1.16.0
sympy                    1.13.2
tokenizers               0.19.1
torch                    2.4.0
tqdm                     4.66.5
transformers             4.43.4
triton                   3.0.0
typing_extensions        4.12.2
tzdata                   2024.1
urllib3                  2.2.2
wcwidth                  0.2.13
wheel                    0.43.0
xxhash                   3.4.1
yarl                     1.9.4
>>>> bitsandbytes installed
>>>> cache_utils installed
Fetching 8 files: 100%|█████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 107892.06it/s]
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.layers.32.': True, 'model.layers.33.': True, 'model.layers.34.': True, 'model.layers.35.': True, 'model.layers.36.': True, 'model.layers.37.': True, 'model.layers.38.': True, 'model.layers.39.': True, 'model.layers.40.': True, 'model.layers.41.': True, 'model.layers.42.': True, 'model.layers.43.': True, 'model.layers.44.': True, 'model.layers.45.': True, 'model.layers.46.': True, 'model.layers.47.': True, 'model.layers.48.': True, 'model.layers.49.': True, 'model.layers.50.': True, 'model.layers.51.': True, 'model.layers.52.': True, 'model.layers.53.': True, 'model.layers.54.': True, 'model.layers.55.': True, 'model.layers.56.': True, 'model.layers.57.': True, 'model.layers.58.': True, 'model.layers.59.': True, 'model.layers.60.': True, 'model.layers.61.': True, 'model.layers.62.': True, 'model.layers.63.': True, 'model.layers.64.': True, 'model.layers.65.': True, 'model.layers.66.': True, 'model.layers.67.': True, 'model.layers.68.': True, 'model.layers.69.': True, 'model.layers.70.': True, 'model.layers.71.': True, 'model.layers.72.': True, 'model.layers.73.': True, 'model.layers.74.': True, 'model.layers.75.': True, 'model.layers.76.': True, 'model.layers.77.': True, 'model.layers.78.': True, 'model.layers.79.': True, 'model.layers.80.': True, 'model.layers.81.': True, 'model.layers.82.': True, 'model.layers.83.': True, 'model.layers.84.': True, 'model.layers.85.': True, 'model.layers.86.': True, 'model.layers.87.': True, 'model.layers.88.': True, 'model.layers.89.': True, 'model.layers.90.': True, 'model.layers.91.': True, 'model.layers.92.': True, 'model.layers.93.': True, 'model.layers.94.': True, 'model.layers.95.': True, 'model.layers.96.': True, 'model.layers.97.': True, 'model.layers.98.': True, 'model.layers.99.': True, 'model.layers.100.': True, 'model.layers.101.': True, 'model.layers.102.': True, 'model.layers.103.': True, 'model.layers.104.': True, 'model.layers.105.': True, 'model.layers.106.': True, 'model.layers.107.': True, 'model.layers.108.': True, 'model.layers.109.': True, 'model.layers.110.': True, 'model.layers.111.': True, 'model.layers.112.': True, 'model.layers.113.': True, 'model.layers.114.': True, 'model.layers.115.': True, 'model.layers.116.': True, 'model.layers.117.': True, 'model.layers.118.': True, 'model.layers.119.': True, 'model.layers.120.': True, 'model.layers.121.': True, 'model.layers.122.': True, 'model.layers.123.': True, 'model.layers.124.': True, 'model.layers.125.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in /media/user/***/AI/huggingface-cache/hub/models--unsloth--Meta-Llama-3.1-405B-Instruct-bnb-4bit/snapshots/75329c90a47d4b9f2a5455d6ab43612ddf72a77e/splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
running layers(cuda:0):   1%|▌                                                                          | 1/129 [00:43<1:33:27, 43.81s/it]
Traceback (most recent call last):
  File "/media/user/***/code/LLMs/airllm/demo.py", line 14, in <module>
    generation_output = model.generate(
                        ^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/airllm/airllm_base.py", line 364, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/airllm/airllm_base.py", line 564, in forward
    new_seq = layer(seq, **kwargs)[0]
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 565, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 9, 8, 128]' is invalid for input of size 18432

@JuergenMutschall

JuergenMutschall commented Aug 20, 2024

Same error here, exact same stack trace, on:
Windows 11,
WSL2, Ubuntu,
pip install with workaround installs of bitsandbytes

3.': True, 'model.layers.114.': True, 'model.layers.115.': True, 'model.layers.116.': True, 'model.layers.117.': True, 'model.layers.118.': True, 'model.layers.119.': True, 'model.layers.120.': True, 'model.layers.121.': True, 'model.layers.122.': True, 'model.layers.123.': True, 'model.layers.124.': True, 'model.layers.125.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in /home/juergen/.cache/huggingface/hub/models--unsloth--Meta-Llama-3.1-405B-Instruct-bnb-4bit/snapshots/75329c90a47d4b9f2a5455d6ab43612ddf72a77e/splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
running layers(cuda:0):   1%|▍                                                         | 1/129 [00:05<12:16,  5.76s/it]
Traceback (most recent call last):
  File "/home/juergen/airllm/test_llama405b.py", line 15, in <module>
    generation_output = model.generate(
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
  File "/home/juergen/.local/lib/python3.12/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/juergen/.local/lib/python3.12/site-packages/airllm/airllm_base.py", line 369, in __call__
    return self.forward(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/airllm/airllm_base.py", line 569, in forward
    new_seq = layer(seq, **kwargs)[0]
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 565, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 9, 8, 128]' is invalid for input of size 18432

@404835993

404835993 commented Aug 22, 2024

Same here. Has anybody found the reason?

@1272870698

Me too.


Have you solved it?

@404835993

Did anyone solve this?

@beleon

beleon commented Aug 31, 2024

EDIT: My test run finished without error and returned expected results.

So, I have a potential quick fix. My test hasn't finished yet, but it has already been running for quite a while. I didn't dig into exactly why this happens, but the config.num_key_value_heads used in .venv/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py is exactly half as large as it should be.
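As a sanity check, the numbers in the tracebacks above line up with that factor of two:

# Quick arithmetic check of the reported shape errors:
# the failing view expects bsz * q_len * num_key_value_heads * head_dim elements.
expected = 1 * 13 * 8 * 128   # 13312, for shape '[1, 13, 8, 128]'
actual = 26624                # tensor size reported in the first RuntimeError
print(actual // expected)     # 2 -> the tensor holds 16 KV heads' worth, not 8
# The other report matches too: 1 * 9 * 8 * 128 = 9216, and 18432 == 2 * 9216.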

So, in .venv/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py (for me it's on line 288), replace

        self.num_key_value_heads = config.num_key_value_heads

with

        self.num_key_value_heads = config.num_key_value_heads * 2

This seems to make it work for this specific AirLLM setup for me. However, it might break other use cases, since I suspect the error originates elsewhere and the wrong value is just being passed to this file.
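If you'd rather not edit site-packages, the same *2 workaround can be applied as a runtime monkeypatch. This is just an untested sketch under the same assumption (transformers 4.43.x, where LlamaAttention.__init__(config, layer_idx) reads config.num_key_value_heads); the _kv_heads_doubled guard attribute is my own invention to avoid doubling the shared config more than once. Run it before AutoModel.from_pretrained:

from transformers.models.llama import modeling_llama

_orig_init = modeling_llama.LlamaAttention.__init__

def _patched_init(self, config, layer_idx=None):
    # Double the KV head count once per config object; every attention layer
    # built afterwards sees the doubled value, mirroring the manual edit above.
    # (_kv_heads_doubled is a made-up guard flag, not a transformers attribute.)
    if not getattr(config, "_kv_heads_doubled", False):
        config.num_key_value_heads *= 2
        config._kv_heads_doubled = True
    _orig_init(self, config, layer_idx)

modeling_llama.LlamaAttention.__init__ = _patched_init

Same caveat as the manual edit: this only papers over the mismatch, so treat it as a workaround rather than a fix.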
