
RuntimeError: shape '[1, 13, 8, 128]' is invalid for input of size 26624 #172

Open
zhuojun1024 opened this issue Aug 14, 2024 · 6 comments


@zhuojun1024

I encountered an error while trying to run Llama 3.1 405B according to the documentation. Can you help me identify the problem?

error message

(airllm) E:\Documents\VSCodeProjects\test-01>python main.py
>>>> bitsandbytes installed
>>>> cache_utils installed
found index file...
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.layers.32.': True, 'model.layers.33.': True, 'model.layers.34.': True, 'model.layers.35.': True, 'model.layers.36.': True, 'model.layers.37.': True, 'model.layers.38.': True, 'model.layers.39.': True, 'model.layers.40.': True, 'model.layers.41.': True, 'model.layers.42.': True, 'model.layers.43.': True, 'model.layers.44.': True, 'model.layers.45.': True, 'model.layers.46.': True, 'model.layers.47.': True, 'model.layers.48.': True, 'model.layers.49.': True, 'model.layers.50.': True, 'model.layers.51.': True, 'model.layers.52.': True, 'model.layers.53.': True, 'model.layers.54.': True, 'model.layers.55.': True, 'model.layers.56.': True, 'model.layers.57.': True, 'model.layers.58.': True, 'model.layers.59.': True, 'model.layers.60.': True, 'model.layers.61.': True, 'model.layers.62.': True, 'model.layers.63.': True, 'model.layers.64.': True, 'model.layers.65.': True, 'model.layers.66.': True, 'model.layers.67.': True, 'model.layers.68.': True, 'model.layers.69.': True, 'model.layers.70.': True, 'model.layers.71.': True, 'model.layers.72.': True, 'model.layers.73.': True, 'model.layers.74.': True, 'model.layers.75.': True, 'model.layers.76.': True, 'model.layers.77.': True, 'model.layers.78.': True, 'model.layers.79.': True, 'model.layers.80.': True, 'model.layers.81.': True, 'model.layers.82.': True, 'model.layers.83.': True, 'model.layers.84.': True, 'model.layers.85.': True, 'model.layers.86.': True, 'model.layers.87.': True, 'model.layers.88.': True, 'model.layers.89.': True, 'model.layers.90.': True, 'model.layers.91.': True, 'model.layers.92.': True, 'model.layers.93.': True, 'model.layers.94.': True, 'model.layers.95.': True, 'model.layers.96.': True, 'model.layers.97.': True, 'model.layers.98.': True, 'model.layers.99.': True, 'model.layers.100.': True, 'model.layers.101.': True, 'model.layers.102.': True, 'model.layers.103.': True, 'model.layers.104.': True, 'model.layers.105.': True, 'model.layers.106.': True, 'model.layers.107.': True, 'model.layers.108.': True, 'model.layers.109.': True, 'model.layers.110.': True, 'model.layers.111.': True, 'model.layers.112.': True, 'model.layers.113.': True, 'model.layers.114.': True, 'model.layers.115.': True, 'model.layers.116.': True, 'model.layers.117.': True, 'model.layers.118.': True, 'model.layers.119.': True, 'model.layers.120.': True, 'model.layers.121.': True, 'model.layers.122.': True, 'model.layers.123.': True, 'model.layers.124.': True, 'model.layers.125.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in C:\Models\airllm\llama3.1\405B-Instruct-bnb-4bit\splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
running layers(cuda:0):   1%|▍                                                         | 1/129 [00:04<10:39,  5.00s/it]
Traceback (most recent call last):
  File "E:\Documents\VSCodeProjects\test-01\main.py", line 19, in <module>
    generation_output = model.generate(
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\utils\_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\transformers\generation\utils.py", line 1989, in generate
    result = self._sample(
  File "D:\MiniConda\envs\airllm\lib\site-packages\transformers\generation\utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "D:\MiniConda\envs\airllm\lib\site-packages\airllm\airllm_base.py", line 364, in __call__
    return self.forward(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\airllm\airllm_base.py", line 564, in forward
    new_seq = layer(seq, **kwargs)[0]
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\transformers\models\llama\modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\nn\modules\module.py", line 1532, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\torch\nn\modules\module.py", line 1541, in _call_impl
    return forward_call(*args, **kwargs)
  File "D:\MiniConda\envs\airllm\lib\site-packages\transformers\models\llama\modeling_llama.py", line 565, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 13, 8, 128]' is invalid for input of size 26624

(airllm) E:\Documents\VSCodeProjects\test-01>

code

from airllm import AutoModel

MAX_LENGTH = 128
# could use hugging face model repo id:
model = AutoModel.from_pretrained(r"E:\unsloth\Meta-Llama-3.1-405B-Instruct-bnb-4bit",
    layer_shards_saving_path=r"C:\Models\airllm\llama3.1\405B-Instruct-bnb-4bit")

input_text = [
    'hello, can you provide a detailed self introduction in Chinese?',
  ]

input_tokens = model.tokenizer(input_text,
    return_tensors="pt", 
    return_attention_mask=False, 
    truncation=True, 
    max_length=MAX_LENGTH, 
    padding=True)

generation_output = model.generate(
    input_tokens['input_ids'].cuda(), 
    max_new_tokens=10,
    use_cache=True,
    return_dict_in_generate=True)

output = model.tokenizer.decode(generation_output.sequences[0])

print(output)

env

Windows 11
Python 3.10 env in Miniconda
CUDA 12.1

pip list

Package            Version
------------------ -----------
accelerate         0.33.0
aiohappyeyeballs   2.3.5
aiohttp            3.10.3
aiosignal          1.3.1
airllm             2.9.1
async-timeout      4.0.3
attrs              24.2.0
bitsandbytes       0.43.3
Brotli             1.0.9
certifi            2024.7.4
charset-normalizer 3.3.2
colorama           0.4.6
coloredlogs        15.0.1
datasets           2.21.0
dill               0.3.8
filelock           3.13.1
frozenlist         1.4.1
fsspec             2024.6.1
gmpy2              2.1.2
huggingface-hub    0.24.5
humanfriendly      10.0
idna               3.7
intel-openmp       2021.4.0
Jinja2             3.1.4
MarkupSafe         2.1.3
mkl                2021.4.0
mkl-fft            1.3.1
mkl-random         1.2.2
mkl-service        2.4.0
mpmath             1.3.0
multidict          6.0.5
multiprocess       0.70.16
networkx           3.3
numpy              1.24.3
optimum            1.21.3
packaging          24.1
pandas             2.2.2
pillow             10.4.0
pip                24.2
protobuf           5.27.3
psutil             6.0.0
pyarrow            17.0.0
pyreadline3        3.4.1
PySocks            1.7.1
python-dateutil    2.9.0.post0
pytz               2024.1
PyYAML             6.0.1
regex              2024.7.24
requests           2.32.3
safetensors        0.4.4
scipy              1.14.0
sentencepiece      0.2.0
setuptools         72.1.0
six                1.16.0
sympy              1.12
tbb                2021.13.1
tokenizers         0.19.1
torch              2.3.1
torchaudio         2.3.1
torchvision        0.18.1
tqdm               4.66.5
transformers       4.43.3
typing_extensions  4.11.0
tzdata             2024.1
urllib3            2.2.2
wheel              0.43.0
win-inet-pton      1.1.0
xxhash             3.4.1
yarl               1.9.4
@sgjohnson1981

I'm getting something similar.
Windows 10
Conda python version: 3.12.4
pip list:

accelerate               0.33.0
aiohappyeyeballs         2.3.5
aiohttp                  3.10.3
aiosignal                1.3.1
airllm                   2.9.1
attrs                    24.2.0
bitsandbytes             0.43.3
certifi                  2024.7.4
charset-normalizer       3.3.2
coloredlogs              15.0.1
datasets                 2.21.0
dill                     0.3.8
filelock                 3.15.4
frozenlist               1.4.1
fsspec                   2024.6.1
huggingface-hub          0.24.5
humanfriendly            10.0
idna                     3.7
inquirerpy               0.3.4
Jinja2                   3.1.4
MarkupSafe               2.1.5
mpmath                   1.3.0
multidict                6.0.5
multiprocess             0.70.16
networkx                 3.3
numpy                    1.26.4
nvidia-cublas-cu12       12.1.3.1
nvidia-cuda-cupti-cu12   12.1.105
nvidia-cuda-nvrtc-cu12   12.1.105
nvidia-cuda-runtime-cu12 12.1.105
nvidia-cudnn-cu12        9.1.0.70
nvidia-cufft-cu12        11.0.2.54
nvidia-curand-cu12       10.3.2.106
nvidia-cusolver-cu12     11.4.5.107
nvidia-cusparse-cu12     12.1.0.106
nvidia-nccl-cu12         2.20.5
nvidia-nvjitlink-cu12    12.6.20
nvidia-nvtx-cu12         12.1.105
optimum                  1.21.3
packaging                24.1
pandas                   2.2.2
pfzy                     0.3.4
pip                      24.2
prompt_toolkit           3.0.47
protobuf                 5.27.3
psutil                   6.0.0
pyarrow                  17.0.0
python-dateutil          2.9.0.post0
pytz                     2024.1
PyYAML                   6.0.2
regex                    2024.7.24
requests                 2.32.3
safetensors              0.4.4
scipy                    1.14.0
sentencepiece            0.2.0
setuptools               72.1.0
six                      1.16.0
sympy                    1.13.2
tokenizers               0.19.1
torch                    2.4.0
tqdm                     4.66.5
transformers             4.43.4
triton                   3.0.0
typing_extensions        4.12.2
tzdata                   2024.1
urllib3                  2.2.2
wcwidth                  0.2.13
wheel                    0.43.0
xxhash                   3.4.1
yarl                     1.9.4
>>>> bitsandbytes installed
>>>> cache_utils installed
Fetching 8 files: 100%|█████████████████████████████████████████████████████████████████████████████████| 8/8 [00:00<00:00, 107892.06it/s]
found_layers:{'model.embed_tokens.': True, 'model.layers.0.': True, 'model.layers.1.': True, 'model.layers.2.': True, 'model.layers.3.': True, 'model.layers.4.': True, 'model.layers.5.': True, 'model.layers.6.': True, 'model.layers.7.': True, 'model.layers.8.': True, 'model.layers.9.': True, 'model.layers.10.': True, 'model.layers.11.': True, 'model.layers.12.': True, 'model.layers.13.': True, 'model.layers.14.': True, 'model.layers.15.': True, 'model.layers.16.': True, 'model.layers.17.': True, 'model.layers.18.': True, 'model.layers.19.': True, 'model.layers.20.': True, 'model.layers.21.': True, 'model.layers.22.': True, 'model.layers.23.': True, 'model.layers.24.': True, 'model.layers.25.': True, 'model.layers.26.': True, 'model.layers.27.': True, 'model.layers.28.': True, 'model.layers.29.': True, 'model.layers.30.': True, 'model.layers.31.': True, 'model.layers.32.': True, 'model.layers.33.': True, 'model.layers.34.': True, 'model.layers.35.': True, 'model.layers.36.': True, 'model.layers.37.': True, 'model.layers.38.': True, 'model.layers.39.': True, 'model.layers.40.': True, 'model.layers.41.': True, 'model.layers.42.': True, 'model.layers.43.': True, 'model.layers.44.': True, 'model.layers.45.': True, 'model.layers.46.': True, 'model.layers.47.': True, 'model.layers.48.': True, 'model.layers.49.': True, 'model.layers.50.': True, 'model.layers.51.': True, 'model.layers.52.': True, 'model.layers.53.': True, 'model.layers.54.': True, 'model.layers.55.': True, 'model.layers.56.': True, 'model.layers.57.': True, 'model.layers.58.': True, 'model.layers.59.': True, 'model.layers.60.': True, 'model.layers.61.': True, 'model.layers.62.': True, 'model.layers.63.': True, 'model.layers.64.': True, 'model.layers.65.': True, 'model.layers.66.': True, 'model.layers.67.': True, 'model.layers.68.': True, 'model.layers.69.': True, 'model.layers.70.': True, 'model.layers.71.': True, 'model.layers.72.': True, 'model.layers.73.': True, 'model.layers.74.': True, 'model.layers.75.': True, 'model.layers.76.': True, 'model.layers.77.': True, 'model.layers.78.': True, 'model.layers.79.': True, 'model.layers.80.': True, 'model.layers.81.': True, 'model.layers.82.': True, 'model.layers.83.': True, 'model.layers.84.': True, 'model.layers.85.': True, 'model.layers.86.': True, 'model.layers.87.': True, 'model.layers.88.': True, 'model.layers.89.': True, 'model.layers.90.': True, 'model.layers.91.': True, 'model.layers.92.': True, 'model.layers.93.': True, 'model.layers.94.': True, 'model.layers.95.': True, 'model.layers.96.': True, 'model.layers.97.': True, 'model.layers.98.': True, 'model.layers.99.': True, 'model.layers.100.': True, 'model.layers.101.': True, 'model.layers.102.': True, 'model.layers.103.': True, 'model.layers.104.': True, 'model.layers.105.': True, 'model.layers.106.': True, 'model.layers.107.': True, 'model.layers.108.': True, 'model.layers.109.': True, 'model.layers.110.': True, 'model.layers.111.': True, 'model.layers.112.': True, 'model.layers.113.': True, 'model.layers.114.': True, 'model.layers.115.': True, 'model.layers.116.': True, 'model.layers.117.': True, 'model.layers.118.': True, 'model.layers.119.': True, 'model.layers.120.': True, 'model.layers.121.': True, 'model.layers.122.': True, 'model.layers.123.': True, 'model.layers.124.': True, 'model.layers.125.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in /media/user/***/AI/huggingface-cache/hub/models--unsloth--Meta-Llama-3.1-405B-Instruct-bnb-4bit/snapshots/75329c90a47d4b9f2a5455d6ab43612ddf72a77e/splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
running layers(cuda:0):   1%|▌                                                                          | 1/129 [00:43<1:33:27, 43.81s/it]
Traceback (most recent call last):
  File "/media/user/***/code/LLMs/airllm/demo.py", line 14, in <module>
    generation_output = model.generate(
                        ^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
             ^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/airllm/airllm_base.py", line 364, in __call__
    return self.forward(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/airllm/airllm_base.py", line 564, in forward
    new_seq = layer(seq, **kwargs)[0]
              ^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
                                                          ^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/user/miniconda3/envs/airllm/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 565, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
RuntimeError: shape '[1, 9, 8, 128]' is invalid for input of size 18432

@JuergenMutschall

JuergenMutschall commented Aug 20, 2024

Same error here, exact same stack trace, on:
Windows 11,
WSL2, Ubuntu,
pip install with workaround installs of bitsandbytes

3.': True, 'model.layers.114.': True, 'model.layers.115.': True, 'model.layers.116.': True, 'model.layers.117.': True, 'model.layers.118.': True, 'model.layers.119.': True, 'model.layers.120.': True, 'model.layers.121.': True, 'model.layers.122.': True, 'model.layers.123.': True, 'model.layers.124.': True, 'model.layers.125.': True, 'model.norm.': True, 'lm_head.': True}
saved layers already found in /home/juergen/.cache/huggingface/hub/models--unsloth--Meta-Llama-3.1-405B-Instruct-bnb-4bit/snapshots/75329c90a47d4b9f2a5455d6ab43612ddf72a77e/splitted_model
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
new version of transfomer, no need to use BetterTransformer, try setting attn impl to sdpa...
attn imp: <class 'transformers.models.llama.modeling_llama.LlamaSdpaAttention'>
Unused kwargs: ['_load_in_4bit', '_load_in_8bit', 'quant_method']. These kwargs are not used in <class 'transformers.utils.quantization_config.BitsAndBytesConfig'>.
running layers(cuda:0):   1%|▍                                                         | 1/129 [00:05<12:16,  5.76s/it]
Traceback (most recent call last):
  File "/home/juergen/airllm/test_llama405b.py", line 15, in <module>
    generation_output = model.generate(
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/utils/_contextlib.py", line 116, in decorate_context
    return func(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/transformers/generation/utils.py", line 1989, in generate
    result = self._sample(
  File "/home/juergen/.local/lib/python3.12/site-packages/transformers/generation/utils.py", line 2932, in _sample
    outputs = self(**model_inputs, return_dict=True)
  File "/home/juergen/.local/lib/python3.12/site-packages/airllm/airllm_base.py", line 369, in __call__
    return self.forward(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/airllm/airllm_base.py", line 569, in forward
    new_seq = layer(seq, **kwargs)[0]
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 677, in forward
    hidden_states, self_attn_weights, present_key_value = self.self_attn(
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1553, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/torch/nn/modules/module.py", line 1562, in _call_impl
    return forward_call(*args, **kwargs)
  File "/home/juergen/.local/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py", line 565, in forward
    key_states = key_states.view(bsz, q_len, self.num_key_value_heads, self.head_dim).transpose(1, 2)
RuntimeError: shape '[1, 9, 8, 128]' is invalid for input of size 18432

@404835993

404835993 commented Aug 22, 2024

Same here. Has anybody found the reason?

@1272870698

Me too.


Have you solved it?

@404835993

Did anyone solve this?

@beleon

beleon commented Aug 31, 2024

EDIT: My test run finished without error and returned expected results.

So, I have a potential quick fix. My test hasn't finished yet, but it has already been running for quite a while. I didn't dig into exactly why this happens, but the config.num_key_value_heads used in .venv/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py is exactly half as large as it should be.
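As a sanity check, the numbers in the tracebacks above line up with that factor of two:

# Quick arithmetic check of the reported shape errors:
# the failing view expects bsz * q_len * num_key_value_heads * head_dim elements.
expected = 1 * 13 * 8 * 128   # 13312, for shape '[1, 13, 8, 128]'
actual = 26624                # tensor size reported in the first RuntimeError
print(actual // expected)     # 2 -> the tensor holds 16 KV heads' worth, not 8
# The other report matches too: 1 * 9 * 8 * 128 = 9216, and 18432 == 2 * 9216.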

So, in .venv/lib/python3.12/site-packages/transformers/models/llama/modeling_llama.py (for me it's on line 288), replace

        self.num_key_value_heads = config.num_key_value_heads

with

        self.num_key_value_heads = config.num_key_value_heads * 2

This seems to make it work for this specific AirLLM setup for me. However, it might break other use cases, since I suspect the error originates elsewhere and the wrong value is just being passed to this file.
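If you'd rather not edit site-packages, the same *2 workaround can be applied as a runtime monkeypatch. This is just an untested sketch under the same assumption (transformers 4.43.x, where LlamaAttention.__init__(config, layer_idx) reads config.num_key_value_heads); the _kv_heads_doubled guard attribute is my own invention to avoid doubling the shared config more than once. Run it before AutoModel.from_pretrained:

from transformers.models.llama import modeling_llama

_orig_init = modeling_llama.LlamaAttention.__init__

def _patched_init(self, config, layer_idx=None):
    # Double the KV head count once per config object; every attention layer
    # built afterwards sees the doubled value, mirroring the manual edit above.
    # (_kv_heads_doubled is a made-up guard flag, not a transformers attribute.)
    if not getattr(config, "_kv_heads_doubled", False):
        config.num_key_value_heads *= 2
        config._kv_heads_doubled = True
    _orig_init(self, config, layer_idx)

modeling_llama.LlamaAttention.__init__ = _patched_init

Same caveat as the manual edit: this only papers over the mismatch, so treat it as a workaround rather than a fix.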
