
Train Error #23

Open
bruicecode opened this issue Mar 5, 2024 · 3 comments

bruicecode commented Mar 5, 2024

(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py
2024-03-05 23:56:10,524 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-05 23:56:17.908409: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Using StableAdamWUnfused-v1
training: 0%| | 0/100000 [00:01<?, ?it/s]

Traceback (most recent call last):
  File "/Users/personalinfo/LongNet/train.py", line 84, in <module>
    loss = model(next(train_loader))
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 356, in forward
    logits = self.net(x_inp, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 302, in forward
    x = self.transformer(x)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 271, in forward
    x = block(x) + x
RuntimeError: The size of tensor a (4128) must match the size of tensor b (8196) at non-singleton dimension 1
(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py
2024-03-06 00:09:22,364 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-06 00:09:27.673362: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Using StableAdamWUnfused-v1
training:   0%|                                                                                                                                | 0/100000 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/Users/personalinfo/LongNet/train.py", line 84, in <module>
    loss = model(next(train_loader))
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 356, in forward
    logits = self.net(x_inp, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 302, in forward
    x = self.transformer(x)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 271, in forward
    x = block(x) + x
RuntimeError: The size of tensor a (4128) must match the size of tensor b (8196) at non-singleton dimension 1

After setting up the environment, I ran 'python3 train.py' and the error above occurred. Could you take a look? Thank you!
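
For context, model.py line 271 is a residual connection (`x = block(x) + x`), so it raises exactly this kind of size mismatch whenever the block returns a tensor whose sequence length differs from its input. A minimal sketch of that failure mode, assuming the block subsamples the sequence dimension (the `ShrinkingBlock` below is purely illustrative, not LongNet code):

```python
import torch
import torch.nn as nn


class ShrinkingBlock(nn.Module):
    """Hypothetical block that drops every second token along the sequence dimension."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x[:, ::2, :]


x = torch.randn(1, 8, 4)
block = ShrinkingBlock()

# Fails with: RuntimeError: The size of tensor a (4) must match the size of
# tensor b (8) at non-singleton dimension 1
out = block(x) + x
```

The actual error (4128 vs 8196 at dimension 1) likewise suggests the block is returning a shorter sequence than it received, which matches the wrong-output-shape test failures reported below.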

bruicecode (Author)

I ran this script in PyCharm's venv environment.

bruicecode (Author) commented Mar 7, 2024

@kyegomez Can you help me with this? I think the attention module has a problem: when I run 'pytest tests/attention.py', I get errors like these:

(myenv2) mg@ubuntu:~/LongNet$ pytest tests/attention.py
======================================================= test session starts ========================================================
platform linux -- Python 3.9.18, pytest-7.4.2, pluggy-1.4.0
rootdir: /home/mg/LongNet
plugins: anyio-4.3.0, time-machine-2.14.0, hydra-core-1.0.7
collected 12 items                                                                                                                 

tests/attention.py .FFFF.FF...F                                                                                              [100%]

============================================================= FAILURES =============================================================
_________________________________________ TestDilatedAttention.test_attention_distribution _________________________________________

self = <attention.TestDilatedAttention testMethod=test_attention_distribution>

    def test_attention_distribution(self):
        input_tensor = torch.randn(2, 128, 512)
        dilated_attention = DilatedAttention(512, 8, 2, 64)
        _, attn_weights = dilated_attention(input_tensor)
    
>       self.assertTrue(
            torch.allclose(attn_weights.sum(dim=-1), torch.tensor(1.0))
        )
E       AssertionError: False is not true

tests/attention.py:114: AssertionError
___________________________________________ TestDilatedAttention.test_attention_outputs ____________________________________________

self = <attention.TestDilatedAttention testMethod=test_attention_outputs>

    def test_attention_outputs(self):
>       output = self.sparse_dilated_attention(self.x)
E       AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'

tests/attention.py:151: AttributeError
________________________________________________ TestDilatedAttention.test_dropout _________________________________________________

self = <attention.TestDilatedAttention testMethod=test_dropout>

    def test_dropout(self):
>       self.sparse_dilated_attention.dropout.p = 1.0
E       AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'

tests/attention.py:156: AttributeError
______________________________________________ TestDilatedAttention.test_forward_pass ______________________________________________

self = <attention.TestDilatedAttention testMethod=test_forward_pass>

    def test_forward_pass(self):
>       output = self.sparse_dilated_attention(self.x)
E       AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'

tests/attention.py:145: AttributeError
______________________________________________ TestDilatedAttention.test_output_shape ______________________________________________

self = <attention.TestDilatedAttention testMethod=test_output_shape>

    def test_output_shape(self):
        # Setup
        input_tensor = torch.randn(2, 128, 512)
        dilated_attention = DilatedAttention(512, 8, 2, 64)
    
        # Action
        output = dilated_attention(input_tensor)
    
        # Assert
>       self.assertEqual(output.shape, (2, 128, 512))
E       AssertionError: torch.Size([2, 64, 512]) != (2, 128, 512)

tests/attention.py:18: AssertionError
_________________________________________ TestDilatedAttention.test_relative_position_bias _________________________________________

self = <attention.TestDilatedAttention testMethod=test_relative_position_bias>

    def test_relative_position_bias(self):
        # Setup
        input_tensor = torch.randn(2, 128, 512)
        dilated_attention = DilatedAttention(
            512, 8, 2, 64, use_rel_pos_bias=True
        )
    
        # Action
>       output = dilated_attention(input_tensor)

tests/attention.py:39: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DilatedAttention(
  (dropout): Dropout(p=0.0, inplace=False)
  (attention): FlashAttention(
    (attn_dropout): Dropou...Linear(in_features=512, out_features=512, bias=True)
  (proj_v): Linear(in_features=512, out_features=512, bias=True)
)
x = tensor([[[[ 1.4205e+00, -4.5398e-01,  9.8770e-01,  ..., -2.6991e-02,
           -1.1310e+00,  1.4456e-03],
          [...231e-02],
          [ 3.7678e-01, -1.1879e-01,  2.9864e-01,  ..., -5.8582e-02,
           -3.6311e-01,  7.1331e-01]]]])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass of the DilatedAttention module.
    
        Args:
            x (torch.Tensor): The input tensor.
    
        Returns:
            torch.Tensor: The output tensor.
        """
        batch_size, seq_len, _ = x.shape
        padding_len = -seq_len % self.segment_size
        x = F.pad(x, (0, 0, 0, padding_len))
        seq_len = seq_len + padding_len
    
        if self.use_xpos:
            x = self.xpos(x)
    
        # Split and sparsify
        x = x.view(batch_size, -1, self.segment_size, self.dim)
        x = x[:, :, :: self.dilation_rate, :]
    
        # qk_norm
        if self.qk_norm:
            q, k, v = map(
                self.norm, (self.proj_q(x), self.proj_k(x), self.proj_v(x))
            )
        else:
            q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)
    
        # Perform attention
        attn_output = self.attention(q, k, v)
    
        # if use rel pos => apply relative positioning bias
        if self.use_rel_pos_bias:
>           attn_output += self.relative_bias(
                batch_size, attn_output.size(1), attn_output.size(1)
            )
E           RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 3

../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/attention.py:123: RuntimeError
__________________________________________________ TestDilatedAttention.test_xpos __________________________________________________

self = <attention.TestDilatedAttention testMethod=test_xpos>

    def test_xpos(self):
        # Setup
        input_tensor = torch.randn(2, 128, 512)
        dilated_attention = DilatedAttention(512, 8, 2, 64, use_xpos=True)
    
        # Action
>       output = dilated_attention(input_tensor)

tests/attention.py:26: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/attention.py:104: in forward
    x = self.xpos(x)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/utils.py:254: in forward
    x = apply_rotary_pos_emb(x, sin, cos, scale)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = tensor([[[-0.1896,  0.1342, -0.3958,  ..., -0.4303, -0.2086,  0.4187],
         [-0.2131,  0.3323,  0.4395,  ..., -1.5...59,  0.2221,  ...,  1.7753, -1.4079,  1.2502],
         [ 1.5155, -1.4299,  0.4873,  ..., -1.0910, -0.7816, -0.7960]]])
sin = tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 9.817...,  1.6756e-02],
        [ 8.3368e-01,  8.3368e-01,  7.2268e-01,  ...,  2.2456e-02,
          1.6888e-02,  1.6888e-02]])
cos = tensor([[ 1.1695,  1.1695,  1.1586,  ...,  1.0057,  1.0028,  1.0028],
        [ 0.6304,  0.6304,  0.8459,  ...,  1.005...8111,  0.8425,  ...,  0.9942,  0.9971,  0.9971],
        [ 0.1992,  0.1992,  0.4756,  ...,  0.9941,  0.9971,  0.9971]])
scale = tensor([[1.1695, 1.1586, 1.1485,  ..., 1.0087, 1.0057, 1.0028],
        [1.1667, 1.1559, 1.1460,  ..., 1.0086, 1.0056,...0.8592, 0.8671, 0.8745,  ..., 0.9916, 0.9945, 0.9973],
        [0.8571, 0.8651, 0.8726,  ..., 0.9915, 0.9944, 0.9972]])

    def apply_rotary_pos_emb(x, sin, cos, scale=1):
        sin, cos = map(lambda t: duplicate_interleave(t * scale), (sin, cos))
        # einsum notation for lambda t: repeat(t[offset:x.shape[1]+offset,:], "n d -> () n () (d j)", j=2)
>       return (x * cos) + (rotate_every_two(x) * sin)
E       RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2

../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/utils.py:221: RuntimeError
========================================================= warnings summary =========================================================
../anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:441
  /home/mg/anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
    _torch_pytree._register_pytree_node(

../anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:309
  /home/mg/anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
    _torch_pytree._register_pytree_node(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================== short test summary info ======================================================
FAILED tests/attention.py::TestDilatedAttention::test_attention_distribution - AssertionError: False is not true
FAILED tests/attention.py::TestDilatedAttention::test_attention_outputs - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_dropout - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_forward_pass - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_output_shape - AssertionError: torch.Size([2, 64, 512]) != (2, 128, 512)
FAILED tests/attention.py::TestDilatedAttention::test_relative_position_bias - RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 3
FAILED tests/attention.py::TestDilatedAttention::test_xpos - RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2
============================================= 7 failed, 5 passed, 2 warnings in 6.50s ==============================================

Inzilbeth

DilatedAttention is not working properly (the output shape is wrong). I'm seeing the same issue.
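
The wrong shape is consistent with the `forward()` shown in the pytest traceback above: the input is padded, reshaped into segments, and then every `dilation_rate`-th token is kept, but the kept tokens are never scattered back to the original sequence length. A minimal sketch of that bookkeeping, assuming `DilatedAttention(512, 8, 2, 64)` corresponds to dim=512, heads=8, dilation_rate=2, segment_size=64 (the argument order is an assumption):

```python
import torch

# Illustrative only: reproduces the tensor bookkeeping from the forward() in the
# traceback, not the full DilatedAttention module.
batch_size, seq_len, dim = 2, 128, 512
segment_size, dilation_rate = 64, 2  # assumed mapping of DilatedAttention(512, 8, 2, 64)

x = torch.randn(batch_size, seq_len, dim)
x = x.view(batch_size, -1, segment_size, dim)  # (2, 2, 64, 512): split into segments
x = x[:, :, ::dilation_rate, :]                # (2, 2, 32, 512): keep every 2nd token

print(x.reshape(batch_size, -1, dim).shape)    # torch.Size([2, 64, 512]), not (2, 128, 512)
```

If that reading is right, the residual `x = block(x) + x` in model.py fails for the same reason: the block's output is shorter than its input.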
