
Train Error #23

Open
bruicecode opened this issue Mar 5, 2024 · 3 comments

bruicecode commented Mar 5, 2024

(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py
2024-03-05 23:56:10,524 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-05 23:56:17.908409: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
_torch_pytree._register_pytree_node(
Using StableAdamWUnfused-v1
training: 0%| | 0/100000 [00:01<?, ?it/s]

Traceback (most recent call last):
  File "/Users/personalinfo/LongNet/train.py", line 84, in <module>
    loss = model(next(train_loader))
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 356, in forward
    logits = self.net(x_inp, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 302, in forward
    x = self.transformer(x)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 271, in forward
    x = block(x) + x
RuntimeError: The size of tensor a (4128) must match the size of tensor b (8196) at non-singleton dimension 1
(venv) personalinfo@MacBook-Pro-3 LongNet % python3 train.py
2024-03-06 00:09:22,364 - numexpr.utils - INFO - NumExpr defaulting to 8 threads.
2024-03-06 00:09:27.673362: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
  _torch_pytree._register_pytree_node(
Using StableAdamWUnfused-v1
training:   0%|                                                                                                                                | 0/100000 [00:01<?, ?it/s]
Traceback (most recent call last):
  File "/Users/personalinfo/LongNet/train.py", line 84, in <module>
    loss = model(next(train_loader))
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 356, in forward
    logits = self.net(x_inp, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 302, in forward
    x = self.transformer(x)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1511, in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
  File "/Users/personalinfo/LongNet/venv/lib/python3.9/site-packages/torch/nn/modules/module.py", line 1520, in _call_impl
    return forward_call(*args, **kwargs)
  File "/Users/personalinfo/LongNet/long_net/model.py", line 271, in forward
    x = block(x) + x
RuntimeError: The size of tensor a (4128) must match the size of tensor b (8196) at non-singleton dimension 1

After setting up the environment, I ran 'python3 train.py' and the error above occurred. Could you take a look? Thank you!
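
For context, model.py line 271 is a residual connection (`x = block(x) + x`), so it raises exactly this kind of size mismatch whenever the block returns a tensor whose sequence length differs from its input. A minimal sketch of that failure mode, assuming the block subsamples the sequence dimension (the `ShrinkingBlock` below is purely illustrative, not LongNet code):

```python
import torch
import torch.nn as nn


class ShrinkingBlock(nn.Module):
    """Hypothetical block that drops every second token along the sequence dimension."""

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x[:, ::2, :]


x = torch.randn(1, 8, 4)
block = ShrinkingBlock()

# Fails with: RuntimeError: The size of tensor a (4) must match the size of
# tensor b (8) at non-singleton dimension 1
out = block(x) + x
```

The actual error (4128 vs 8196 at dimension 1) likewise suggests the block is returning a shorter sequence than it received, which matches the wrong-output-shape test failures reported below.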

bruicecode (Author)

I ran this script in PyCharm's venv environment.

bruicecode (Author) commented Mar 7, 2024

@kyegomez Can you help me with this? I think the attention module has a problem: when I run 'pytest tests/attention.py', I get errors like these:

(myenv2) mg@ubuntu:~/LongNet$ pytest tests/attention.py
======================================================= test session starts ========================================================
platform linux -- Python 3.9.18, pytest-7.4.2, pluggy-1.4.0
rootdir: /home/mg/LongNet
plugins: anyio-4.3.0, time-machine-2.14.0, hydra-core-1.0.7
collected 12 items                                                                                                                 

tests/attention.py .FFFF.FF...F                                                                                              [100%]

============================================================= FAILURES =============================================================
_________________________________________ TestDilatedAttention.test_attention_distribution _________________________________________

self = <attention.TestDilatedAttention testMethod=test_attention_distribution>

    def test_attention_distribution(self):
        input_tensor = torch.randn(2, 128, 512)
        dilated_attention = DilatedAttention(512, 8, 2, 64)
        _, attn_weights = dilated_attention(input_tensor)
    
>       self.assertTrue(
            torch.allclose(attn_weights.sum(dim=-1), torch.tensor(1.0))
        )
E       AssertionError: False is not true

tests/attention.py:114: AssertionError
___________________________________________ TestDilatedAttention.test_attention_outputs ____________________________________________

self = <attention.TestDilatedAttention testMethod=test_attention_outputs>

    def test_attention_outputs(self):
>       output = self.sparse_dilated_attention(self.x)
E       AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'

tests/attention.py:151: AttributeError
________________________________________________ TestDilatedAttention.test_dropout _________________________________________________

self = <attention.TestDilatedAttention testMethod=test_dropout>

    def test_dropout(self):
>       self.sparse_dilated_attention.dropout.p = 1.0
E       AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'

tests/attention.py:156: AttributeError
______________________________________________ TestDilatedAttention.test_forward_pass ______________________________________________

self = <attention.TestDilatedAttention testMethod=test_forward_pass>

    def test_forward_pass(self):
>       output = self.sparse_dilated_attention(self.x)
E       AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'

tests/attention.py:145: AttributeError
______________________________________________ TestDilatedAttention.test_output_shape ______________________________________________

self = <attention.TestDilatedAttention testMethod=test_output_shape>

    def test_output_shape(self):
        # Setup
        input_tensor = torch.randn(2, 128, 512)
        dilated_attention = DilatedAttention(512, 8, 2, 64)
    
        # Action
        output = dilated_attention(input_tensor)
    
        # Assert
>       self.assertEqual(output.shape, (2, 128, 512))
E       AssertionError: torch.Size([2, 64, 512]) != (2, 128, 512)

tests/attention.py:18: AssertionError
_________________________________________ TestDilatedAttention.test_relative_position_bias _________________________________________

self = <attention.TestDilatedAttention testMethod=test_relative_position_bias>

    def test_relative_position_bias(self):
        # Setup
        input_tensor = torch.randn(2, 128, 512)
        dilated_attention = DilatedAttention(
            512, 8, 2, 64, use_rel_pos_bias=True
        )
    
        # Action
>       output = dilated_attention(input_tensor)

tests/attention.py:39: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

self = DilatedAttention(
  (dropout): Dropout(p=0.0, inplace=False)
  (attention): FlashAttention(
    (attn_dropout): Dropou...Linear(in_features=512, out_features=512, bias=True)
  (proj_v): Linear(in_features=512, out_features=512, bias=True)
)
x = tensor([[[[ 1.4205e+00, -4.5398e-01,  9.8770e-01,  ..., -2.6991e-02,
           -1.1310e+00,  1.4456e-03],
          [...231e-02],
          [ 3.7678e-01, -1.1879e-01,  2.9864e-01,  ..., -5.8582e-02,
           -3.6311e-01,  7.1331e-01]]]])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        """Forward pass of the DilatedAttention module.
    
        Args:
            x (torch.Tensor): The input tensor.
    
        Returns:
            torch.Tensor: The output tensor.
        """
        batch_size, seq_len, _ = x.shape
        padding_len = -seq_len % self.segment_size
        x = F.pad(x, (0, 0, 0, padding_len))
        seq_len = seq_len + padding_len
    
        if self.use_xpos:
            x = self.xpos(x)
    
        # Split and sparsify
        x = x.view(batch_size, -1, self.segment_size, self.dim)
        x = x[:, :, :: self.dilation_rate, :]
    
        # qk_norm
        if self.qk_norm:
            q, k, v = map(
                self.norm, (self.proj_q(x), self.proj_k(x), self.proj_v(x))
            )
        else:
            q, k, v = self.proj_q(x), self.proj_k(x), self.proj_v(x)
    
        # Perform attention
        attn_output = self.attention(q, k, v)
    
        # if use rel pos => apply relative positioning bias
        if self.use_rel_pos_bias:
>           attn_output += self.relative_bias(
                batch_size, attn_output.size(1), attn_output.size(1)
            )
E           RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 3

../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/attention.py:123: RuntimeError
__________________________________________________ TestDilatedAttention.test_xpos __________________________________________________

self = <attention.TestDilatedAttention testMethod=test_xpos>

    def test_xpos(self):
        # Setup
        input_tensor = torch.randn(2, 128, 512)
        dilated_attention = DilatedAttention(512, 8, 2, 64, use_xpos=True)
    
        # Action
>       output = dilated_attention(input_tensor)

tests/attention.py:26: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/attention.py:104: in forward
    x = self.xpos(x)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1511: in _wrapped_call_impl
    return self._call_impl(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/torch/nn/modules/module.py:1520: in _call_impl
    return forward_call(*args, **kwargs)
../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/utils.py:254: in forward
    x = apply_rotary_pos_emb(x, sin, cos, scale)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

x = tensor([[[-0.1896,  0.1342, -0.3958,  ..., -0.4303, -0.2086,  0.4187],
         [-0.2131,  0.3323,  0.4395,  ..., -1.5...59,  0.2221,  ...,  1.7753, -1.4079,  1.2502],
         [ 1.5155, -1.4299,  0.4873,  ..., -1.0910, -0.7816, -0.7960]]])
sin = tensor([[ 0.0000e+00,  0.0000e+00,  0.0000e+00,  ...,  0.0000e+00,
          0.0000e+00,  0.0000e+00],
        [ 9.817...,  1.6756e-02],
        [ 8.3368e-01,  8.3368e-01,  7.2268e-01,  ...,  2.2456e-02,
          1.6888e-02,  1.6888e-02]])
cos = tensor([[ 1.1695,  1.1695,  1.1586,  ...,  1.0057,  1.0028,  1.0028],
        [ 0.6304,  0.6304,  0.8459,  ...,  1.005...8111,  0.8425,  ...,  0.9942,  0.9971,  0.9971],
        [ 0.1992,  0.1992,  0.4756,  ...,  0.9941,  0.9971,  0.9971]])
scale = tensor([[1.1695, 1.1586, 1.1485,  ..., 1.0087, 1.0057, 1.0028],
        [1.1667, 1.1559, 1.1460,  ..., 1.0086, 1.0056,...0.8592, 0.8671, 0.8745,  ..., 0.9916, 0.9945, 0.9973],
        [0.8571, 0.8651, 0.8726,  ..., 0.9915, 0.9944, 0.9972]])

    def apply_rotary_pos_emb(x, sin, cos, scale=1):
        sin, cos = map(lambda t: duplicate_interleave(t * scale), (sin, cos))
        # einsum notation for lambda t: repeat(t[offset:x.shape[1]+offset,:], "n d -> () n () (d j)", j=2)
>       return (x * cos) + (rotate_every_two(x) * sin)
E       RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2

../anaconda3/envs/myenv2/lib/python3.9/site-packages/long_net/utils.py:221: RuntimeError
========================================================= warnings summary =========================================================
../anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:441
  /home/mg/anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:441: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
    _torch_pytree._register_pytree_node(

../anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:309
  /home/mg/anaconda3/envs/myenv2/lib/python3.9/site-packages/transformers/utils/generic.py:309: UserWarning: torch.utils._pytree._register_pytree_node is deprecated. Please use torch.utils._pytree.register_pytree_node instead.
    _torch_pytree._register_pytree_node(

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
===================================================== short test summary info ======================================================
FAILED tests/attention.py::TestDilatedAttention::test_attention_distribution - AssertionError: False is not true
FAILED tests/attention.py::TestDilatedAttention::test_attention_outputs - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_dropout - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_forward_pass - AttributeError: 'TestDilatedAttention' object has no attribute 'sparse_dilated_attention'
FAILED tests/attention.py::TestDilatedAttention::test_output_shape - AssertionError: torch.Size([2, 64, 512]) != (2, 128, 512)
FAILED tests/attention.py::TestDilatedAttention::test_relative_position_bias - RuntimeError: The size of tensor a (512) must match the size of tensor b (2) at non-singleton dimension 3
FAILED tests/attention.py::TestDilatedAttention::test_xpos - RuntimeError: The size of tensor a (512) must match the size of tensor b (64) at non-singleton dimension 2
============================================= 7 failed, 5 passed, 2 warnings in 6.50s ==============================================

Inzilbeth

DilatedAttention is not working properly (the output shape is wrong). I'm seeing the same issue.
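
The wrong shape is consistent with the `forward()` shown in the pytest traceback above: the input is padded, reshaped into segments, and then every `dilation_rate`-th token is kept, but the kept tokens are never scattered back to the original sequence length. A minimal sketch of that bookkeeping, assuming `DilatedAttention(512, 8, 2, 64)` corresponds to dim=512, heads=8, dilation_rate=2, segment_size=64 (the argument order is an assumption):

```python
import torch

# Illustrative only: reproduces the tensor bookkeeping from the forward() in the
# traceback, not the full DilatedAttention module.
batch_size, seq_len, dim = 2, 128, 512
segment_size, dilation_rate = 64, 2  # assumed mapping of DilatedAttention(512, 8, 2, 64)

x = torch.randn(batch_size, seq_len, dim)
x = x.view(batch_size, -1, segment_size, dim)  # (2, 2, 64, 512): split into segments
x = x[:, :, ::dilation_rate, :]                # (2, 2, 32, 512): keep every 2nd token

print(x.reshape(batch_size, -1, dim).shape)    # torch.Size([2, 64, 512]), not (2, 128, 512)
```

If that reading is right, the residual `x = block(x) + x` in model.py fails for the same reason: the block's output is shorter than its input.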
