Problem with sparse activations #17

Open
zylm opened this issue Jun 28, 2020 · 9 comments

@zylm

zylm commented Jun 28, 2020

I just replaced the softmax function with the sparsemax or tsallis15 function in my transformer model. It works well during training, but the following error occurs during the testing phase:
RuntimeError: CUDA error: device-side assert triggered

If I switch back to the softmax function, it works.

What could be the cause?
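
For context, a minimal sketch of the substitution being described (the variable names are illustrative, not from the reporter's model, assuming the public entmax API):

import torch
from entmax import sparsemax, entmax15

scores = torch.randn(2, 4)               # illustrative attention scores
p_soft = torch.softmax(scores, dim=-1)   # dense: every entry is positive
p_sparse = sparsemax(scores, dim=-1)     # can assign exactly zero probability
p_ent = entmax15(scores, dim=-1)         # between softmax and sparsemax

Both replacements normalize along dim the way softmax does, so they can be swapped in wherever the attention weights are computed.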

@bpopeters
Collaborator

Are you using the most recent version of the code? We changed the name from tsallis15 to entmax15 not long after we released it. I would recommend updating and seeing if the bug goes away.
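
For reference, in current releases the import is simply (assuming the package is installed from PyPI):

from entmax import entmax15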

@zylm
Author

zylm commented Jun 29, 2020

I tried both the most recent version of the entmax package and the function used in OpenNMT-entmax. Neither works, and I don't know why.

@bpopeters
Collaborator

Do you get a different error message if you run the code on the CPU?

Are you using entmax for attention or the loss function?

@zylm
Author

zylm commented Jun 29, 2020

Thank you very much for your reply.
I think the problem is that the attention mask is filled with -np.inf.
But why does it work during training? I am confused.
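
A small sketch of why an all-masked row is the dangerous case (illustrative code, not the model's actual masking):

import torch

scores = torch.tensor([[1.0, 2.0, 3.0]])
mask = torch.tensor([[True, True, True]])        # every position masked
masked = scores.masked_fill(mask, float('-inf'))
print(torch.softmax(masked, dim=-1))             # tensor([[nan, nan, nan]])

If no row is ever fully masked during training, the nans would only appear at decoding time, which would explain the difference between the two stages.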

@bpopeters
Collaborator

Could you post more details about the error?

  • Where in the code does it occur?
  • Do you also get an error with CPU tensors?
  • What version of torch are you using?

@zylm
Author

zylm commented Jun 30, 2020

The error occurs in the decoder self-attention and context attention, and it is caused by the beam search algorithm. I use this transformer code; you can try it. During beam search, [nan, nan, nan] appears in the first decoding step. The softmax function can ignore it, but the sparse functions cannot.

Do you have any suggestions for fixing this problem?

@bpopeters
Collaborator

Unfortunately I'm not familiar with that transformer implementation. Can you find out what the inputs and outputs of entmax are when you get nans?

I'd guess that target masking (different between training and beam search time) is producing tensors that entmax is struggling with. But I can't fix the problem until I know what these tensors look like.
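
One way to find out is a small wrapper that dumps any offending inputs before they reach the activation (a hypothetical debugging helper, not part of entmax):

import torch
from entmax import sparsemax

def checked_sparsemax(scores, dim=-1):
    # Log attention scores containing nan before calling sparsemax,
    # so the shape and origin of the bad rows can be inspected.
    if torch.isnan(scores).any():
        print("nan in attention scores:", scores)
    return sparsemax(scores, dim=dim)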

@zylm
Author

zylm commented Jul 2, 2020

The input is like:
attn = torch.tensor([[1., 2., 3.], [float('nan'), float('nan'), float('nan')]])
The tensor contains nan after the attention mask is applied.
softmax(attn) returns:
tensor([[0.0900, 0.2447, 0.6652],
        [   nan,    nan,    nan]])
but the sparse function raises an error:
Traceback (most recent call last):
  File "/home/xxx/anaconda2/envs/python36/lib/python3.6/site-packages/IPython/core/interactiveshell.py", line 3326, in run_code
    exec(code_obj, self.user_global_ns, self.user_ns)
  File "<ipython-input>", line 1, in <module>
    sparsemax(attn)
  File "/home/xxx/anaconda2/envs/python36/lib/python3.6/site-packages/entmax/activations.py", line 223, in sparsemax
    return SparsemaxFunction.apply(X, dim, k)
  File "/home/xxx/anaconda2/envs/python36/lib/python3.6/site-packages/entmax/activations.py", line 151, in forward
    tau, supp_size = _sparsemax_threshold_and_support(X, dim=dim, k=k)
  File "/home/xxx/anaconda2/envs/python36/lib/python3.6/site-packages/entmax/activations.py", line 72, in _sparsemax_threshold_and_support
    tau = topk_cumsum.gather(dim, support_size - 1)
RuntimeError: Invalid index in gather at /opt/conda/conda-bld/pytorch_1556653183467/work/aten/src/TH/generic/THTensorEvenMoreMath.cpp:459

@bpopeters
Collaborator

I reproduced your error message on my machine. Softmax can handle an input of all nans; entmax currently cannot.

However, I'm skeptical of the circumstances that caused this situation to arise. Does the code you're using intentionally create tensors with nans in them for masking? I find that surprising. nan should show up basically only if the code is doing something wrong -- introducing it on purpose makes debugging harder. Neither of the transformer implementations I'm familiar with (OpenNMT, joeynmt) use nans for masking like this (for what it's worth, I looked through the repo you linked and can't find nans there either -- are you sure they're there on purpose?).

So while I agree that entmax crashes on nans, I'm not convinced this is a bad thing -- if something is broken, it's better to crash than fail silently.
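
That said, if you do need decoding to proceed past such rows, one possible guard is to sanitize the scores before the activation (a sketch, assuming an all-nan row can safely be replaced by a uniform distribution; partially-nan rows would still crash, as they should):

import torch
from entmax import sparsemax

def safe_sparsemax(scores, dim=-1):
    # Rows that are entirely nan (e.g. fully masked beam-search rows)
    # are replaced with zeros, which sparsemax maps to a uniform
    # distribution; all other rows pass through unchanged.
    all_nan = torch.isnan(scores).all(dim=dim, keepdim=True)
    scores = torch.where(all_nan, torch.zeros_like(scores), scores)
    return sparsemax(scores, dim=dim)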
