Problem with sparse activations #17
Are you using the most recent version of the code? We changed the name from tsallis15 to entmax15 not long after we released it. I would recommend updating and seeing if the bug goes away.
I used both the most recent version of the entmax package and the function used in openNMT-entmax. Neither works, and I don't know why.
Do you get a different error message if you run the code on the CPU? Are you using entmax for attention or the loss function?
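(Editorial aside, not part of the original exchange: because CUDA kernels launch asynchronously, a device-side assert usually surfaces at an unrelated line, which is why rerunning on the CPU, as suggested above, tends to give a clearer message. A generic PyTorch sketch of both options follows; `model` and `batch_on_cpu` are placeholder names.)

```python
# Generic PyTorch debugging sketch; `model` and `batch_on_cpu` are hypothetical names.
import os

# Force synchronous kernel launches so the assert points at the real call site.
# Must be set before any CUDA work happens in the process.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

# Alternatively, move the failing step to the CPU to get a plain Python traceback:
# model = model.cpu()
# output = model(batch_on_cpu)
```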
Thank you very much for your reply.
Could you post more details about the error?
The error occurs in the decoder self-attention and context attention, and it is triggered by the beam search algorithm. I use this transformer code, you can try it. When using beam search, the attention input becomes [nan, nan, nan] at the first decode step; the softmax function can ignore it, but the sparse functions cannot. Do you have any suggestion to fix this problem?
Unfortunately I'm not familiar with that transformer implementation. Can you find out what the inputs and outputs of entmax are when you get nans? I'd guess that target masking (different between training and beam search time) is producing tensors that entmax is struggling with. But I can't fix the problem until I know what these tensors look like. |
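(Illustration added for context, not taken from the linked repository: one common way an all-nan attention row can appear at the first decode step is a target mask that blanks out every position in a row. With a standard -inf mask, softmax then returns nan for that whole row, and those nans propagate into later layers; the names and shapes below are purely illustrative.)

```python
# Hypothetical masking sketch (names and shapes are illustrative, not from the
# repository discussed above). A row whose positions are all masked out with
# -inf produces an all-nan row after softmax, which then propagates downstream.
import torch

scores = torch.randn(1, 3)                      # attention scores for one query
mask = torch.tensor([[True, True, True]])       # every position masked out
masked_scores = scores.masked_fill(mask, float("-inf"))

weights = torch.softmax(masked_scores, dim=-1)  # tensor([[nan, nan, nan]])
print(weights)
```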
The input is like:
I reproduced your error message on my machine. Softmax can handle an input of all nans, entmax currently cannot. However, I'm skeptical of the circumstances that caused this situation to arise. Does the code you're using intentionally create tensors with nans in them for masking? I find that surprising. nan should show up basically only if the code is doing something wrong -- introducing it on purpose makes debugging harder. Neither of the transformer implementations I'm familiar with (OpenNMT, joeynmt) use nans for masking like this (for what it's worth, I looked through the repo you linked and can't find nans there either -- are you sure they're there on purpose?). So while I agree that entmax crashes on nans, I'm not convinced this is a bad thing -- if something is broken, it's better to crash than fail silently.
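(A minimal reproduction sketch of the behaviour described above, assuming the `entmax` package is installed; this is not code from the issue itself, and the exact error raised by entmax15 may differ between CPU and GPU.)

```python
# Minimal sketch: softmax silently returns nan for an all-nan row, while
# entmax15 (from the `entmax` package) fails on the same input, as discussed
# above. Assumes `pip install entmax`.
import torch
from entmax import entmax15

scores = torch.full((1, 3), float("nan"))   # e.g. a fully degenerate attention row

print(torch.softmax(scores, dim=-1))        # tensor([[nan, nan, nan]]) -- no error
entmax15(scores, dim=-1)                    # fails here instead of returning nans
```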
I just replaced the softmax function with the sparsemax or tsallis15 function in my transformer model. It works well during training, but the following error occurs during the testing phase:
RuntimeError: CUDA error: device-side assert triggered
If I replace it with the softmax function again, it works.
What could be the cause?
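(For context, a minimal sketch of the kind of substitution described above, using the `entmax` package inside a scaled dot-product attention; this is illustrative, not the poster's actual model code.)

```python
# Illustrative scaled dot-product attention where the softmax over attention
# scores is swapped for sparsemax or entmax15; not the poster's actual code.
import torch
from entmax import sparsemax, entmax15  # pip install entmax

def attention(q, k, v, normalizer=torch.softmax):
    scores = q @ k.transpose(-2, -1) / k.size(-1) ** 0.5
    weights = normalizer(scores, dim=-1)    # swap in sparsemax / entmax15 here
    return weights @ v

q = k = v = torch.randn(2, 4, 8)                         # (batch, length, dim)
out_soft = attention(q, k, v)                            # dense weights
out_sparse = attention(q, k, v, normalizer=sparsemax)    # sparse weights
out_ent15 = attention(q, k, v, normalizer=entmax15)      # 1.5-entmax weights
```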