Dear @seoungwugoh, I've read your paper and found your work extremely interesting. I've been trying to reproduce it from the paper, with some minor changes such as the decoder layers. The memory read operation, which is very similar to the transformer's attention mechanism, is taken from this repo; everything else was reimplemented from the description in your paper.
While training the model, the loss goes down initially, but after a while it suddenly shoots up. I've tried:

- clipping the gradient norm;
- lowering the learning rate;
- removing the skip connections (to make sure the model actually relies on the temporal information in the memory).
I have not yet tried disabling batch norm as your paper suggests, and I'm using mixed-precision training with Apex AMP (a simplified sketch of my training step is below).
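For context, this is roughly how the gradient clipping and AMP are wired in my training loop. It's a minimal sketch: the model, optimizer, loss, and synthetic tensors below are placeholders standing in for my actual STM-style network and training data, not the real code.

```python
import torch
import torch.nn as nn
from apex import amp

# Placeholder model, optimizer, and loss: stand-ins for my actual
# STM-style network and per-pixel segmentation loss.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.BCEWithLogitsLoss()

# Wrap model and optimizer for mixed-precision training.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# Synthetic batch in place of real video frames and masks.
frames = torch.randn(4, 3, 384, 384, device="cuda")
masks = torch.rand(4, 1, 384, 384, device="cuda")

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(frames), masks)

    # Mixed-precision backward: Apex scales the loss to avoid fp16 underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

    # Clip the norm of the unscaled (master) gradients before the update.
    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm=1.0)
    optimizer.step()
```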
Have you experienced such training instability before? What do you think could be the problem?