
Reimplementation training not stable #33

Open
xiankgx opened this issue Sep 25, 2020 · 0 comments
xiankgx commented Sep 25, 2020

Dear @seoungwugoh, I've read your paper and found your work extremely interesting. I've been trying to reproduce it from the paper, with some minor changes such as the decoder layers. The memory read operation, which is very similar to the transformer's attention mechanism, is taken from this repo; everything else is reimplemented from your paper's description.
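For reference, here is a minimal sketch of the memory read as I understand it; the tensor shapes, the `1/sqrt(C_k)` scaling, and the function name are my own assumptions rather than the exact code from that repo:

```python
import torch
import torch.nn.functional as F

def memory_read(mem_key, mem_val, query_key, query_val):
    """Attention-style read from a space-time memory (sketch).

    mem_key:   (B, C_k, T*H*W)  keys of the memory frames, flattened
    mem_val:   (B, C_v, T*H*W)  values of the memory frames, flattened
    query_key: (B, C_k, H*W)    keys of the query frame, flattened
    query_val: (B, C_v, H*W)    values of the query frame, flattened
    """
    B, C_k, _ = mem_key.shape
    # Affinity between every memory location and every query location.
    affinity = torch.bmm(mem_key.transpose(1, 2), query_key)   # (B, T*H*W, H*W)
    affinity = F.softmax(affinity / (C_k ** 0.5), dim=1)        # softmax over memory locations
    # Each query location reads a weighted sum of the memory values.
    read = torch.bmm(mem_val, affinity)                         # (B, C_v, H*W)
    # Concatenate the retrieved memory with the query frame's own value.
    return torch.cat([read, query_val], dim=1)                  # (B, 2*C_v, H*W)
```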

When I train the model, the loss goes down initially, but after a while it suddenly shoots up. I've tried:

  • clipping the gradient norm (the AMP-aware form is sketched in the training-step code further down);
  • lowering the learning rate;
  • removing skip connections (to make sure the model actually makes use of the temporal information in the memory).

I have not yet tried disabling batch norm as your paper suggests, and I'm using mixed-precision training with Apex AMP.
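To be concrete, this is roughly what disabling batch norm and the AMP training step (with the gradient clipping mentioned above) would look like on my side; this is only a sketch assuming Apex's `amp.initialize` / `amp.scale_loss` / `amp.master_params` API, and the helper names, `opt_level`, and `max_norm` value are arbitrary choices of mine:

```python
import torch
from apex import amp

def freeze_bn(model):
    """Put every BatchNorm layer in eval mode so its running stats stop
    updating. This is one reading of "disabling batch norm"; the paper may
    intend something stricter (e.g. also freezing the affine parameters)."""
    for m in model.modules():
        if isinstance(m, (torch.nn.BatchNorm1d, torch.nn.BatchNorm2d, torch.nn.BatchNorm3d)):
            m.eval()

def amp_training_step(model, optimizer, loss_fn, batch, max_norm=10.0):
    """One mixed-precision step; assumes amp.initialize(model, optimizer,
    opt_level="O1") has already been called. batch, loss_fn and max_norm
    are placeholders, not the exact values I use."""
    frames, masks = batch
    loss = loss_fn(model(frames), masks)
    optimizer.zero_grad()
    # Scale the loss so the fp16 gradients do not underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()
    # Under Apex AMP, clip the fp32 master gradients rather than
    # model.parameters().
    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm)
    optimizer.step()
    return loss.item()
```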

Have you experienced such training instability before? What do you think could be the problem?
