Dear @seoungwugoh, I've read your paper and found your work extremely interesting. I've been trying to reproduce it from the paper, with some minor changes such as the decoder layers. The memory read operation, which is very similar to the transformer's attention mechanism, is taken from this repo; everything else was reimplemented from the description in your paper.
While training the model, the loss goes down initially, but after a while it suddenly shoots up. I've tried:

- clipping the gradient norm;
- lowering the learning rate;
- removing the skip connections (to make sure the model actually relies on the temporal information in the memory).
I have not yet tried disabling batch norm as your paper suggests, and I'm using mixed-precision training with Apex AMP (a simplified sketch of my training step is below).
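For context, this is roughly how the gradient clipping and AMP are wired in my training loop. It's a minimal sketch: the model, optimizer, loss, and synthetic tensors below are placeholders standing in for my actual STM-style network and training data, not the real code.

```python
import torch
import torch.nn as nn
from apex import amp

# Placeholder model, optimizer, and loss: stand-ins for my actual
# STM-style network and per-pixel segmentation loss.
model = nn.Conv2d(3, 1, kernel_size=3, padding=1).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-5)
criterion = nn.BCEWithLogitsLoss()

# Wrap model and optimizer for mixed-precision training.
model, optimizer = amp.initialize(model, optimizer, opt_level="O1")

# Synthetic batch in place of real video frames and masks.
frames = torch.randn(4, 3, 384, 384, device="cuda")
masks = torch.rand(4, 1, 384, 384, device="cuda")

for step in range(100):
    optimizer.zero_grad()
    loss = criterion(model(frames), masks)

    # Mixed-precision backward: Apex scales the loss to avoid fp16 underflow.
    with amp.scale_loss(loss, optimizer) as scaled_loss:
        scaled_loss.backward()

    # Clip the norm of the unscaled (master) gradients before the update.
    torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm=1.0)
    optimizer.step()
```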
Have you experienced such training instability before? What do you think could be the problem?