
The training is unstable #35

Open
shengkelong opened this issue Jun 24, 2021 · 2 comments

Comments

shengkelong commented Jun 24, 2021

Thank you for your impressive work. However, when I try to reproduce this network (I rewrote the code myself), the loss sometimes suddenly increases by roughly a factor of ten. The network structure is correct, since I can load the pretrained weights, so I suspect there is some detail I have missed. Could you tell me what methods you used during training to ensure stability?

pkuxmq (Owner) commented Jun 24, 2021

We restrict the range of the input to exp() and apply gradient clipping. For the reasoning, please refer to #24.
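For concreteness, here is a minimal PyTorch sketch of both safeguards. The coupling-layer structure, the tanh-based clamp, and the clipping threshold are illustrative assumptions, not the repository's exact code:

```python
import torch
import torch.nn as nn

class BoundedAffineCoupling(nn.Module):
    """Affine coupling layer whose log-scale is bounded before exp(),
    keeping the multiplicative scale in a safe range.
    The clamp value (5.0) is an illustrative assumption."""
    def __init__(self, channels, clamp=5.0):
        super().__init__()
        self.clamp = clamp
        # Toy sub-networks predicting log-scale and translation
        self.scale_net = nn.Conv2d(channels // 2, channels // 2, 3, padding=1)
        self.shift_net = nn.Conv2d(channels // 2, channels // 2, 3, padding=1)

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        # Bound the log-scale to (-clamp, clamp) before exponentiating,
        # so exp(s) cannot blow up even for extreme activations.
        s = self.clamp * torch.tanh(self.scale_net(x1))
        t = self.shift_net(x1)
        y2 = x2 * torch.exp(s) + t
        return torch.cat([x1, y2], dim=1)

# One training step with gradient clipping (max_norm is an assumption)
model = BoundedAffineCoupling(channels=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(2, 8, 16, 16)
loss = model(x).pow(2).mean()  # dummy loss, for illustration only
opt.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
opt.step()
```

Without the bound, a large predicted log-scale makes exp() explode, which is one common cause of a sudden tenfold loss spike; gradient clipping alone does not prevent that forward-pass overflow.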

Feynman1999 commented Sep 11, 2023

> Thank you for your impressive work. However, when I try to reproduce this network (I rewrote the code myself), the loss sometimes suddenly increases by roughly a factor of ten. The network structure is correct, since I can load the pretrained weights, so I suspect there is some detail I have missed. Could you tell me what methods you used during training to ensure stability?

I have also encountered this issue recently, despite using a gradient-clipping strategy. How did you solve it?
