
The training is unstable #35

Open
shengkelong opened this issue Jun 24, 2021 · 2 comments

Comments

shengkelong commented Jun 24, 2021

Thank you for your impressive work. However, when I try to reproduce this network (I rewrote the code myself), the loss sometimes suddenly increases by roughly a factor of ten. The network structure is correct, since I can load the pretrained weights, so I suspect there is some detail I have missed. Could you tell me what methods you used during training to ensure stability?

pkuxmq (Owner) commented Jun 24, 2021

We restrict the range of the input to exp() and apply gradient clipping. For the reasoning, please refer to #24.
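For concreteness, here is a minimal PyTorch sketch of both safeguards. The coupling-layer structure, the tanh-based clamp, and the clipping threshold are illustrative assumptions, not the repository's exact code:

```python
import torch
import torch.nn as nn

class BoundedAffineCoupling(nn.Module):
    """Affine coupling layer whose log-scale is bounded before exp(),
    keeping the multiplicative scale in a safe range.
    The clamp value (5.0) is an illustrative assumption."""
    def __init__(self, channels, clamp=5.0):
        super().__init__()
        self.clamp = clamp
        # Toy sub-networks predicting log-scale and translation
        self.scale_net = nn.Conv2d(channels // 2, channels // 2, 3, padding=1)
        self.shift_net = nn.Conv2d(channels // 2, channels // 2, 3, padding=1)

    def forward(self, x):
        x1, x2 = x.chunk(2, dim=1)
        # Bound the log-scale to (-clamp, clamp) before exponentiating,
        # so exp(s) cannot blow up even for extreme activations.
        s = self.clamp * torch.tanh(self.scale_net(x1))
        t = self.shift_net(x1)
        y2 = x2 * torch.exp(s) + t
        return torch.cat([x1, y2], dim=1)

# One training step with gradient clipping (max_norm is an assumption)
model = BoundedAffineCoupling(channels=8)
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
x = torch.randn(2, 8, 16, 16)
loss = model(x).pow(2).mean()  # dummy loss, for illustration only
opt.zero_grad()
loss.backward()
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=10.0)
opt.step()
```

Without the bound, a large predicted log-scale makes exp() explode, which is one common cause of a sudden tenfold loss spike; gradient clipping alone does not prevent that forward-pass overflow.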

Feynman1999 commented Sep 11, 2023

> Thank you for your impressive work. However, when I try to reproduce this network (I rewrote the code myself), the loss sometimes suddenly increases by roughly a factor of ten. The network structure is correct, since I can load the pretrained weights, so I suspect there is some detail I have missed. Could you tell me what methods you used during training to ensure stability?

I have also encountered this issue recently, despite using a gradient-clipping strategy. How did you solve it?
