
"Train a simple deep CNN on the CIFAR10 small images dataset." after a while it gets worse #7384

Closed
ehfo0 opened this issue Jul 20, 2017 · 6 comments

Comments

@ehfo0

ehfo0 commented Jul 20, 2017

I ran the code; it went as high as 0.79 accuracy within 100 epochs, but then it started to get worse and the loss went back to where it started (1.8). How do I prevent that?

@mrTsjolder
Contributor

Accuracy on the validation set or on the training set?

@ehfo0
Author

ehfo0 commented Jul 21, 2017

These are the results after 200 epochs:
loss: 1.2848 - acc: 0.5930 - val_loss: 1.3541 - val_acc: 0.5902
but these were the results after 80 epochs:
loss: 0.7646 - acc: 0.7389 - val_loss: 0.6494 - val_acc: 0.7852

@mrTsjolder
Contributor

Are you running on the Theano, TensorFlow, or CNTK backend?
What device are you running the computations on?
Did you use data augmentation?
Did you change anything in the code?

@ehfo0
Author

ehfo0 commented Jul 21, 2017

TensorFlow 1.2
laptop GeForce GT970x
16 GB of RAM
but it says

Total memory: 3.00GiB
Free memory: 2.48GiB

and no, I didn't change anything in the code.

@mrTsjolder
Contributor

mrTsjolder commented Jul 21, 2017

I started a run using the Theano backend and I am getting similar results.
I do not see any problems in the code of the example, and the implementation of RMSprop looks fine to me as well, so it might just be a problem inherent to RMSprop.

From the code, it seems that this might be caused by vanishing gradients: if g ≈ 0, new_a converges to 0.9 ** iterations, which already gets quite close to machine precision (≈1.19e-7 for 32-bit floats) for iterations = 100. Dividing by this small number might then lead to numerical instabilities that cause such issues.
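
Here is a toy sketch of that decay (the accumulator formula is the one from Keras' rmsprop, new_a = rho * a + (1 - rho) * g**2; the constant-zero gradient and the starting value of the accumulator are artificial assumptions, just to show the behaviour):

```python
import numpy as np

# RMSprop accumulator update as in the Keras optimizer:
#   new_a = rho * a + (1 - rho) * g**2
#   new_p = p - lr * g / (sqrt(new_a) + epsilon)
rho = np.float32(0.9)
epsilon = np.float32(1e-8)

a = np.float32(1.0)   # accumulator; assume it starts around 1
g = np.float32(0.0)   # pretend the gradient has (almost) vanished

for t in range(1, 201):
    a = rho * a + (1 - rho) * g ** 2   # with g == 0 this is just a *= rho
    if t in (50, 100, 200):
        print(f"iter {t:3d}: a = {a:.3e}, sqrt(a) + eps = {np.sqrt(a) + epsilon:.3e}")

# a decays geometrically (0.9 ** t) while the gradient stays near zero, so the
# accumulator soon stops reflecting any real gradient history; the next
# non-zero gradient is then divided by a tiny, essentially arbitrary number.
```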

This being said, I just realised that the default value for the numerical constant epsilon = 1e-8 is useless when using 32-bit floats, since it is smaller than the float32 machine precision. Maybe a better choice for epsilon might resolve this issue.
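
A quick check of that point (the comparison against a value of order one is only meant to illustrate float32 rounding):

```python
import numpy as np

# float32 machine epsilon is ~1.19e-7, so an epsilon of 1e-8 is lost to
# rounding as soon as it is added to anything of order one:
print(np.finfo(np.float32).eps)                               # 1.1920929e-07
print(np.float32(1.0) + np.float32(1e-8) == np.float32(1.0))  # True
```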

Maybe you can try another optimizer to see whether this is indeed a problem with rmsprop. However, it seems that all adaptive learning-rate optimizers use the same default epsilon, so they might run into the same problems. Therefore, it might be wiser to simply set epsilon to something like 1e-7 or maybe even 2e-7, as sketched below.
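
For example, a drop-in change to the compile step of the cifar10_cnn.py example (assuming Keras 2.0.x, where RMSprop and Adam both accept an epsilon argument; `model` here is the Sequential model built earlier in that example):

```python
from keras.optimizers import RMSprop, Adam

# The example creates the optimizer as keras.optimizers.rmsprop(lr=0.0001, decay=1e-6);
# here we pass a larger epsilon explicitly to test the hypothesis:
opt = RMSprop(lr=0.0001, decay=1e-6, epsilon=1e-7)

# ...or swap in a different adaptive optimizer for comparison
# (note that Adam's default epsilon is also 1e-8):
# opt = Adam(lr=0.0001, epsilon=1e-7)

# `model` is the Sequential model defined earlier in cifar10_cnn.py
model.compile(loss='categorical_crossentropy',
              optimizer=opt,
              metrics=['accuracy'])
```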

PS: I found this similar issue for torch

@stale

stale bot commented Oct 19, 2017

This issue has been automatically marked as stale because it has not had recent activity. It will be closed after 30 days if no further activity occurs, but feel free to re-open a closed issue if needed.

stale bot added the stale label on Oct 19, 2017
stale bot closed this as completed on Nov 18, 2017