
Aborting, cost seems to be exploding. #4

Open
pannous opened this issue Jan 8, 2015 · 4 comments

Comments

@pannous

pannous commented Jan 8, 2015

Training with flickr8k aborts (a sketch of the kind of check behind this abort follows the log):

253/15000 batch done in 5.037s. at epoch 0.84. loss cost = 37.447347, reg cost = 0.000001, ppl2 = 26.10 (smooth 48.09)
254/15000 batch done in 5.082s. at epoch 0.85. loss cost = 39.408169, reg cost = 0.000001, ppl2 = 29.19 (smooth 47.91)
255/15000 batch done in 4.914s. at epoch 0.85. loss cost = 140.730310, reg cost = 0.000001, ppl2 = 237360.65 (smooth 2421.03)
Aborting, cost seems to be exploding. Run gradcheck? Lower the learning rate?
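For context, this abort is usually just a safety check that compares the current batch cost against the cost at the start of training. A minimal sketch of such a guard (function and variable names are illustrative assumptions, not the repo's actual code):

def cost_seems_to_be_exploding(current_cost, initial_cost, factor=2.0):
    # flag the run as diverging once the batch cost grows far beyond
    # its value at the start of training
    return current_cost > factor * initial_cost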

@karpathy
Owner

karpathy commented Jan 8, 2015

With default parameters? I thought I had tuned them so that this doesn't happen, sorry about that. As the message suggests, lowering the learning rate fixes it. Set learning_rate to about half or a fifth of its current value, until it no longer explodes :)
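For example, assuming driver.py exposes learning_rate as a command-line flag (an assumption; check driver.py's argparse options for the exact name):

python driver.py --learning_rate 0.0005   # about half the default of 0.001
python driver.py --learning_rate 0.0002   # about a fifth, if it still explodes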

@StevenLOL

StevenLOL commented Jan 10, 2015

Here is my result with the default settings (a sketch of the rmsprop update they configure follows the log):

python driver.py
parsed parameters:
{
"grad_clip": 5,
"rnn_relu_encoders": 0,
"dataset": "flickr8k",
"image_encoding_size": 256,
"eval_max_images": -1,
"drop_prob_decoder": 0.5,
"word_encoding_size": 256,
"max_epochs": 50,
"eval_batch_size": 100,
"fappend": "baseline",
"generator": "lstm",
"write_checkpoint_ppl_threshold": -1,
"decay_rate": 0.999,
"tanhC_version": 0,
"hidden_size": 256,
"momentum": 0.0,
"worker_status_output_directory": "status/",
"learning_rate": 0.001,
"checkpoint_output_directory": "cv/",
"do_grad_check": 0,
"word_count_threshold": 5,
"batch_size": 100,
"regc": 1e-08,
"smooth_eps": 1e-08,
"solver": "rmsprop",
"eval_period": 1.0,
"drop_prob_encoder": 0.5
}
Initializing data provider for dataset flickr8k...
BasicDataProvider: reading data/flickr8k/dataset.json
BasicDataProvider: reading data/flickr8k/vgg_feats.mat
preprocessing word counts and creating vocab based on word count threshold 5

253/15000 batch done in 3.242s. at epoch 0.84. loss cost = 39.264201, reg cost = 0.000001, ppl2 = 29.60 (smooth 47.89)
254/15000 batch done in 3.133s. at epoch 0.85. loss cost = 39.633654, reg cost = 0.000001, ppl2 = 33.57 (smooth 47.74)
255/15000 batch done in 3.169s. at epoch 0.85. loss cost = 38.571550, reg cost = 0.000001, ppl2 = 29.56 (smooth 47.56)

...
one day later...
...

14999/15000 batch done in 3.492s. at epoch 50.00. loss cost = 28.621228, reg cost = 0.000004, ppl2 = 11.19 (smooth 10.80)
evaluating val performance in batches of 100
evaluated 5000 sentences and got perplexity = 17.785250
validation perplexity = 17.785250
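As a side note on these settings: solver rmsprop with decay_rate 0.999, smooth_eps 1e-08 and grad_clip 5 corresponds to an update roughly like the sketch below (illustrative only, not the repo's actual implementation; parameter names are taken from the config above):

import numpy as np

def rmsprop_step(w, grad, cache, learning_rate=0.001, decay_rate=0.999,
                 smooth_eps=1e-08, grad_clip=5.0):
    # clip gradients elementwise, then apply the RMSProp update
    grad = np.clip(grad, -grad_clip, grad_clip)
    cache = decay_rate * cache + (1 - decay_rate) * grad ** 2
    w = w - learning_rate * grad / (np.sqrt(cache) + smooth_eps)
    return w, cache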

@karpathy
Owner

@StevenLOL Nice! Looking at the Model Zoo (http://cs.stanford.edu/people/karpathy/neuraltalk/), my LSTM model achieves a perplexity of about 15.7, which is slightly better. I ran it for longer and cross-validated it on our cluster, though.
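For reference, ppl2 in the logs above is presumably base-2 perplexity, i.e. 2 raised to the mean negative log2 probability per word; a minimal sketch of that computation (illustrative):

import numpy as np

def perplexity2(word_probs):
    # base-2 perplexity: 2 ** (mean negative log2 probability
    # assigned to each ground-truth word)
    return 2.0 ** (-np.mean(np.log2(word_probs)))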

@pannous
Author

pannous commented Jan 10, 2015

Thanks, I will try again with a reduced learning rate.

