Aborting, cost seems to be exploding. #4
Training with flickr8k aborts:

253/15000 batch done in 5.037s. at epoch 0.84. loss cost = 37.447347, reg cost = 0.000001, ppl2 = 26.10 (smooth 48.09)
254/15000 batch done in 5.082s. at epoch 0.85. loss cost = 39.408169, reg cost = 0.000001, ppl2 = 29.19 (smooth 47.91)
255/15000 batch done in 4.914s. at epoch 0.85. loss cost = 140.730310, reg cost = 0.000001, ppl2 = 237360.65 (smooth 2421.03)
Aboring, cost seems to be exploding. Run gradcheck? Lower the learning rate?
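The abort above comes from a sanity check on the training cost. As a rough illustration of how such a guard can work, here is a minimal sketch; the warm-up count, window size, and 3x-median threshold are illustrative assumptions, not the repository's actual constants:

```python
from collections import deque
from statistics import median

# Illustrative exploding-cost guard: stop training when the smoothed
# cost jumps far above its recent history. The window size and the
# threshold below are assumptions, not neuraltalk's actual constants.
recent_costs = deque(maxlen=100)

def cost_exploding(smooth_cost):
    exploding = len(recent_costs) > 10 and smooth_cost > 3 * median(recent_costs)
    recent_costs.append(smooth_cost)
    return exploding

# inside the training loop, one would do something like:
# if cost_exploding(smooth_cost):
#     # abort and suggest a gradient check or a lower learning rate
#     break
```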
Comments

With default parameters? I thought I had tuned them so that this doesn't happen, sorry about that. As the message suggests, lowering the learning rate fixes it: set learning_rate to about half or a fifth of its current value, until it no longer explodes :)
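A minimal sketch of that advice, assuming driver.py exposes the learning rate as a --learning_rate command-line flag and exits non-zero on the abort (both are assumptions; check `python driver.py --help` against your checkout):

```python
import subprocess

# Hedged sketch of the suggested fix: retry training with a smaller
# learning rate until the cost stops exploding. Assumes driver.py
# accepts a --learning_rate flag and exits non-zero on the abort;
# verify both against your checkout before relying on this.
lr = 1e-3  # hypothetical starting value; substitute the repo's default
while lr > 1e-5:
    result = subprocess.run(["python", "driver.py", "--learning_rate", str(lr)])
    if result.returncode == 0:
        break        # training ran to completion without the abort
    lr *= 0.5        # "about half or a fifth" of the previous value
```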
Here is my result on the default setting:

python driver.py
253/15000 batch done in 3.242s. at epoch 0.84. loss cost = 39.264201, reg cost = 0.000001, ppl2 = 29.60 (smooth 47.89)
...
14999/15000 batch done in 3.492s. at epoch 50.00. loss cost = 28.621228, reg cost = 0.000004, ppl2 = 11.19 (smooth 10.80)
@StevenLOL Nice! Looking at the Model Zoo, my LSTM model achieves a perplexity of about 15.7, so your 11.19 is slightly better. I ran mine for longer and cross-validated it on our cluster, though.
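For context on the numbers being compared: ppl2 is base-2 perplexity, i.e. 2 raised to the mean negative log2-probability the model assigns to each target word, so lower is better. A minimal sketch, with a hypothetical word_logprobs list of natural-log word probabilities:

```python
import math

def ppl2(word_logprobs):
    """Base-2 perplexity from per-word natural-log probabilities.

    Sketch only: assumes word_logprobs holds ln p(word) for every
    target word in the evaluation set; a perfect model gives ppl2 = 1.
    """
    n = len(word_logprobs)
    mean_log2p = sum(lp / math.log(2) for lp in word_logprobs) / n
    return 2.0 ** (-mean_log2p)

# e.g. a model assigning every word probability 1/32 has ppl2 = 32
print(ppl2([math.log(1.0 / 32)] * 100))  # -> approximately 32.0
```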
Thanks, I will try again with a reduced learning rate.