
when decoding, something wrong. #17

Open
ljsun opened this issue Oct 9, 2018 · 15 comments

Comments

ljsun commented Oct 9, 2018

Hello, when I decode using the eval model, something goes wrong.
Could you help me?
The main part of the traceback is:
Traceback (most recent call last):
  File "run_summarization.py", line 845, in
    tf.app.run()
  File "/home/ices/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "run_summarization.py", line 841, in main
    seq2seq.main(unused_argv)
  File "run_summarization.py", line 810, in main
    decoder.decode() # decode indefinitely (unless single_pass=True, in which case deocde the dataset exactly once)
  File "/home/ices/zhangbowen/RLSeq2Seq/src/decode.py", line 115, in decode
    best_hyp = beam_search.run_beam_search(self._sess, self._model, self._vocab, batch)
  File "/home/ices/zhangbowen/RLSeq2Seq/src/beam_search.py", line 144, in run_beam_search
    prev_encoder_es = encoder_es if FLAGS.use_temporal_attention else tf.stack([], axis=0))
  File "/home/ices/zhangbowen/RLSeq2Seq/src/model.py", line 855, in decode_onestep
    results = sess.run(to_return, feed_dict=feed) # run the decoder step
  File "/home/ices/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/ices/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (4, 256) for Tensor 'prev_decoder_outputs:0', which has shape '(?, 4, 256)'

yaserkl commented Oct 9, 2018

Could you please share your command?

ljsun commented Oct 10, 2018

My command is below:
python run_summarization.py --mode=decode --data_path=../data_no_extract/finished_files/chunked/submit_* --vocab_path=../data_no_extract/finished_files/vocab --log_root=../log --exp_name=intradecoder-temporalattention-withpretraining --rl_training=False --intradecoder=True --use_temporal_attention=True --single_pass=1 --beam_size=4 --decode_from=eval
I just want to use the pretrained model to decode, so rl_training=False.
But now I have solved this problem by modifying some code in beam_search.py. Part of the modified code follows.
# decoder_outputs = [[h.decoder_output for h in hyps]]
decoder_outputs = np.array([h.decoder_output for h in hyps]).swapaxes(0, 1)
# encoder_es = [[h.encoder_mask for h in hyps]]
encoder_es = np.array([h.encoder_mask for h in hyps]).swapaxes(0, 1)
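
For context, here is a minimal numpy sketch of the shape change this fix makes. The beam size (4) and hidden size (256) come from the error message; the exact contents of h.decoder_output are an assumption, so treat this as illustrative only.

import numpy as np

# Assumed shapes, inferred from the error message: the placeholder
# 'prev_decoder_outputs:0' expects (steps, beam_size, hidden) = (?, 4, 256),
# i.e. a time-major stack of decoder outputs across all beam hypotheses.
beam_size, steps, hidden = 4, 3, 256

# One decoder-output history per hypothesis, each of shape (steps, hidden).
per_hyp_outputs = [np.zeros((steps, hidden)) for _ in range(beam_size)]

stacked = np.array(per_hyp_outputs)   # (beam_size, steps, hidden) = (4, 3, 256)
time_major = stacked.swapaxes(0, 1)   # (steps, beam_size, hidden) = (3, 4, 256)
print(stacked.shape, time_major.shape)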
Now I have trained the model with RL, but when I use the trained RL model to decode, all decoded results are the same.
This must be wrong, but I don't know how to fix it.
Could you please help me?

yaserkl commented Oct 12, 2018

Yes, if you get the same results during decoding, your trained model still hasn't converged. You should check the convergence of the model by looking at the loss on the evaluation dataset. If after some point this loss doesn't decrease, your model has converged and is ready to decode.
BTW, I've updated the code to fix this issue.

ljsun commented Oct 19, 2018

Hello, I encountered some confusion when training the model.
First, I trained the model without RL, and in the end the pgen_loss converged to 3.0.
Then I trained the model with RL, and in the end the pgen_loss was 36 and the rl_loss was 0. I think this is wrong.
Could you please tell me your training steps and parameters, and how long it takes for your model to converge?

yaserkl commented Oct 21, 2018

For the CNN/DM dataset with 287226 training examples, I'd suggest the following setup:
Train this model using batch size 32 for 15 epochs (max_iter=134637)
then activate RL training for another 15 epochs (max_iter=269274) and eta=1/269274=3.71368E-06 and then coverage for 3 epochs. Then you might be able to see some results. BTW, make sure to activate scheduled sampling during RL training with scheduled_probability equal to eta.
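
The iteration counts above follow directly from the training-set size and batch size; a quick sketch of the arithmetic (all numbers taken from this comment):

train_size = 287226        # CNN/DM training examples
batch_size = 32

steps_per_epoch = train_size / batch_size      # ~8975.8
mle_max_iter = round(15 * steps_per_epoch)     # 134637: max_iter for the MLE-only phase
rl_max_iter  = round(30 * steps_per_epoch)     # 269274: cumulative max_iter after 15 more RL epochs
eta = 1 / rl_max_iter                          # ~3.71368e-06, also used as the scheduled sampling probability

print(mle_max_iter, rl_max_iter, eta)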

ljsun commented Oct 22, 2018

Thank you for your reply, but if I only want to use RL training, should I set eta=1, scheduled_sampling=True, and sampling_probability=1?

yaserkl commented Oct 22, 2018

Yes, but this only works if you have a very well-trained model based on the MLE loss. Usually, researchers do not use eta=1 but a value close to 1, e.g. eta=0.9984. Check out the Paulus et al. paper for more information.
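
For reference, the objective that eta interpolates is the mixed MLE/RL loss of Paulus et al.; a minimal sketch under that assumption (the function and variable names are illustrative, not the repo's):

def mixed_loss(rl_loss, mle_loss, eta=0.9984):
    # eta = 1.0 would mean a pure RL loss; Paulus et al. keep eta close to 1
    # so the MLE term still regularizes fluency.
    return eta * rl_loss + (1.0 - eta) * mle_loss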

@BeckyWang

> For CNN/DM dataset with 287226 train size, I'd suggest the following setup:
> Train this model using batch size 32 for 15 epochs (max_iter=134637)
> then activate RL training for another 15 epochs (max_iter=269274) and eta=1/269274=3.71368E-06 and then coverage for 3 epochs. Then you might be able to see some results. BTW, make sure to activate scheduled sampling during RL training with scheduled_probability equal to eta.

In the third step (coverage for 3 epochs), should I set rl_training=True and eta=1/269274=3.71368E-06 at the same time?

yaserkl commented Jan 19, 2019

Yes, you still need rl_training to be true since you are using RL for training, and make sure to set eta to the right value for the coverage phase.
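
For clarity, the coverage phase refers to the coverage penalty of See et al.'s pointer-generator model, added on top of the existing training loss; a minimal numpy sketch of that penalty (illustrative only, not the repo's exact implementation):

import numpy as np

def coverage_loss(attn_dists):
    # attn_dists: (dec_steps, enc_len) array, one attention distribution per decoder step.
    coverage = np.zeros(attn_dists.shape[1])       # running sum of past attention
    loss = 0.0
    for a_t in attn_dists:
        loss += np.minimum(a_t, coverage).sum()    # penalize re-attending to already-covered tokens
        coverage += a_t
    return loss / len(attn_dists)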

xiangriconglin commented Jun 4, 2019

> For CNN/DM dataset with 287226 train size, I'd suggest the following setup:
> Train this model using batch size 32 for 15 epochs (max_iter=134637)
> then activate RL training for another 15 epochs (max_iter=269274) and eta=1/269274=3.71368E-06 and then coverage for 3 epochs. Then you might be able to see some results. BTW, make sure to activate scheduled sampling during RL training with scheduled_probability equal to eta.

I used the parameters from "Get To The Point: Summarization with Pointer-Generator Networks" to train the model for 600000 iterations, and in the end the pgen_loss converged to 3.7~4.3.
The command:
python src/run_summarization.py --mode=train --data_path=./data/cnn_dm/chunked/train_* --vocab_path=./data/cnn_dm/vocab --log_root=./log --exp_name=pointer-generator --batch_size=16 --max_iter=600000
Part of log:

INFO:tensorflow:seconds for training step 599991: 2.264786958694458
INFO:tensorflow:pgen_loss: 4.027677536010742
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:Saving checkpoint to path ./log/pointer-generator/train/model.ckpt
INFO:tensorflow:seconds for training step 599992: 2.1617939472198486
INFO:tensorflow:pgen_loss: 3.8613383769989014
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599993: 2.469341278076172
INFO:tensorflow:pgen_loss: 4.1178693771362305
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599994: 2.2564828395843506
INFO:tensorflow:pgen_loss: 4.207524299621582
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599995: 2.2818007469177246
INFO:tensorflow:pgen_loss: 4.516533851623535
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599996: 2.2542030811309814
INFO:tensorflow:pgen_loss: 4.4078826904296875
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599997: 2.278555154800415
INFO:tensorflow:pgen_loss: 3.9652469158172607
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599998: 2.2138431072235107
INFO:tensorflow:pgen_loss: 4.073095321655273
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599999: 2.013199806213379
INFO:tensorflow:pgen_loss: 3.7511510848999023
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 600000: 2.280336856842041
INFO:tensorflow:pgen_loss: 4.327310085296631
INFO:tensorflow:-------------------------------------------

However, all decoded results are the same.

What am I doing wrong? How can I fix this problem?
Looking forward to your reply.

@xiangriconglin

> My command is below:
> python run_summarization.py --mode=decode --data_path=../data_no_extract/finished_files/chunked/submit_* --vocab_path=../data_no_extract/finished_files/vocab --log_root=../log --exp_name=intradecoder-temporalattention-withpretraining --rl_training=False --intradecoder=True --use_temporal_attention=True --single_pass=1 --beam_size=4 --decode_from=eval
> I just want to use the pretrained model to decode, so rl_training=False.
> But now I have solved this problem by modifying some code in beam_search.py. Part of the modified code follows.
> # decoder_outputs = [[h.decoder_output for h in hyps]]
> decoder_outputs = np.array([h.decoder_output for h in hyps]).swapaxes(0, 1)
> # encoder_es = [[h.encoder_mask for h in hyps]]
> encoder_es = np.array([h.encoder_mask for h in hyps]).swapaxes(0, 1)
> Now I have trained the model with RL, but when I use the trained RL model to decode, all decoded results are the same.
> This must be wrong, but I don't know how to fix it.
> Could you please help me?

Excuse me, have you solved this problem?
Now I meet the same problem that all decoded results are the same. My command is:
python src/run_summarization.py --mode=train --data_path=./data/cnn_dm/chunked/train_* --vocab_path=./data/cnn_dm/vocab --log_root=./log --exp_name=pointer-generator --batch_size=16 --max_iter=600000
Looking forward to your reply, thank you!

yaserkl commented Jun 18, 2019

Could you please share your decoding command and some of the outputs?

@xiangriconglin

Thank you very much. The issue has been solved. Thank you for your reply.

@DengYangyong

> Thank you very much. The issue has been solved. Thank you for your reply.

I meet the same problem: all the decoded results for different examples are the same no matter whether I set rl_training=False or True, and I have trained the model for about 300000 steps and the loss has stopped falling.

How did you solve the issue? This is very important to me and I am looking forward to your reply.

@DengYangyong

I meet the same problem as you when I train the model on a Chinese dataset.
In addition, the loss is too high: it was 11 at the beginning and decreased to about 6 after 30K steps.
There may be some problem with the code, because when I used the PyTorch version (https://github.com/rohithreddy024/Text-Summarizer-Pytorch) to train the model, the loss ranged from 6 to 0.8 and the results were good.
