
when decoding, something wrong. #17

Open
ljsun opened this issue Oct 9, 2018 · 15 comments

Comments

ljsun commented Oct 9, 2018

Hello, when I decode using the eval model, something goes wrong.
Could you help me?
The main part of the traceback is:
Traceback (most recent call last):
  File "run_summarization.py", line 845, in
    tf.app.run()
  File "/home/ices/anaconda3/lib/python3.6/site-packages/tensorflow/python/platform/app.py", line 126, in run
    _sys.exit(main(argv))
  File "run_summarization.py", line 841, in main
    seq2seq.main(unused_argv)
  File "run_summarization.py", line 810, in main
    decoder.decode() # decode indefinitely (unless single_pass=True, in which case deocde the dataset exactly once)
  File "/home/ices/zhangbowen/RLSeq2Seq/src/decode.py", line 115, in decode
    best_hyp = beam_search.run_beam_search(self._sess, self._model, self._vocab, batch)
  File "/home/ices/zhangbowen/RLSeq2Seq/src/beam_search.py", line 144, in run_beam_search
    prev_encoder_es = encoder_es if FLAGS.use_temporal_attention else tf.stack([], axis=0))
  File "/home/ices/zhangbowen/RLSeq2Seq/src/model.py", line 855, in decode_onestep
    results = sess.run(to_return, feed_dict=feed) # run the decoder step
  File "/home/ices/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 900, in run
    run_metadata_ptr)
  File "/home/ices/anaconda3/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (4, 256) for Tensor 'prev_decoder_outputs:0', which has shape '(?, 4, 256)'

yaserkl commented Oct 9, 2018

Could you please share your command?

ljsun commented Oct 10, 2018

My command is below:
python run_summarization.py --mode=decode --data_path=../data_no_extract/finished_files/chunked/submit_* --vocab_path=../data_no_extract/finished_files/vocab --log_root=../log --exp_name=intradecoder-temporalattention-withpretraining --rl_training=False --intradecoder=True --use_temporal_attention=True --single_pass=1 --beam_size=4 --decode_from=eval
I just want to use the pretrained model to decode, so rl_training=False.
But now I have solved this problem by modifying some code in beam_search.py. Part of the modified code follows.
# decoder_outputs = [[h.decoder_output for h in hyps]]
decoder_outputs = np.array([h.decoder_output for h in hyps]).swapaxes(0, 1)
# encoder_es = [[h.encoder_mask for h in hyps]]
encoder_es = np.array([h.encoder_mask for h in hyps]).swapaxes(0, 1)
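
For context, here is a minimal numpy sketch of the shape change this fix makes. The beam size (4) and hidden size (256) come from the error message; the exact contents of h.decoder_output are an assumption, so treat this as illustrative only.

import numpy as np

# Assumed shapes, inferred from the error message: the placeholder
# 'prev_decoder_outputs:0' expects (steps, beam_size, hidden) = (?, 4, 256),
# i.e. a time-major stack of decoder outputs across all beam hypotheses.
beam_size, steps, hidden = 4, 3, 256

# One decoder-output history per hypothesis, each of shape (steps, hidden).
per_hyp_outputs = [np.zeros((steps, hidden)) for _ in range(beam_size)]

stacked = np.array(per_hyp_outputs)   # (beam_size, steps, hidden) = (4, 3, 256)
time_major = stacked.swapaxes(0, 1)   # (steps, beam_size, hidden) = (3, 4, 256)
print(stacked.shape, time_major.shape)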
Now I have trained the model with RL, but when I use the trained RL model to decode, all decoded results are the same.
This must be wrong, but I don't know how to fix it.
Could you please help me?

yaserkl commented Oct 12, 2018

Yes, if you get the same results during decoding, your trained model still hasn't converged. You should check the convergence of the model by looking at the loss on the evaluation dataset. If after some point this loss doesn't decrease, your model has converged and is ready to decode.
BTW, I've updated the code to fix this issue.

ljsun commented Oct 19, 2018

Hello, I encountered some confusion when training the model.
First, I trained the model without RL, and in the end the pgen_loss converged to 3.0.
Then I trained the model with RL, and in the end the pgen_loss was 36 and the rl_loss was 0. I think this is wrong.
Could you please tell me your training steps and parameters, and how long it takes for your model to converge?

yaserkl commented Oct 21, 2018

For the CNN/DM dataset with 287226 training examples, I'd suggest the following setup:
Train this model using batch size 32 for 15 epochs (max_iter=134637)
then activate RL training for another 15 epochs (max_iter=269274) and eta=1/269274=3.71368E-06 and then coverage for 3 epochs. Then you might be able to see some results. BTW, make sure to activate scheduled sampling during RL training with scheduled_probability equal to eta.
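
The iteration counts above follow directly from the training-set size and batch size; a quick sketch of the arithmetic (all numbers taken from this comment):

train_size = 287226        # CNN/DM training examples
batch_size = 32

steps_per_epoch = train_size / batch_size      # ~8975.8
mle_max_iter = round(15 * steps_per_epoch)     # 134637: max_iter for the MLE-only phase
rl_max_iter  = round(30 * steps_per_epoch)     # 269274: cumulative max_iter after 15 more RL epochs
eta = 1 / rl_max_iter                          # ~3.71368e-06, also used as the scheduled sampling probability

print(mle_max_iter, rl_max_iter, eta)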

ljsun commented Oct 22, 2018

Thank you for your reply, but if I only want to use RL training, should I set eta=1, scheduled_sampling=True, and sampling_probability=1?

yaserkl commented Oct 22, 2018

Yes, but this only works if you have a very well-trained model based on the MLE loss. Usually, researchers do not use eta=1 but a value close to 1, e.g. eta=0.9984. Check out the Paulus et al. paper for more information.
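
For reference, the objective that eta interpolates is the mixed MLE/RL loss of Paulus et al.; a minimal sketch under that assumption (the function and variable names are illustrative, not the repo's):

def mixed_loss(rl_loss, mle_loss, eta=0.9984):
    # eta = 1.0 would mean a pure RL loss; Paulus et al. keep eta close to 1
    # so the MLE term still regularizes fluency.
    return eta * rl_loss + (1.0 - eta) * mle_loss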

@BeckyWang

> For CNN/DM dataset with 287226 train size, I'd suggest the following setup:
> Train this model using batch size 32 for 15 epochs (max_iter=134637)
> then activate RL training for another 15 epochs (max_iter=269274) and eta=1/269274=3.71368E-06 and then coverage for 3 epochs. Then you might be able to see some results. BTW, make sure to activate scheduled sampling during RL training with scheduled_probability equal to eta.

In the third step (coverage for 3 epochs), should I set rl_training=True and eta=1/269274=3.71368E-06 at the same time?

yaserkl commented Jan 19, 2019

Yes, you still need rl_training to be true since you are using RL for training, and make sure to set eta to the right value for the coverage phase.
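
For clarity, the coverage phase refers to the coverage penalty of See et al.'s pointer-generator model, added on top of the existing training loss; a minimal numpy sketch of that penalty (illustrative only, not the repo's exact implementation):

import numpy as np

def coverage_loss(attn_dists):
    # attn_dists: (dec_steps, enc_len) array, one attention distribution per decoder step.
    coverage = np.zeros(attn_dists.shape[1])       # running sum of past attention
    loss = 0.0
    for a_t in attn_dists:
        loss += np.minimum(a_t, coverage).sum()    # penalize re-attending to already-covered tokens
        coverage += a_t
    return loss / len(attn_dists)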

xiangriconglin commented Jun 4, 2019

> For CNN/DM dataset with 287226 train size, I'd suggest the following setup:
> Train this model using batch size 32 for 15 epochs (max_iter=134637)
> then activate RL training for another 15 epochs (max_iter=269274) and eta=1/269274=3.71368E-06 and then coverage for 3 epochs. Then you might be able to see some results. BTW, make sure to activate scheduled sampling during RL training with scheduled_probability equal to eta.

I used the parameters from "Get To The Point: Summarization with Pointer-Generator Networks" to train the model for 600000 iterations, and in the end the pgen_loss converged to 3.7~4.3.
The command:
python src/run_summarization.py --mode=train --data_path=./data/cnn_dm/chunked/train_* --vocab_path=./data/cnn_dm/vocab --log_root=./log --exp_name=pointer-generator --batch_size=16 --max_iter=600000
Part of log:

INFO:tensorflow:seconds for training step 599991: 2.264786958694458
INFO:tensorflow:pgen_loss: 4.027677536010742
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:Saving checkpoint to path ./log/pointer-generator/train/model.ckpt
INFO:tensorflow:seconds for training step 599992: 2.1617939472198486
INFO:tensorflow:pgen_loss: 3.8613383769989014
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599993: 2.469341278076172
INFO:tensorflow:pgen_loss: 4.1178693771362305
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599994: 2.2564828395843506
INFO:tensorflow:pgen_loss: 4.207524299621582
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599995: 2.2818007469177246
INFO:tensorflow:pgen_loss: 4.516533851623535
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599996: 2.2542030811309814
INFO:tensorflow:pgen_loss: 4.4078826904296875
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599997: 2.278555154800415
INFO:tensorflow:pgen_loss: 3.9652469158172607
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599998: 2.2138431072235107
INFO:tensorflow:pgen_loss: 4.073095321655273
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 599999: 2.013199806213379
INFO:tensorflow:pgen_loss: 3.7511510848999023
INFO:tensorflow:-------------------------------------------
INFO:tensorflow:seconds for training step 600000: 2.280336856842041
INFO:tensorflow:pgen_loss: 4.327310085296631
INFO:tensorflow:-------------------------------------------

However, all decoded results are the same.

What am I doing wrong? How can I fix this problem?
Looking forward to your reply.

@xiangriconglin

> My command is below:
> python run_summarization.py --mode=decode --data_path=../data_no_extract/finished_files/chunked/submit_* --vocab_path=../data_no_extract/finished_files/vocab --log_root=../log --exp_name=intradecoder-temporalattention-withpretraining --rl_training=False --intradecoder=True --use_temporal_attention=True --single_pass=1 --beam_size=4 --decode_from=eval
> I just want to use the pretrained model to decode, so rl_training=False.
> But now I have solved this problem by modifying some code in beam_search.py. Part of the modified code follows.
> # decoder_outputs = [[h.decoder_output for h in hyps]]
> decoder_outputs = np.array([h.decoder_output for h in hyps]).swapaxes(0, 1)
> # encoder_es = [[h.encoder_mask for h in hyps]]
> encoder_es = np.array([h.encoder_mask for h in hyps]).swapaxes(0, 1)
> Now I have trained the model with RL, but when I use the trained RL model to decode, all decoded results are the same.
> This must be wrong, but I don't know how to fix it.
> Could you please help me?

Excuse me, have you solved this problem?
Now I meet the same problem that all decoded results are the same. My command is:
python src/run_summarization.py --mode=train --data_path=./data/cnn_dm/chunked/train_* --vocab_path=./data/cnn_dm/vocab --log_root=./log --exp_name=pointer-generator --batch_size=16 --max_iter=600000
Looking forward to your reply, thank you!

yaserkl commented Jun 18, 2019

Could you please share your decoding command and some of the outputs?

@xiangriconglin

Thank you very much. The issue has been solved. Thank you for your reply.

@DengYangyong

> Thank you very much. The issue has been solved. Thank you for your reply.

I meet the same problem: all the decoded results for different examples are the same no matter whether I set rl_training=False or True, and I have trained the model for about 300000 steps and the loss has stopped falling.

How did you solve the issue? This is very important to me and I am looking forward to your reply.

@DengYangyong

I meet the same problem as you when I train the model on a Chinese dataset.
In addition, the loss is too high: it was 11 at the beginning and decreased to about 6 after 30K steps.
There may be some problem with the code, because when I used the PyTorch version (https://github.com/rohithreddy024/Text-Summarizer-Pytorch) to train the model, the loss ranged from 6 to 0.8 and the results were good.
