
Model tuning duplicates generated in decoder #8

Open
mingchen62 opened this issue Mar 13, 2018 · 4 comments

Comments

@mingchen62

After training, I started to evaluate and found the predictions interesting.
The trained model made good predictions on some of the more complicated LaTeX, such as fractions or square roots, but it failed on some simpler formulas.
For example,
the ground truth is "y=x^2+2x +1" but the prediction is "y=x^2+2x +2x + 1";
the ground truth is "270" but the prediction is "2700".
The decoder duplicates the last symbol(s).
Any hint on how to tune the model to alleviate the issue?

My training results look reasonable:
Epoch: 11 Step 43142 - Val Accuracy = 0.923066 Perp = 1.137150
Epoch: 12 Step 47064 - Val Accuracy = nan Perp = 1.138024

@da03
Collaborator

da03 commented Mar 13, 2018

Hmm, which dataset are you using? I haven't observed that repetition problem in im2text before, but repetition is a well-known problem in other seq2seq tasks such as summarization, and people usually address it with a coverage penalty to avoid attending to the same source word too much.
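A minimal sketch of the GNMT-style coverage penalty (Wu et al., 2016) that this comment refers to; the matrix shapes and the `beta` value are illustrative assumptions, not values from this repo:

```python
import numpy as np

def coverage_penalty(attention, beta=0.2):
    # attention: (target_len, source_len) attention weights accumulated
    # during decoding. Summing over target steps gives the total
    # attention each source position received ("coverage").
    coverage = attention.sum(axis=0)
    # Coverage above 1 is clipped, so the only way to avoid a penalty is
    # to spread attention over ALL source positions -- which discourages
    # re-attending to the same symbols (and re-emitting the same tokens).
    return beta * np.log(np.minimum(coverage, 1.0)).sum()

# Uniform attention covers every source position exactly once: no penalty.
even = np.full((4, 4), 0.25)
# Attention stuck on one source position leaves the rest uncovered: penalized.
stuck = np.full((4, 4), 0.01)
stuck[:, 0] = 0.97
print(coverage_penalty(even), coverage_penalty(stuck))
```

At beam-search time this term is added to each hypothesis's log-probability, so hypotheses that keep attending to the same source symbols are ranked down.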

@mingchen62
Author

Thanks. I am using a handwritten formula dataset. I suspect the variable spacing between handwritten symbols contributes to the repetition problem. I am also looking at OpenNMT-py for the coverage penalty,
i.e. OpenNMT/OpenNMT-py#340.
Will report if I have any luck with that.

@mingchen62
Author

Tried a few combinations of length and coverage penalty parameters; some made things worse, some gave minor improvements.
May need more hyperparameter exploration.
For example, https://arxiv.org/pdf/1703.03906.pdf
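For reference, the length and coverage terms being swept here are usually combined into a single beam-search rescoring function, as in GNMT (Wu et al., 2016). A sketch, where `alpha` is an illustrative starting value rather than a tuned one:

```python
def length_penalty(length, alpha=0.6):
    # GNMT length penalty: lp(Y) = ((5 + |Y|) / 6) ** alpha.
    # alpha = 0 disables normalization; larger alpha favors longer outputs.
    return ((5.0 + length) / 6.0) ** alpha

def rescore(log_prob, length, coverage_pen=0.0, alpha=0.6):
    # Final hypothesis score: length-normalized log-probability plus the
    # coverage penalty. Without the normalization, beam search is biased
    # toward short hypotheses, since every extra token adds negative log-prob.
    return log_prob / length_penalty(length, alpha) + coverage_pen

print(rescore(-10.0, 5), rescore(-10.0, 20))
```

OpenNMT-py exposes knobs for both terms on its translator (option names vary by version), so sweeping `alpha` and the coverage weight there is the natural next step.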

@zhangw-memo

After training, what is your BLEU value? I didn't change anything, and accuracy increased by 3%.
