
The sentences generated by the trained model is incomplete #9

Open
chenxinpeng opened this issue Dec 2, 2016 · 4 comments
@chenxinpeng

First, thanks for your hard work on the code, it's very generous of you to share the code. :)

But when I use the trained model to generate sentences, I always get sentences like this:

a man is talking to a .
a man is riding a .
a man is playing with .

The sentences are mostly incomplete, as if they were truncated.

Strangely, when I then used the coco-caption code to evaluate the generated sentences, the METEOR value was 27.7%, which is very close to the paper.

So I want to know how to solve this problem. Can you give some advice? I think the problem may be caused by the code.

Thank you for your assistance.

@lcmaster-hx

Hello, how long does it take to train this model? Do I really need a GPU to do that? I'm a beginner, thank you.

@chenxinpeng
Author

@Aoki1994 Hi, if you follow the parameters in the original code, training will take about 12 hours. For myself, I changed the code and set the hidden units in the LSTM to 1000, so training takes me about 24 hours. I strongly suggest you use a GPU. BTW, a GTX 1080 is enough.

@lcmaster-hx

@chenxinpeng Thank you very much! I'll give it a try.

@agethen

agethen commented Feb 6, 2017

For anyone still interested in the original problem, I believe it is caused by the following:
In model.py, line 267, the captions for training are loaded. Unfortunately, they still contain '.' and ',' (unlike the preprocessed dictionary).
Then, in line 268, the last word is dropped for some reason (maybe because of the final '.'?). That is, during training, the final word of the sentence is never learned.

To fix this, modify lines 267 and 268 as follows:

current_captions = current_batch['Description'].values

# Remove '.' and ',' from caption
for idx, cc in enumerate(current_captions):
    current_captions[idx] = cc.replace('.', '').replace(',', '')

# Remove the [:-1] in this line!
current_captions_ind = map(lambda cap: [wordtoix[word] for word in cap.lower().split(' ') if word in wordtoix], current_captions)

Disclaimer: Have not trained it yet, but the caption and mask now look correct :) Also, make sure your threshold for preProBuildWordVocab is not too high if you are missing words...

Edit: Trained for 200 epochs, can confirm that this fixes it!
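To illustrate the fix in isolation: the sketch below (a standalone example with a toy wordtoix vocabulary, not the repo's actual data) shows why stripping '.' and ',' and keeping the last token matters. Without the replace calls, the final token "horse." would not match the preprocessed vocabulary and would be silently dropped, which is exactly the truncation seen in the original report.

```python
# Toy vocabulary standing in for the preprocessed wordtoix dictionary.
wordtoix = {'a': 0, 'man': 1, 'is': 2, 'riding': 3, 'horse': 4}

current_captions = ['A man is riding a horse.']

# Strip '.' and ',' so every token matches the preprocessed vocabulary.
cleaned = [c.replace('.', '').replace(',', '') for c in current_captions]

# Map every token (including the last one -- no [:-1]) to its index.
current_captions_ind = [
    [wordtoix[w] for w in cap.lower().split(' ') if w in wordtoix]
    for cap in cleaned
]

print(current_captions_ind)  # [[0, 1, 2, 3, 0, 4]] -- 'horse' is kept
```

With the punctuation left in place, the final token would be 'horse.' rather than 'horse', fail the `if w in wordtoix` check, and vanish from the training target, so the model never learns to emit sentence-final words.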
