
Question on why word embedding limited to training/test set vocabulary #3

Open
LydiaXiaohongLi opened this issue Jan 6, 2019 · 0 comments


Hi John,
Thanks for sharing this. I have a question about the word embedding — correct me if I'm wrong: I noticed that the embedding matrix built here only contains words that appear in the training/test set. Wouldn't an embedding covering the full GloVe vocabulary be better? For example, if in production we encounter a word that never appeared in the training/test set but is part of the GloVe vocabulary, we could still capture its meaning even though we never saw it during training. I think this would especially benefit sentiment-classification problems with smaller training sets.
Thanks!
Regards,
Xiaohong
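
The idea suggested above — indexing the entire GloVe vocabulary rather than only the words seen in training — could be sketched roughly like this (the toy GloVe dict, function name, and special tokens below are illustrative, not taken from the repo):

```python
import numpy as np

def build_full_vocab_embedding(glove, dim, extra_tokens=("<pad>", "<unk>")):
    """Build a word index and embedding matrix over the *entire* GloVe
    vocabulary (plus special tokens), not just the train/test words.

    glove: dict mapping word -> np.ndarray of shape (dim,)
    """
    word_to_idx = {}
    vectors = []
    for tok in extra_tokens:            # reserve rows for padding / unknown
        word_to_idx[tok] = len(word_to_idx)
        vectors.append(np.zeros(dim))
    for word, vec in glove.items():     # every GloVe word gets a row
        word_to_idx[word] = len(word_to_idx)
        vectors.append(vec)
    return word_to_idx, np.stack(vectors)

# Toy stand-in for a parsed GloVe file (the real file has ~400k words).
glove = {
    "good":      np.array([0.1, 0.2]),
    "bad":       np.array([-0.1, -0.2]),
    "excellent": np.array([0.3, 0.4]),  # imagine this never occurs in training
}
word_to_idx, emb = build_full_vocab_embedding(glove, dim=2)

def encode(word):
    # Words outside GloVe fall back to <unk>; words that are in GloVe but
    # absent from the training set still get a meaningful vector.
    return emb[word_to_idx.get(word, word_to_idx["<unk>"])]
```

With this layout, a production-time word like "excellent" still maps to its pretrained vector even if it never occurred in training, which is exactly the benefit described above; the trade-off is a much larger (and mostly untrained) embedding matrix.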
