
Question on why word embedding limited to training/test set vocabulary #3

Open
LydiaXiaohongLi opened this issue Jan 6, 2019 · 0 comments


Hi John,
Thanks for sharing this. I have a question about the word embedding — correct me if I'm wrong: I noticed that the embedding matrix built here only contains words that appear in the training/test set. Wouldn't an embedding covering the full GloVe vocabulary be better? For example, if in production we encounter a word that never appeared in the training/test set but is part of the GloVe vocabulary, we could still capture its meaning even though we never saw it during training. I think this would especially benefit sentiment-classification problems with smaller training sets.
Thanks!
Regards,
Xiaohong
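
The idea suggested above — indexing the entire GloVe vocabulary rather than only the words seen in training — could be sketched roughly like this (the toy GloVe dict, function name, and special tokens below are illustrative, not taken from the repo):

```python
import numpy as np

def build_full_vocab_embedding(glove, dim, extra_tokens=("<pad>", "<unk>")):
    """Build a word index and embedding matrix over the *entire* GloVe
    vocabulary (plus special tokens), not just the train/test words.

    glove: dict mapping word -> np.ndarray of shape (dim,)
    """
    word_to_idx = {}
    vectors = []
    for tok in extra_tokens:            # reserve rows for padding / unknown
        word_to_idx[tok] = len(word_to_idx)
        vectors.append(np.zeros(dim))
    for word, vec in glove.items():     # every GloVe word gets a row
        word_to_idx[word] = len(word_to_idx)
        vectors.append(vec)
    return word_to_idx, np.stack(vectors)

# Toy stand-in for a parsed GloVe file (the real file has ~400k words).
glove = {
    "good":      np.array([0.1, 0.2]),
    "bad":       np.array([-0.1, -0.2]),
    "excellent": np.array([0.3, 0.4]),  # imagine this never occurs in training
}
word_to_idx, emb = build_full_vocab_embedding(glove, dim=2)

def encode(word):
    # Words outside GloVe fall back to <unk>; words that are in GloVe but
    # absent from the training set still get a meaningful vector.
    return emb[word_to_idx.get(word, word_to_idx["<unk>"])]
```

With this layout, a production-time word like "excellent" still maps to its pretrained vector even if it never occurred in training, which is exactly the benefit described above; the trade-off is a much larger (and mostly untrained) embedding matrix.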
