Hi Kim, thank you for sharing your code. I have a question about your model. In your implementation (shown below), the embedding weight matrix contains all words from both the training set and the test set. I think it should contain only words from the training set, because in a real setting you cannot know the test data in advance, and it may contain OOV words (words outside the training vocabulary). In static mode (CNN-static) this is not a problem. But in non-static mode (CNN-non-static), how do you handle OOV words, that is, how are the embedding parameters updated for words that are not in the model vocabulary? In brief: for words that are present in the word2vec model but not in the original model vocabulary, how do you handle them? Sorry if my English is poor and my wording is unclear. Thank you.
import numpy as np

def get_W(word_vecs, k=300):
    """
    Get word matrix. W[i] is the vector for word indexed by i
    """
    vocab_size = len(word_vecs)
    word_idx_map = dict()
    # One extra row: index 0 is left as an all-zero vector.
    W = np.zeros(shape=(vocab_size+1, k), dtype='float32')
    W[0] = np.zeros(k, dtype='float32')
    i = 1
    # Every word in word_vecs gets its own row, starting at index 1.
    for word in word_vecs:
        W[i] = word_vecs[word]
        word_idx_map[word] = i
        i += 1
    return W, word_idx_map
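To make the concern concrete, a train-only variant might look roughly like the sketch below. This is not code from the repository: `get_W_train_only`, `encode`, the `train_vocab` argument, and the shared `<unk>` row at index 1 are all hypothetical names and choices, and mapping every unseen word to a single row is only one possible way to handle OOV words.

```python
import numpy as np

def get_W_train_only(word_vecs, train_vocab, k=300):
    """
    Hypothetical variant of get_W that builds rows only for training words.
    Row 0 is an all-zero vector; row 1 is a shared <unk> vector (an assumption).
    """
    # Keep only training words that have a vector; sort for a stable index order.
    words = sorted(w for w in train_vocab if w in word_vecs)
    W = np.zeros(shape=(len(words) + 2, k), dtype='float32')
    # Small random vector for <unk>; the range is an arbitrary choice.
    rng = np.random.default_rng(0)
    W[1] = rng.uniform(-0.25, 0.25, k)
    word_idx_map = dict()
    for i, word in enumerate(words, start=2):
        W[i] = word_vecs[word]
        word_idx_map[word] = i
    return W, word_idx_map

def encode(sentence, word_idx_map, unk_idx=1):
    """Map whitespace-separated tokens to indices; unseen words share unk_idx."""
    return [word_idx_map.get(w, unk_idx) for w in sentence.split()]
```

With a matrix built this way, a test-time word that was never seen in training falls back to the `<unk>` row, so in non-static mode only the training words and that shared row are ever updated.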