How to load sgns.financial.bigram-char #149

Open
fredericky123 opened this issue Jan 18, 2022 · 2 comments

Comments

@fredericky123

fredericky123 commented Jan 18, 2022

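After downloading it, I load the pretrained model with the statement below, but running the .py script produces no response at all:

```python
import gensim

model = gensim.models.KeyedVectors.load_word2vec_format('/text/sgns.financial.bigram-char')
```

If I switch to another model (the mixed-corpus one), it runs normally:

```python
model = gensim.models.KeyedVectors.load_word2vec_format('/text/merge_sgns_bigram_char300.txt')
```

Why is that? Is the first file in the wrong format, or does it need a different statement to load the model?
Thanks!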
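A quick way to check the format question is to look at the first two lines of each file. The plain-text word2vec format that `load_word2vec_format` reads by default (`binary=False`) starts with a header of two integers, the vocabulary size and the vector dimension, followed by one line per word: the token and then that many numbers. The helper below, `check_word2vec_text` (a hypothetical name, not a gensim function), is a minimal sketch that inspects only those two lines, so it returns at once even on a very large file. It assumes UTF-8 text with whitespace-separated fields.

```python
import os
import tempfile


def check_word2vec_text(path, encoding="utf-8"):
    """Hypothetical helper: return (vocab_size, dim) if `path` starts like a
    word2vec text file, otherwise raise ValueError describing the problem."""
    try:
        with open(path, "r", encoding=encoding) as f:
            header = f.readline().split()
            first = f.readline().split()
    except UnicodeDecodeError as exc:
        # Binary or compressed files usually fail to decode as text.
        raise ValueError(f"not {encoding} text (binary or compressed file?)") from exc

    # Header line of the text format: "<vocab_size> <dim>"
    if len(header) != 2 or not all(tok.isdigit() for tok in header):
        raise ValueError(f"unexpected header line: {header[:5]!r}")
    vocab_size, dim = int(header[0]), int(header[1])

    # First vector line: one token followed by `dim` numbers
    if len(first) != dim + 1:
        raise ValueError(f"first vector line has {len(first)} fields, expected {dim + 1}")
    try:
        list(map(float, first[1:]))
    except ValueError as exc:
        raise ValueError("first vector line contains non-numeric values") from exc
    return vocab_size, dim


if __name__ == "__main__":
    # Demo on a tiny, made-up file (not real embeddings).
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "tiny.txt")
        with open(path, "w", encoding="utf-8") as f:
            f.write("2 3\n你 0.1 0.2 0.3 \n好 0.4 0.5 0.6 \n")
        print(check_word2vec_text(path))  # prints (2, 3)
```

Note also that `load_word2vec_format` prints nothing by default (its progress messages go through Python's `logging` module), and reading a large vector file can take a long time, so a script that only loads the model may look unresponsive while it is still working. Passing `limit=` (for example `limit=10000`) makes gensim read only the first vectors, which is a quick way to test whether the file loads at all.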

@stay-leave

This is what I use; it reads the vector file directly instead of going through gensim:
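```python
import numpy as np

def weight(self, vocab_to_index):
    # Map each vocabulary word to its pretrained vector
    size_vocab = len(vocab_to_index)  # vocabulary size
    embeddings = np.zeros((size_vocab, 300))  # zero-initialized matrix, 300 dimensions
    found = 0  # number of words matched in the pretrained file
    # Read the pretrained vector file
    with open(r'..\datasets\sgns.weibo.char', 'r', encoding='utf-8') as f:
        for line in f:  # each line: a word followed by its vector components
            line = line.strip().split()
            # Keep only 300-d vectors; this also skips the "<count> <dim>" header line
            if len(line) != 300 + 1:
                continue
            word = line[0]  # the word
            if word in vocab_to_index:
                found += 1
                word_idx = vocab_to_index[word]  # row index for this word
                # Convert the string components to floats and store them in that row
                embeddings[word_idx] = np.asarray(line[1:], dtype=np.float32)
    print('Vectors found: ' + str(found) + ', total words: ' + str(size_vocab)
          + ', match rate: {:.2f}%'.format(found / size_vocab * 100))
    # Save the extracted embedding matrix
    np.savez_compressed(r'..\datasets\vec.npz', embeddings=embeddings)
    # return embeddings
```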
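The `weight` method above hard-codes the file path and the 300-dimensional size. Below is a minimal sketch of a parameterized variant, `build_embedding_matrix` (a hypothetical name, not part of any library or of the original comment), plus a round trip through `np.savez_compressed` and `np.load` showing how the saved matrix can be read back. The tiny input file in the demo is made up for illustration, and the sketch assumes the same whitespace-separated text format as the original code.

```python
import os
import tempfile

import numpy as np


def build_embedding_matrix(path, vocab_to_index, dim=300, encoding="utf-8"):
    """Hypothetical parameterized variant of weight(): fill one row per
    vocabulary word from a word2vec-style text file; unmatched rows stay zero."""
    embeddings = np.zeros((len(vocab_to_index), dim), dtype=np.float32)
    found = 0
    with open(path, "r", encoding=encoding) as f:
        for line in f:
            parts = line.split()
            if len(parts) != dim + 1:  # skips the "<count> <dim>" header and malformed rows
                continue
            idx = vocab_to_index.get(parts[0])  # None if the word is not in our vocabulary
            if idx is not None:
                embeddings[idx] = np.asarray(parts[1:], dtype=np.float32)
                found += 1
    return embeddings, found


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "tiny.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.write("2 3\n你 0.1 0.2 0.3 \n好 0.4 0.5 0.6 \n")
        emb, found = build_embedding_matrix(src, {"好": 0, "吗": 1}, dim=3)
        out = os.path.join(tmp, "vec.npz")
        np.savez_compressed(out, embeddings=emb)
        with np.load(out) as data:  # context manager closes the .npz file handle
            loaded = data["embeddings"]
        print(found, loaded.shape)  # prints 1 (2, 3)
```

As in the original, rows for words missing from the vector file stay at zero; initializing them randomly instead is a common alternative, but that is a modeling choice rather than something the file format requires.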

@HunterHeidy

HunterHeidy commented Jun 28, 2022 via email

Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants