
nlp_word2vec component: load word2vec from file #12

Open
zqhZY opened this issue Nov 21, 2017 · 6 comments

Comments

@zqhZY
Contributor

zqhZY commented Nov 21, 2017

Word vectors trained with different tools (e.g. fasttext / gensim) are saved in different formats. Should the "nlp_word2vec" component standardize on a single storage format for word2vec files?

@crownpku
Member

I think we should add a parameter, embedding_type="gensim"/"fasttext", since users may have pre-trained word vectors in either of the two formats.
For the current version, let's implement word2vec support first!
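A minimal sketch of how such an embedding_type parameter could dispatch the loading. Only the parameter name comes from this comment; the load_embeddings helper and file handling are illustrative assumptions:

```python
from gensim.models import KeyedVectors


def load_embeddings(path, embedding_type="gensim"):
    """Hypothetical loader dispatching on the suggested embedding_type parameter."""
    if embedding_type == "gensim":
        # Vectors saved with gensim's own save()
        return KeyedVectors.load(path)
    elif embedding_type == "fasttext":
        # fastText's .vec output is plain-text word2vec format
        return KeyedVectors.load_word2vec_format(path, binary=False)
    else:
        raise ValueError("unknown embedding_type: %s" % embedding_type)
```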

@zqhZY
Contributor Author

zqhZY commented Nov 21, 2017

@crownpku
Member

@zqhZY So for storing and using existing word vectors, the gensim package alone is enough, right? If users want to train their own word vectors, they would still need tools like word2vec or fasttext?

@zqhZY
Contributor Author

zqhZY commented Nov 22, 2017

The gensim package alone is enough; it can load the word2vec output produced by most tools. I have already opened a pull request, please review it when you get a chance.
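For reference, a small sketch of the loading gensim supports out of the box, covering both the text and binary word2vec formats mentioned above; the file names and lookup word are placeholders:

```python
from gensim.models import KeyedVectors

# Plain-text word2vec format, e.g. a fastText .vec file
vectors = KeyedVectors.load_word2vec_format("embeddings.vec", binary=False)

# Binary word2vec format, e.g. produced by the original word2vec tool
# vectors = KeyedVectors.load_word2vec_format("embeddings.bin", binary=True)

print(vectors["北京"][:5])  # look up a word and print its first five dimensions
```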

@befeng
Collaborator

befeng commented Nov 22, 2017

I have one more question here: if we pull in too many different packages, we will end up having to maintain a lot of dependencies. How do we handle that?

@crownpku
Member

First, use lazy imports, i.e. only import a package when a particular function of a particular pipeline component actually needs it.
Second, commonly used packages should be added to requirements.txt promptly.
Finally, when the Docker environment is built, the dependencies will be bundled in as well.
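A small sketch of the lazy-import pattern described above; the class and attribute names are hypothetical:

```python
class WordVectorComponent:
    """Hypothetical pipeline component that loads word vectors lazily."""

    def __init__(self, vector_file, binary=False):
        self.vector_file = vector_file
        self.binary = binary
        self._vectors = None

    @property
    def vectors(self):
        if self._vectors is None:
            # gensim is imported only when this component is actually used,
            # so pipelines that don't need it don't require the dependency.
            from gensim.models import KeyedVectors
            self._vectors = KeyedVectors.load_word2vec_format(
                self.vector_file, binary=self.binary
            )
        return self._vectors
```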
