some Issues when I train the model from scratch and test with the provided code #3

Closed
willanxywc opened this issue Apr 18, 2018 · 7 comments

Comments

@willanxywc

Hi, I ran into some issues while training the model from scratch:

  1. I trained with the latest gensim but got a model that's incompatible with the gensim you provide here. When I run the test code, the following error comes up:

ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.

So which version of gensim do you use?

  2. Then I used the provided gensim to train the model from scratch, and another error came up:

File "/home/disk2/jysun/gensim_vec/gensim/models/word2vec.py", line 572, in build_vocab
    report_values, pre_exist_words = self.scale_vocab(keep_raw_vocab=keep_raw_vocab, trim_rule=trim_rule, update=update)  # trim by min_count & precalculate downsampling
File "/home/disk2/jysun/gensim_vec/gensim/models/word2vec.py", line 731, in scale_vocab
    return report_values, pre_exist_words
UnboundLocalError: local variable 'pre_exist_words' referenced before assignment
What should I do with these errors?
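
For readers hitting the same traceback: an UnboundLocalError like this one generally means a local variable was assigned on only some code paths and then used on a path that skipped the assignment. The sketch below is a generic illustration of that pattern and of the usual remedy (initialise before branching). It is not gensim's scale_vocab; the function names and the `update` flag's role here are hypothetical, and the traceback alone does not show which branch was skipped in the real code.

```python
def scale_counts_buggy(update):
    # Hypothetical stand-in, NOT gensim's scale_vocab.
    report_values = {"kept": 0}
    if update:
        pre_exist_words = 0  # assigned only on this branch
    # When update is False, pre_exist_words was never bound.
    return report_values, pre_exist_words


def scale_counts_fixed(update):
    # Same shape, but the variable is bound before branching.
    report_values = {"kept": 0}
    pre_exist_words = None
    if update:
        pre_exist_words = 0
    return report_values, pre_exist_words


def raises_unbound(func, *args):
    """Return True if calling func(*args) raises UnboundLocalError."""
    try:
        func(*args)
    except UnboundLocalError:
        return True
    return False
```

Calling `scale_counts_buggy(False)` raises the error, while `scale_counts_fixed(False)` returns normally.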

@willanxywc willanxywc changed the title Issues during training the model from scratch and test with the provided code some Issues when I train the model from scratch and test with the provided code Apr 18, 2018
@minimalparts
Owner

Hm. So we submitted to EMNLP in April 2017 and used the early 2017 gensim code, which was only at version 0.13 at the time. I'm afraid the gensim people then released several new versions very quickly. It was bad luck.

We're working on having a new version work with gensim 3.x, but until then I'm afraid there is not much I can suggest, short of using the older gensim or the pre-trained model. Sorry about that. I'll add a note to that effect on the README.

@willanxywc
Author

Thanks! Then I may try to train with gensim 0.13. Could I ask which exact version of gensim you used? There are several 0.13 releases, from 0.13.0 to 0.13.4.

@minimalparts
Owner

I hear from others that any 0.13.x will work. I believe we were using 0.13.3.
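
A quick way to confirm that the installed gensim falls in the 0.13.x series before training is to inspect `gensim.__version__`. The sketch below is only an illustration: the helper name `is_gensim_013` is made up for this example, and the check assumes a plain dotted version string.

```python
def is_gensim_013(version):
    """Return True if a dotted version string belongs to the 0.13.x series."""
    parts = version.split(".")
    return len(parts) >= 2 and parts[0] == "0" and parts[1] == "13"


if __name__ == "__main__":
    try:
        import gensim
    except ImportError:
        print("gensim is not installed")
    else:
        # Print the installed version and whether it is a 0.13.x release.
        print(gensim.__version__, is_gensim_013(gensim.__version__))
```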

@un-lock-me

I got this error: AttributeError: 'Model' object has no attribute 'id2word'
I assumed it would not depend on the way we create the model.
Do you have any idea what causes this?

Thanks,

@minimalparts
Owner

Sorry for the delayed reply... When does the error occur? This sounds like a gensim problem... Are you using the 0.13.3 version?

@ghost

ghost commented Feb 22, 2019

@willanxywc

I think you need to specify "total_examples" and "epochs" with the current version of gensim.

model.train([sentence], total_examples=model.corpus_count, epochs=model.iter)

Similar issue: linanqiu/word2vec-sentiments#16
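
As a slightly more defensive sketch of the call above: the attribute holding the epoch count has changed name across gensim releases (`iter` in older 3.x code, `epochs` in later ones), so the hypothetical wrapper below looks for `epochs` first and falls back to `iter`. The wrapper name `train_one_sentence` is invented for illustration, and it assumes the model has already had its vocabulary built so that `corpus_count` is set.

```python
def train_one_sentence(model, sentence):
    """Call model.train() with the explicit counts newer gensim requires.

    `model` is assumed to be a trained-vocabulary Word2Vec-like object and
    `sentence` a list of tokens.
    """
    # Prefer `epochs`; fall back to the older `iter` attribute if absent.
    epochs = getattr(model, "epochs", None)
    if epochs is None:
        epochs = getattr(model, "iter")
    return model.train(
        [sentence],
        total_examples=model.corpus_count,
        epochs=epochs,
    )
```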

@akb89
Collaborator

akb89 commented Feb 22, 2019

You can also use the v2.0 release branch. We significantly refactored the code and it now works with gensim v3.4.x.

@akb89 akb89 closed this as completed Feb 26, 2019
akb89 added a commit that referenced this issue Jul 29, 2019
Fixed bug with wikidump extraction