some Issues when I train the model from scratch and test with the provided code #3

willanxywc · 2018-04-18T08:48:14Z

Hi, I ran into some issues while training the model from scratch:

I trained with the latest gensim, but the resulting model is incompatible with the gensim version provided here. When I run the test code, the following error comes up:

ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.

So which version of gensim do you use?

Then I used the provided gensim to train the model from scratch, and got another error:

File "/home/disk2/jysun/gensim_vec/gensim/models/word2vec.py", line 572, in build_vocab
    report_values, pre_exist_words = self.scale_vocab(keep_raw_vocab=keep_raw_vocab, trim_rule=trim_rule, update=update)  # trim by min_count & precalculate downsampling
File "/home/disk2/jysun/gensim_vec/gensim/models/word2vec.py", line 731, in scale_vocab
    return report_values, pre_exist_words
UnboundLocalError: local variable 'pre_exist_words' referenced before assignment

What should I do with these errors?


minimalparts · 2018-04-20T08:35:32Z

Hm. So we submitted to EMNLP in April 2017 and used the early 2017 gensim code, which was only at version 0.13 at the time. I'm afraid the gensim people then released several new versions in quick succession. It was bad luck.

We're working on a new version that works with gensim 3.x, but until then I'm afraid there is not much I can suggest, short of using the older gensim or the pre-trained model. Sorry about that. I'll add a note to that effect to the README.

willanxywc · 2018-04-20T09:37:18Z

Thanks! Then I'll try training with gensim 0.13. Could I ask which exact version of gensim you used?
There are several 0.13 releases, from 0.13.0 to 0.13.4.

minimalparts · 2018-04-20T09:39:38Z

I hear from others that any 0.13.x will work. I believe we were using 0.13.3.
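For anyone who wants to confirm which release is installed before training, here is a minimal sketch of a version check. The helper name `is_gensim_0_13` is made up for illustration, and it assumes a plain "major.minor.patch" version string.

```python
def is_gensim_0_13(version):
    """Return True if a version string such as "0.13.3" is a 0.13.x release.

    Hypothetical helper: it only compares the first two dot-separated
    components and does not handle unusual version formats.
    """
    parts = version.split(".")
    return parts[:2] == ["0", "13"]
```

With gensim installed, this could be called as `is_gensim_0_13(gensim.__version__)` to warn before training with an untested release.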

un-lock-me · 2018-07-29T22:03:29Z

I got this error: AttributeError: 'Model' object has no attribute 'id2word'
I had assumed it would not depend on the way we create the model.
Do you have any idea what causes this?

Thanks,

minimalparts · 2018-08-04T07:15:09Z

Sorry for the delayed reply... When does the error occur? This sounds like a gensim problem... Are you using the 0.13.3 version?

ghost · 2019-02-22T02:54:10Z

@willanxywc

I think you need to specify "total_examples" and "epochs" with the current version of gensim.

model.train([sentence], total_examples=model.corpus_count, epochs=model.iter)

Similar issue: linanqiu/word2vec-sentiments#16
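If the same script has to run against both the older and the newer gensim, one option is to check what `train()` accepts before calling it. This is only a sketch under assumptions: `train_compat` is a hypothetical helper, not part of gensim or of this repository, and it assumes the model exposes `corpus_count` plus either `epochs` or `iter` for the epoch count.

```python
import inspect


def train_compat(model, sentences):
    """Call model.train() with the keyword arguments its signature accepts.

    Hypothetical helper: if train() has an `epochs` parameter, pass
    total_examples and epochs explicitly; otherwise call it the old way.
    """
    params = inspect.signature(model.train).parameters
    if "epochs" not in params:
        # Older-style signature: no explicit epoch count is accepted.
        return model.train(sentences)

    # Assumption: the epoch count lives in `epochs`, or in `iter` as in
    # the call shown above.
    epochs = getattr(model, "epochs", None)
    if epochs is None:
        epochs = getattr(model, "iter")
    return model.train(
        sentences,
        total_examples=model.corpus_count,
        epochs=epochs,
    )
```

The call above would then become `train_compat(model, [sentence])`. Note that this only addresses the `train()` arguments; it does not make a model saved by one gensim version loadable in another.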

akb89 · 2019-02-22T07:24:58Z

You can also use the v2.0 release branch. We significantly refactored the code and it now works with gensim v3.4.x.

willanxywc changed the title ~~Issues during training the model from scratch and test with the provided code~~ some Issues when I train the model from scratch and test with the provided code Apr 18, 2018

akb89 closed this as completed Feb 26, 2019

akb89 added a commit that referenced this issue Jul 29, 2019

Merge pull request #3 from akb89/develop

1f5b446

Fixed bug with wikidump extraction
