error with the training command #16

RenatoPerotti · 2017-07-20T13:32:26Z

The new word2vec requires total_examples to be specified in the train command, now it gives the error:

ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.

so I changed it to the following:

model.train(sentences.sentences_perm,total_examples=model.corpus_count())

but it gives a new error:

TypeError: 'int' object is not callable

Does anyone have an idea what to do with this?


linanqiu · 2017-07-21T02:22:55Z

try corpus_count instead of corpus_count()

I'll update this during the weekend
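For anyone landing here later, a minimal sketch of why the parentheses matter. It uses a made-up `FakeModel` class (an assumption for illustration, not gensim's own class) whose `corpus_count` is a plain integer attribute, which is what the error message implies:

```python
class FakeModel:
    """Hypothetical stand-in for a model with a built vocabulary; not gensim's class."""

    def __init__(self, corpus_count):
        # A plain integer attribute, like the corpus_count the error message refers to.
        self.corpus_count = corpus_count


model = FakeModel(corpus_count=25000)

# Reading the attribute works and gives the integer.
total_examples = model.corpus_count

# Adding parentheses tries to call that integer, which raises TypeError.
try:
    model.corpus_count()
    call_error = None
except TypeError as exc:
    call_error = str(exc)

print(total_examples)
print(call_error)
```

So `total_examples=model.corpus_count` passes the number, while `model.corpus_count()` fails before `train` is even entered.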

linanqiu · 2017-07-21T02:23:04Z

Let me know if it works!

RenatoPerotti · 2017-07-22T23:12:38Z

First I changed the line to:
model.train(sentences.sentences_perm, total_examples=model.corpus_count)
After that it asked me to add epochs, so I changed it to:
model.train(sentences.sentences_perm, total_examples=model.corpus_count, epochs=model.iter)
but now I get a more complex error message:

Exception in thread Thread-13:
Traceback (most recent call last):
  File "E:\Python3\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "E:\Python3\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Python3\lib\site-packages\gensim\models\word2vec.py", line 854, in job_producer
    for sent_idx, sentence in enumerate(sentences):
  File "E:\Python3\lib\site-packages\gensim\utils.py", line 687, in __iter__
    for document in self.corpus:
TypeError: 'method' object is not iterable

linanqiu · 2017-07-23T16:03:09Z

Try model.iter()


RenatoPerotti · 2017-07-23T18:20:24Z

Nope, it says it is not callable again (like the first issue we had with corpus_count):

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
in ()
----> 1 model.train(sentences.sentences_perm, total_examples=model.corpus_count, epochs=model.iter())

TypeError: 'int' object is not callable

I hope you have another idea :)

dalamar66 · 2017-09-21T14:39:50Z

This one worked for me:

model.train(sentences.sentences_perm(), total_examples=model.corpus_count, epochs=model.iter)
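A rough sketch of why the parentheses on `sentences_perm` make the difference here. The `FakeCorpus` class below is a hypothetical stand-in for the sentence container (its shuffling behaviour is an assumption for illustration); the point is only that a bound method cannot be iterated, while the list it returns can:

```python
import random


class FakeCorpus:
    """Hypothetical stand-in for the sentence container used in this thread."""

    def __init__(self, sentences):
        self.sentences = list(sentences)

    def sentences_perm(self):
        # Return a shuffled copy so the stored order is left untouched.
        shuffled = list(self.sentences)
        random.shuffle(shuffled)
        return shuffled


corpus = FakeCorpus([["good", "movie"], ["bad", "plot"], ["fine", "cast"]])

# Without parentheses this is a bound method, which cannot be looped over.
try:
    iter(corpus.sentences_perm)
    method_error = None
except TypeError as exc:
    method_error = str(exc)

# With parentheses it is a list of token lists, which can be looped over.
shuffled = corpus.sentences_perm()

print(method_error)
print(len(shuffled))
```

That matches the earlier traceback: the training thread tried to loop over the method object itself.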

eriksonJAguiar · 2017-10-03T16:22:52Z

Hello guys, I'm using the following command:

model.train(sentences, total_examples=model.corpus_count, epochs=model.iter)

and it worked!

Using sentences.sentences_perm() did not work for me.

Thanks for the help.

linanqiu · 2017-10-04T02:49:40Z

Sorry I have no time to update this right now. @eriksonJAguiar can you let me know the versions of python and gensim you're using? If it's the latest, I'll just make the change you mentioned. Thanks!

eriksonJAguiar · 2017-10-04T02:58:20Z

Hi @linanqiu, I'm using Python 3.5 and gensim 2.3.0 !!

Christings · 2018-03-06T03:03:28Z

Hello, I'm using the following command:
model.train(sentences, total_examples=model.corpus_count, epochs=model.iter) and it worked ! Thanks.

shashankboosi · 2018-03-23T05:39:15Z

Hello there,

I tried the command but I am getting the error

raise ValueError("You must specify an explict epochs count. The usual value is epochs=model.epochs.")
ValueError: You must specify an explict epochs count. The usual value is epochs=model.epochs.

when I tried :

model_dm.train(perm_sentences, total_examples=model_dm.corpus_count,epochs=model_dm.epochs)

I passed epochs as the message suggests, but I still got the error.

Python version : 3.5
Gensim version : 3.4

Can you tell me what the problem is ?

Regards,
Shashank Reddy Boosi.

gsbnair · 2018-03-24T06:31:46Z

Hi Shashank,
As a few comments above said, you need to make the following modification:
Change model.train(sentences.sentences_perm())
to
model.train(sentences.sentences_perm(), total_examples=model.corpus_count, epochs=model.iter)

It works, and I got almost the same result: 0.86464.
I am using Python 3.6 and gensim 3.4.

NileshBharti2 · 2019-01-19T05:37:12Z

from __future__ import absolute_import, division, print_function
import codecs
import glob
import logging
import multiprocessing
import os
import pprint
import re

import nltk
import gensim.models.word2vec as w2v
import sklearn.manifold
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

import gensim
%pylab inline
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
nltk.download("punkt")
nltk.download("stopwords")
book_filenames = sorted(glob.glob(r"C:\Users\Nilesh\Desktop\Machine.txt"))
print("Found books:")
book_filenames

corpus_raw = u""
for book_filename in book_filenames:
    print("Reading '{0}'...".format(book_filename))
    with codecs.open(book_filename, "r", "utf-8") as book_file:
        corpus_raw += book_file.read()
    print("Corpus is now {0} characters long".format(len(corpus_raw)))
    print()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
raw_sentences = tokenizer.tokenize(corpus_raw)

# convert each sentence into a list of words
# remove unnecessary characters, split into words, no hyphens
def sentence_to_wordlist(raw):
    clean = re.sub("[^a-zA-Z]", " ", raw)
    words = clean.split()
    return words

# sentences where each word is tokenized
sentences = []
for raw_sentence in raw_sentences:
    if len(raw_sentence) > 0:
        sentences.append(sentence_to_wordlist(raw_sentence))

print(raw_sentences[5])
print(sentence_to_wordlist(raw_sentences[5]))
token_count = sum([len(sentence) for sentence in sentences])
print("The book corpus contains {0:,} tokens".format(token_count))

#ONCE we have vectors
#step 3 - build model
#3 main tasks that vectors help with
#DISTANCE, SIMILARITY, RANKING

# Dimensionality of the resulting word vectors.
#more dimensions, more computationally expensive to train
#but also more accurate
#more dimensions = more generalized
num_features = 300

# Minimum word count threshold.
min_word_count = 3

# Number of threads to run in parallel.
#more workers, faster we train
num_workers = multiprocessing.cpu_count()

# Context window length.
context_size = 7

# Downsample setting for frequent words.
#0 - 1e-5 is good for this
downsampling = 1e-3

# Seed for the RNG, to make the results reproducible.
#random number generator
#deterministic, good for debugging
seed = 1
word2vec = w2v.Word2Vec(
    sg=1,
    seed=seed,
    workers=num_workers,
    size=num_features,
    min_count=min_word_count,
    window=context_size,
    sample=downsampling
)
word2vec.build_vocab(sentences)

print("Word2Vec vocabulary length:", len(word2vec.wv.vocab))
word2vec.train(sentences)

ValueError                                Traceback (most recent call last)
in ()
----> 1 word2vec.train(sentences)

~\Anaconda3\lib\site-packages\gensim\models\word2vec.py in train(self, sentences, total_examples, total_words, epochs, start_alpha, end_alpha, word_count, queue_factor, report_delay, compute_loss, callbacks)
    609             sentences, total_examples=total_examples, total_words=total_words,
    610             epochs=epochs, start_alpha=start_alpha, end_alpha=end_alpha, word_count=word_count,
--> 611             queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
    612
    613     def score(self, sentences, total_sentences=int(1e6), chunksize=100, queue_factor=2, report_delay=1):

~\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py in train(self, sentences, total_examples, total_words, epochs, start_alpha, end_alpha, word_count, queue_factor, report_delay, compute_loss, callbacks)
    567             sentences, total_examples=total_examples, total_words=total_words,
    568             epochs=epochs, start_alpha=start_alpha, end_alpha=end_alpha, word_count=word_count,
--> 569             queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
    570
    571     def _get_job_params(self, cur_epoch):

~\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py in train(self, data_iterable, epochs, total_examples, total_words, queue_factor, report_delay, callbacks, **kwargs)
    239             epochs=epochs,
    240             total_examples=total_examples,
--> 241             total_words=total_words, **kwargs)
    242
    243         for callback in self.callbacks:

~\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py in _check_training_sanity(self, epochs, total_examples, total_words, **kwargs)
    612         if total_words is None and total_examples is None:
    613             raise ValueError(
--> 614                 "You must specify either total_examples or total_words, for proper job parameters updation"
    615                 "and progress calculations. "
    616                 "The usual value is total_examples=model.corpus_count."

ValueError: You must specify either total_examples or total_words, for proper job parameters updationand progress calculations. The usual value is total_examples=model.corpus_count.
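The traceback shows `train(sentences)` failing an argument check because neither `total_examples` nor `total_words` was passed. Below is a simplified stand-in for that kind of check, not gensim's actual code: `check_training_args` and `describe` are hypothetical names, and the messages are shortened. It only illustrates which combinations of arguments get through, based on the two error messages quoted in this thread:

```python
def check_training_args(epochs=None, total_examples=None, total_words=None):
    """Simplified, hypothetical stand-in for the check seen in the traceback."""
    if total_words is None and total_examples is None:
        raise ValueError("You must specify either total_examples or total_words.")
    if epochs is None:
        raise ValueError("You must specify an explicit epochs count.")
    return True


def describe(**kwargs):
    """Return 'ok' or the error text for a given set of keyword arguments."""
    try:
        check_training_args(**kwargs)
        return "ok"
    except ValueError as exc:
        return str(exc)


no_args = describe()                                  # like train(sentences)
only_examples = describe(total_examples=100)          # epochs still missing
both = describe(total_examples=100, epochs=5)         # passes the check

print(no_args)
print(only_examples)
print(both)
```

Going by the messages themselves, the call here would presumably need to become `word2vec.train(sentences, total_examples=word2vec.corpus_count, epochs=word2vec.epochs)`, or `epochs=word2vec.iter` as used earlier in this thread.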

AnshikaAgrawal · 2019-11-16T08:45:28Z

In reply to shashankboosi's comment of 2018-03-23 above (the "You must specify an explict epochs count" error raised with epochs=model_dm.epochs on Python 3.5 and gensim 3.4):

You can pass an explicit epochs value such as 20, just as the error states. It worked for me!
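A small sketch of that idea: pick the epochs value from the model when it has a usable one, and otherwise fall back to an explicit number. `resolve_epochs` is a hypothetical helper, the fallback of 20 is just the value mentioned above, and the `SimpleNamespace` objects are stand-ins for models (the attribute names `epochs` and `iter` are the ones used in this thread):

```python
from types import SimpleNamespace


def resolve_epochs(model, fallback=20):
    """Pick an epochs value: model.epochs, then model.iter, then the fallback."""
    for name in ("epochs", "iter"):
        value = getattr(model, name, None)
        # Only accept a positive integer; None or a missing attribute is skipped.
        if isinstance(value, int) and value > 0:
            return value
    return fallback


# Stand-in objects, not real gensim models.
newer_style = SimpleNamespace(epochs=5)
older_style = SimpleNamespace(iter=10)
unset = SimpleNamespace(epochs=None)

from_epochs = resolve_epochs(newer_style)
from_iter = resolve_epochs(older_style)
from_fallback = resolve_epochs(unset)

print(from_epochs, from_iter, from_fallback)
```

The result could then be passed as `epochs=resolve_epochs(model_dm)` in the `train` call, so the argument is never `None`.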

gsbnair · 2019-11-16T08:54:41Z

I may be able to help you if you send me the code in full to check.


-- Thanks & Regards, Suresh Babu

shashankboosi · 2019-11-16T09:20:44Z

In reply to AnshikaAgrawal's suggestion above to pass an explicit epochs value such as 20:

Yep. That is how it works I guess.
Thank you :)

linanqiu closed this as completed Jul 21, 2017

linanqiu reopened this Jul 21, 2017

ghost mentioned this issue Feb 22, 2019

some Issues when I train the model from scratch and test with the provided code minimalparts/nonce2vec#3

Closed
