error with the training command #16

Open
RenatoPerotti opened this issue Jul 20, 2017 · 16 comments

Comments

@RenatoPerotti

The new word2vec requires total_examples to be specified in the train command. Without it, the call now raises this error:

ValueError: You must specify either total_examples or total_words, for proper alpha and progress calculations. The usual value is total_examples=model.corpus_count.

so I changed it to the following:

model.train(sentences.sentences_perm,total_examples=model.corpus_count())

but it gives a new error:

TypeError: 'int' object is not callable

Does anyone have an idea what to do with this?

@linanqiu
Owner

try corpus_count instead of corpus_count()

I'll update this during the weekend
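
A minimal sketch of why dropping the parentheses helps, assuming corpus_count is stored as a plain integer attribute (which is what the TypeError implies). StandInModel is a hypothetical stand-in written for this illustration, not gensim's class, and 25000 is an arbitrary value.

```python
class StandInModel:
    """Hypothetical stand-in for a model object; not gensim's class."""

    def __init__(self, corpus_count):
        # Stored as a plain integer attribute, as the TypeError implies.
        self.corpus_count = corpus_count


model = StandInModel(corpus_count=25000)

# Reading the attribute returns the integer.
total_examples = model.corpus_count

# Adding parentheses tries to call that integer, which fails.
try:
    model.corpus_count()
    call_error = None
except TypeError as exc:
    call_error = str(exc)

print(total_examples)
print(call_error)
```

The same reasoning applies to any other integer attribute on the model: read it, do not call it.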

@linanqiu linanqiu reopened this Jul 21, 2017
@linanqiu
Owner

Let me know if it works!

@RenatoPerotti
Author

RenatoPerotti commented Jul 22, 2017

First I changed the line to:
model.train(sentences.sentences_perm, total_examples=model.corpus_count)
After that it asked me to add epochs, so I changed it to:
model.train(sentences.sentences_perm, total_examples=model.corpus_count, epochs=model.iter)
but now I get a more complex error message:

Exception in thread Thread-13:
Traceback (most recent call last):
  File "E:\Python3\lib\threading.py", line 916, in _bootstrap_inner
    self.run()
  File "E:\Python3\lib\threading.py", line 864, in run
    self._target(*self._args, **self._kwargs)
  File "E:\Python3\lib\site-packages\gensim\models\word2vec.py", line 854, in job_producer
    for sent_idx, sentence in enumerate(sentences):
  File "E:\Python3\lib\site-packages\gensim\utils.py", line 687, in __iter__
    for document in self.corpus:
TypeError: 'method' object is not iterable

@linanqiu
Owner

linanqiu commented Jul 23, 2017 via email

@RenatoPerotti
Author

RenatoPerotti commented Jul 23, 2017

Nope, it says it is not callable again (like the first issue we had with corpus_count):
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
in ()
----> 1 model.train(sentences.sentences_perm, total_examples=model.corpus_count, epochs=model.iter())

TypeError: 'int' object is not callable

I hope you have another idea :)

@dalamar66

This one worked for me:

model.train(sentences.sentences_perm(), total_examples=model.corpus_count, epochs=model.iter)
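
The difference from the earlier attempts is the pair of parentheses after sentences_perm. A rough sketch of that difference, assuming sentences_perm is a method that returns a shuffled list: StandInSentences below is a hypothetical class written for this illustration, not the repository's own, and the three short sentences are made up.

```python
import random


class StandInSentences:
    """Hypothetical corpus wrapper with a shuffling method."""

    def __init__(self, sentences):
        self.sentences = list(sentences)

    def sentences_perm(self):
        # Return a shuffled copy as a list, so it can be iterated repeatedly.
        shuffled = list(self.sentences)
        random.shuffle(shuffled)
        return shuffled


corpus = StandInSentences([["good", "film"], ["bad", "film"], ["fine", "plot"]])

# Without parentheses this is a bound method, which cannot be iterated.
try:
    iter(corpus.sentences_perm)
    method_error = None
except TypeError as exc:
    method_error = str(exc)

# With parentheses the method runs and returns a list of sentences.
shuffled_once = corpus.sentences_perm()

print(method_error)
print(len(shuffled_once))
```

Passing the method object hands the trainer something it cannot loop over, which matches the "'method' object is not iterable" traceback earlier in the thread.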

@eriksonJAguiar

Hello guys, I'm using the following command:

model.train(sentences, total_examples=model.corpus_count, epochs=model.iter)

and it worked! Using sentences.sentences_perm() did not work.

Thanks for the help.

@linanqiu
Owner

linanqiu commented Oct 4, 2017

Sorry I have no time to update this right now. @eriksonJAguiar can you let me know the versions of python and gensim you're using? If it's the latest, I'll just make the change you mentioned. Thanks!

@eriksonJAguiar

Hi @linanqiu, I'm using Python 3.5 and gensim 2.3.0 !!
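
Since the behaviour in this thread depends on the installed versions, here is one possible way to collect both numbers for a report. It is a sketch that assumes Python 3.8 or later (for importlib.metadata); installed_version is a hypothetical helper name.

```python
import sys
from importlib import metadata


def installed_version(package_name):
    """Return the installed version string of a package, or None if absent."""
    try:
        return metadata.version(package_name)
    except metadata.PackageNotFoundError:
        return None


python_version = "{}.{}.{}".format(*sys.version_info[:3])
gensim_version = installed_version("gensim")

print("Python:", python_version)
print("gensim:", gensim_version or "not installed")
```

On older interpreters, importing gensim and printing gensim.__version__ gives the same information.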

@Christings

Hello, I'm using the following command:
model.train(sentences, total_examples=model.corpus_count, epochs=model.iter)
and it worked! Thanks.

@shashankboosi

Hello there,

I tried the command but I am getting the error

raise ValueError("You must specify an explict epochs count. The usual value is epochs=model.epochs.")
ValueError: You must specify an explict epochs count. The usual value is epochs=model.epochs.

when I tried :

model_dm.train(perm_sentences, total_examples=model_dm.corpus_count,epochs=model_dm.epochs)

I passed epochs as the message suggests, but I still got the error.

Python version : 3.5
Gensim version : 3.4

Can you tell me what the problem is ?

Regards,
Shashank Reddy Boosi.

@gsbnair

gsbnair commented Mar 24, 2018

Hi Shashank,
As a few comments above said, you need to make the following modification:
Change model.train(sentences.sentences_perm())
to
model.train(sentences.sentences_perm(), total_examples=model.corpus_count, epochs=model.iter)

It works, and I got almost the same result: 0.86464.
I am using Python version 3.6 and Gensim version 3.4.

@NileshBharti2

NileshBharti2 commented Jan 19, 2019

from __future__ import absolute_import, division, print_function
import codecs
import glob
import logging
import multiprocessing
import os
import pprint
import re

import nltk
import gensim.models.word2vec as w2v
import sklearn.manifold
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

import warnings
warnings.filterwarnings(action='ignore', category=UserWarning, module='gensim')

import gensim
%pylab inline
logging.basicConfig(format='%(asctime)s : %(levelname)s : %(message)s', level=logging.INFO)
nltk.download("punkt")
nltk.download("stopwords")
book_filenames = sorted(glob.glob(r"C:\Users\Nilesh\Desktop\Machine.txt"))
print("Found books:")
book_filenames

corpus_raw = u""
for book_filename in book_filenames:
    print("Reading '{0}'...".format(book_filename))
    with codecs.open(book_filename, "r", "utf-8") as book_file:
        corpus_raw += book_file.read()
    print("Corpus is now {0} characters long".format(len(corpus_raw)))
    print()
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
raw_sentences = tokenizer.tokenize(corpus_raw)

#convert into a list of words
#remove unnecessary characters, split into words, no hyphens
#list of words
def sentence_to_wordlist(raw):
    clean = re.sub("[^a-zA-Z]"," ", raw)
    words = clean.split()
    return words
#sentence where each word is tokenized
sentences = []
for raw_sentence in raw_sentences:
    if len(raw_sentence) > 0:
        sentences.append(sentence_to_wordlist(raw_sentence))

print(raw_sentences[5])
print(sentence_to_wordlist(raw_sentences[5]))
token_count = sum([len(sentence) for sentence in sentences])
print("The book corpus contains {0:,} tokens".format(token_count))

#ONCE we have vectors
#step 3 - build model
#3 main tasks that vectors help with
#DISTANCE, SIMILARITY, RANKING

# Dimensionality of the resulting word vectors.
#more dimensions, more computationally expensive to train
#but also more accurate
#more dimensions = more generalized
num_features = 300

# Minimum word count threshold.
min_word_count = 3

# Number of threads to run in parallel.
#more workers, faster we train
num_workers = multiprocessing.cpu_count()

# Context window length.
context_size = 7

# Downsample setting for frequent words.
#0 - 1e-5 is good for this
downsampling = 1e-3

# Seed for the RNG, to make the results reproducible.
#random number generator
#deterministic, good for debugging
seed = 1
word2vec = w2v.Word2Vec(
    sg=1,
    seed=seed,
    workers=num_workers,
    size=num_features,
    min_count=min_word_count,
    window=context_size,
    sample=downsampling
)
word2vec.build_vocab(sentences)

print("Word2Vec vocabulary length:", len(word2vec.wv.vocab))
word2vec.train(sentences)

ValueError Traceback (most recent call last)
in ()
----> 1 word2vec.train(sentences)

~\Anaconda3\lib\site-packages\gensim\models\word2vec.py in train(self, sentences, total_examples, total_words, epochs, start_alpha, end_alpha, word_count, queue_factor, report_delay, compute_loss, callbacks)
609 sentences, total_examples=total_examples, total_words=total_words,
610 epochs=epochs, start_alpha=start_alpha, end_alpha=end_alpha, word_count=word_count,
--> 611 queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
612
613 def score(self, sentences, total_sentences=int(1e6), chunksize=100, queue_factor=2, report_delay=1):

~\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py in train(self, sentences, total_examples, total_words, epochs, start_alpha, end_alpha, word_count, queue_factor, report_delay, compute_loss, callbacks)
567 sentences, total_examples=total_examples, total_words=total_words,
568 epochs=epochs, start_alpha=start_alpha, end_alpha=end_alpha, word_count=word_count,
--> 569 queue_factor=queue_factor, report_delay=report_delay, compute_loss=compute_loss, callbacks=callbacks)
570
571 def _get_job_params(self, cur_epoch):

~\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py in train(self, data_iterable, epochs, total_examples, total_words, queue_factor, report_delay, callbacks, **kwargs)
239 epochs=epochs,
240 total_examples=total_examples,
--> 241 total_words=total_words, **kwargs)
242
243 for callback in self.callbacks:

~\Anaconda3\lib\site-packages\gensim\models\base_any2vec.py in _check_training_sanity(self, epochs, total_examples, total_words, **kwargs)
612 if total_words is None and total_examples is None:
613 raise ValueError(
--> 614 "You must specify either total_examples or total_words, for proper job parameters updation"
615 "and progress calculations. "
616 "The usual value is total_examples=model.corpus_count."

ValueError: You must specify either total_examples or total_words, for proper job parameters updationand progress calculations. The usual value is total_examples=model.corpus_count.
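
The traceback above stops at a check that wants total_examples (or total_words), and other comments in this thread show a second check for epochs. A small sketch of collecting both values before calling train: training_arguments is a hypothetical helper, not part of gensim, and the numbers are placeholders.

```python
def training_arguments(corpus_count, epochs):
    """Build the keyword arguments that the checks in this thread ask for.

    Hypothetical helper, not part of gensim.
    """
    if not corpus_count:
        raise ValueError("corpus_count is empty; call build_vocab() first")
    if not epochs:
        raise ValueError("epochs must be a positive integer")
    return {"total_examples": corpus_count, "epochs": epochs}


# Placeholder values standing in for model.corpus_count and an epoch count.
kwargs = training_arguments(corpus_count=1200, epochs=5)
print(kwargs)

# With gensim installed, the failing call could then become (not run here,
# and assuming a release where the model exposes an epochs attribute):
# word2vec.train(sentences, **training_arguments(word2vec.corpus_count, word2vec.epochs))
```

On the older releases mentioned earlier in the thread, model.iter plays the role that model.epochs plays here.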

@AnshikaAgrawal

@shashankboosi You can give an explicit epochs value, such as epochs=20, just as the error message asks. It worked for me!
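
One way to apply this advice defensively is to fall back to an explicit number whenever the model does not supply a usable one. The sketch below assumes the model may expose either epochs or iter, as the commands in this thread do. resolve_epochs is a hypothetical helper, the fallback of 20 comes from the suggestion above, and SimpleNamespace objects stand in for real models.

```python
from types import SimpleNamespace


def resolve_epochs(model, fallback=20):
    """Pick a usable epoch count, falling back to an explicit value."""
    for name in ("epochs", "iter"):
        value = getattr(model, name, None)
        # Accept only positive integers; None or missing attributes are skipped.
        if isinstance(value, int) and value > 0:
            return value
    return fallback


# Stand-in objects; real models would come from gensim.
model_without_value = SimpleNamespace(epochs=None)
model_with_iter = SimpleNamespace(iter=5)

print(resolve_epochs(model_without_value))
print(resolve_epochs(model_with_iter))
```

The result can then be passed as epochs=resolve_epochs(model_dm) in the train call.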

@gsbnair

gsbnair commented Nov 16, 2019 via email

@shashankboosi

@AnshikaAgrawal Yep, I guess that is how it works.
Thank you :)
