TypeError: unhashable type: 'Int64Index' #202
Comments
I had the same error and solved it by reducing the number of topics.
What version are you using? I suggest using version 3.3.1 and upgrading all
FWIW, I ran into a similar issue with Python 3.8 and version 3.3.1 in a situation where the original K of my model is greater than the resulting number of clusters. I've been driving myself insane trying to find a workaround: my underlying data is pretty noisy, so if I reduce K to avoid empty clusters, a lot of junk ends up spread around instead of being dumped into just a few clusters. I wish I could just go without the viz, but our communications team finds it really useful as a top-line scan of recent Twitter chatter. I tried implementing the following, borrowing from here, but that just got me to a different error (
Happy to post/share my full code if it's helpful. Thanks!
I'm having the same error. Reducing the number of topics to fewer than 10 solves the issue, but this is definitely not optimal.
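A later comment in this thread attributes the error to zero entries in the topic-term distribution, so before cutting the number of topics it may be worth checking whether a given model has any. The following is only a sketch: `zero_probability_report` is a hypothetical helper (not part of pyLDAvis), and it assumes the topic-term matrix is available as a topics x terms array, for example from gensim's `LdaModel.get_topics()`.

```python
import numpy as np

def zero_probability_report(topic_term_dists):
    """Map topic id -> number of terms with exactly zero probability.

    Hypothetical helper for illustration; topics without zeros are omitted.
    """
    matrix = np.asarray(topic_term_dists, dtype=float)
    zeros_per_topic = (matrix == 0).sum(axis=1)
    return {
        topic_id: int(count)
        for topic_id, count in enumerate(zeros_per_topic)
        if count > 0
    }

# Made-up 3-topic x 3-term matrix: topic 0 is dense, topics 1 and 2 contain zeros.
example = np.array([
    [0.5, 0.3, 0.2],
    [0.6, 0.4, 0.0],
    [1.0, 0.0, 0.0],
])
report = zero_probability_report(example)
print(report)
```

An empty report would suggest that zero probabilities are not the cause for that particular model.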
I am also running into this issue. The following steps reproduce it; I am happy to provide more details if necessary. I am using pyLDAvis 3.3.1 and gensim 3.8.3. I used the following five documents to train a topic model with five topics.
import gensim
from gensim import corpora
import pyLDAvis.gensim_models
# mallet_path must point to a local MALLET installation; its value is not shown in this post
data = [['I', 'ate', 'dinner'], ['We', 'had', 'a', 'three', 'course', 'meal'], ['In', 'the', 'end', 'we', 'all', 'felt', 'like', 'we', 'ate', 'too', 'much'], ['We', 'all', 'agreed', 'it', 'was', 'a', 'magnificent', 'evening'], ['He', 'loves', 'fish', 'tacos']]
id2word = corpora.Dictionary(data)  # gensim.corpora
texts = data
corpus = [id2word.doc2bow(text1) for text1 in texts]
lda_mallet_model = gensim.models.wrappers.LdaMallet(mallet_path, corpus=corpus, num_topics=5, id2word=id2word, random_seed=41)  # gensim 3.8.3
gensim_model = gensim.models.wrappers.ldamallet.malletmodel2ldamodel(lda_mallet_model)
pyldaVis_prepared_model = pyLDAvis.gensim_models.prepare(gensim_model, corpus, id2word)  # this line raises the error
The error is:
I can reproduce the error now. Work in progress, to be continued.
for a working example (pyLDAvis_overview.ipynb), we get
your model produces
Hi Mark, I was wondering: did you manage to find some time to look into the above? Many thanks & best regards, Mike
Hi, I believe the problem is that [...]
On lines 258-259 in _prepare.py,
log_lift = np.log(pd.eval("topic_term_dists / term_proportion")).astype("float64")
log_ttd = np.log(pd.eval("topic_term_dists")).astype("float64")
when pyLDAvis calculates [...]
Then, on lines 217-219 in _prepare.py,
def _find_relevance(log_ttd, log_lift, R, lambda_):
    relevance = lambda_ * log_ttd + (1 - lambda_) * log_lift
    return relevance.T.apply(lambda topic: topic.nlargest(R).index)
when it calculates relevance for different [...]
Finally, when we call [...]
Given that this problem is not specific to LDA Mallet (since any model which has 0 in [...]), I suggest modifying lines 258-259 in _prepare.py to
# to avoid -inf when calculating log_lift and log_ttd
topic_term_dists_non_zero = topic_term_dists.replace(0, 1e-10)
log_lift = np.log(pd.eval("topic_term_dists_non_zero / term_proportion")).astype("float64")
log_ttd = np.log(pd.eval("topic_term_dists_non_zero")).astype("float64")
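To make the suggestion above concrete, here is a small sketch of how a zero probability turns into `-inf` under `np.log`, and how replacing zeros with `1e-10` keeps the values finite. The matrix and the `term_proportion` values are made up for illustration, `log_tables` and `relevance` are simplified stand-ins for the quoted `_prepare.py` lines (plain arithmetic instead of `pd.eval`), and whether the resulting NaN is the exact path to the `TypeError` is an assumption, not something confirmed in this thread.

```python
import numpy as np
import pandas as pd

# Made-up 2-topic x 3-term distribution; topic 1 gives term "c" zero probability.
topic_term_dists = pd.DataFrame(
    [[0.5, 0.3, 0.2],
     [0.6, 0.4, 0.0]],
    columns=["a", "b", "c"],
)
# Made-up overall term proportions (hypothetical values).
term_proportion = pd.Series([0.55, 0.35, 0.10], index=["a", "b", "c"])

def log_tables(dists):
    """Simplified stand-in for the log_ttd / log_lift computation."""
    with np.errstate(divide="ignore"):  # silence the log(0) warning
        log_lift = np.log(dists / term_proportion).astype("float64")
        log_ttd = np.log(dists).astype("float64")
    return log_ttd, log_lift

def relevance(log_ttd, log_lift, lambda_):
    """Same formula as the quoted _find_relevance, without the top-R step."""
    with np.errstate(invalid="ignore"):  # 0 * -inf is NaN
        return lambda_ * log_ttd + (1 - lambda_) * log_lift

# Without the fix: log(0) is -inf, and 0 * -inf is NaN at lambda_ = 0.
raw_ttd, raw_lift = log_tables(topic_term_dists)
raw_rel = relevance(raw_ttd, raw_lift, lambda_=0.0)

# With the proposed fix: zeros replaced by a tiny positive value.
topic_term_dists_non_zero = topic_term_dists.replace(0, 1e-10)
fixed_ttd, fixed_lift = log_tables(topic_term_dists_non_zero)
fixed_rel = relevance(fixed_ttd, fixed_lift, lambda_=0.0)

print(raw_ttd.loc[1, "c"], raw_rel.loc[1, "c"])
print(bool(np.isfinite(fixed_rel.to_numpy()).all()))
```

One trade-off of this approach is that rows touched by the replacement no longer sum exactly to 1, although with a value as small as `1e-10` the difference is negligible for ranking terms.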
Hi Ben,
Thank you for your great work!
I generated topic models with five different numbers of topics on the same corpus and dictionary. I can use pyLDAvis to visualize four of them, but one raises an error. Could you please help me with this error? I get it on both the new and old versions of pyLDAvis.
Best,
Zhijun
ERROR information
TypeError Traceback (most recent call last)
in
----> 1 vismallet = gensimvis.prepare(models[3], corpus, dictionary=id2word, sort_topics=False)
~\AppData\Roaming\Python\Python38\site-packages\pyLDAvis\gensim_models.py in prepare(topic_model, corpus, dictionary, doc_topic_dist, **kwargs)
121 """
122 opts = fp.merge(_extract_data(topic_model, corpus, dictionary, doc_topic_dist), kwargs)
--> 123 return pyLDAvis.prepare(**opts)
~\AppData\Roaming\Python\Python38\site-packages\pyLDAvis\_prepare.py in prepare(topic_term_dists, doc_topic_dists, doc_lengths, vocab, term_frequency, R, lambda_step, mds, n_jobs, plot_opts, sort_topics, start_index)
437 term_frequency = np.sum(term_topic_freq, axis=0)
438
--> 439 topic_info = _topic_info(topic_term_dists, topic_proportion,
440 term_frequency, term_topic_freq, vocab, lambda_step, R,
441 n_jobs, start_index)
~\AppData\Roaming\Python\Python38\site-packages\pyLDAvis\_prepare.py in _topic_info(topic_term_dists, topic_proportion, term_frequency, term_topic_freq, vocab, lambda_step, R, n_jobs, start_index)
278 for ls in _job_chunks(lambda_seq, n_jobs)))
279 topic_dfs = map(topic_top_term_df, enumerate(top_terms.T.iterrows(), start_index))
--> 280 return pd.concat([default_term_info] + list(topic_dfs))
281
282
~\AppData\Roaming\Python\Python38\site-packages\pyLDAvis\_prepare.py in topic_top_term_df(tup)
262 def topic_top_term_df(tup):
263 new_topic_id, (original_topic_id, topic_terms) = tup
--> 264 term_ix = topic_terms.unique()
265 df = pd.DataFrame({'Term': vocab[term_ix],
266 'Freq': term_topic_freq.loc[original_topic_id, term_ix],
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\series.py in unique(self)
1870 Categories (3, object): ['a' < 'b' < 'c']
1871 """
-> 1872 result = super().unique()
1873 return result
1874
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\base.py in unique(self)
1045 result = np.asarray(result)
1046 else:
-> 1047 result = unique1d(values)
1048
1049 return result
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\algorithms.py in unique(values)
405
406 table = htable(len(values))
--> 407 uniques = table.unique(values)
408 uniques = _reconstruct_data(uniques, original.dtype, original)
409 return uniques
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable.unique()
pandas\_libs\hashtable_class_helper.pxi in pandas._libs.hashtable.PyObjectHashTable._unique()
~\AppData\Roaming\Python\Python38\site-packages\pandas\core\indexes\base.py in __hash__(self)
4271 @final
4272 def __hash__(self):
-> 4273 raise TypeError(f"unhashable type: {repr(type(self).__name__)}")
4274
4275 @final
TypeError: unhashable type: 'Int64Index'
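The traceback ends with pandas refusing to hash an index object while `unique()` builds a hash table of the values in `topic_terms`. A minimal sketch of that last step, independent of pyLDAvis, is shown below. It only demonstrates that a pandas `Index` cannot be hashed; how index objects end up as values inside `topic_terms` is not shown here. The exact type name in the message depends on the pandas version (`'Int64Index'` in the traceback above).

```python
import pandas as pd

# Any pandas Index is mutable-like from Python's point of view and refuses hash().
idx = pd.Index([3, 1, 2])

try:
    hash(idx)
    hashable = True
    message = ""
except TypeError as exc:
    hashable = False
    message = str(exc)

print(hashable, message)
```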