Got negative coherence for DTM, including NPMI, UCI and U_mass #218

Garren87 · 2024-06-16T15:51:03Z

This is a great project, and it has helped me a lot.
I am using DTM on about 60,000 abstracts of English-language scientific papers published from 2000 to 2024, all on the same topic: Electrochemical Energy Storage. I am trying to choose the optimal number of topics K based on common indicators such as coherence and perplexity.
However, it seems that all of the coherence measures provided by `tp.coherence.Coherence().get_score()` are negative, including c_npmi, c_uci and u_mass. c_v does seem to work, but other users have mentioned problems with c_v as well.
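For context, the sketch below shows why a negative sign is not by itself an error for every measure. It is a simplified re-implementation, not tomotopy's code: it uses document-level co-occurrence on a made-up toy corpus, while tomotopy may use sliding windows, other smoothing and other averaging, so the numbers will not match `get_score()`. UMass averages log((D(w_m, w_l) + 1) / D(w_l)) over pairs of ranked top words and is usually negative even for a coherent topic. NPMI lies in [-1, 1] and is negative only when top words co-occur less often than they would by chance (UCI, being plain PMI, behaves the same way without the normalization).

```python
import math
from itertools import combinations

def doc_freqs(docs):
    """Count documents containing each word and each unordered word pair."""
    single, pair = {}, {}
    for doc in docs:
        words = set(doc)
        for w in words:
            single[w] = single.get(w, 0) + 1
        for a, b in combinations(sorted(words), 2):
            pair[(a, b)] = pair.get((a, b), 0) + 1
    return single, pair

def co_count(pair, a, b):
    """Number of documents containing both a and b."""
    return pair.get(tuple(sorted((a, b))), 0)

def u_mass(top_words, docs):
    """Mean of log((D(w_m, w_l) + 1) / D(w_l)) over pairs l < m of the ranked top words."""
    single, pair = doc_freqs(docs)
    scores = [
        math.log((co_count(pair, top_words[m], top_words[l]) + 1) / single[top_words[l]])
        for m in range(1, len(top_words))
        for l in range(m)
    ]
    return sum(scores) / len(scores)

def npmi(top_words, docs, eps=1e-12):
    """Mean NPMI over unordered pairs, with probabilities estimated per document."""
    single, pair = doc_freqs(docs)
    n = len(docs)
    scores = []
    for a, b in combinations(top_words, 2):
        p_a, p_b = single[a] / n, single[b] / n
        p_ab = co_count(pair, a, b) / n
        pmi = math.log((p_ab + eps) / (p_a * p_b))
        scores.append(pmi / -math.log(p_ab + eps))
    return sum(scores) / len(scores)

# Hypothetical toy corpus: each document is a list of tokens.
docs = [
    ["battery", "lithium", "anode"],
    ["battery", "lithium", "cathode"],
    ["battery", "anode", "cathode"],
    ["supercapacitor", "carbon", "electrode"],
    ["supercapacitor", "carbon", "capacitance"],
    ["lithium", "electrolyte"],
]
coherent = ["battery", "lithium", "anode"]            # words that often co-occur
mixed = ["battery", "supercapacitor", "electrolyte"]  # words that never co-occur

for name, topic in [("coherent", coherent), ("mixed", mixed)]:
    print(name, round(u_mass(topic, docs), 3), round(npmi(topic, docs), 3))
```

On this toy data both topics get a negative UMass (the mixed one lower), while NPMI is positive for the coherent topic and close to -1 for the mixed one. So a negative UMass is expected, whereas a consistently negative NPMI or UCI means the top words rarely co-occur in whatever reference corpus the score is computed against.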
By the way, the pyLDAvis visualization was also poor, with large overlaps between topics. I have tried many changes, including values of k from 2 to 100 and different parameter settings such as timepoint, `rm_top` and `min_df`, but the results did not improve.
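One way to put a number on "large overlap between topics": by default, pyLDAvis builds its inter-topic distance map from the Jensen-Shannon divergence between topic-word distributions, so computing that divergence directly gives a per-pair measure of overlap. The pure-Python sketch below assumes the three distributions over a four-word vocabulary are hypothetical placeholders; in practice each row would be one topic's word distribution at a single timepoint of the trained DTM.

```python
import math

def kl_divergence(p, q):
    """KL(p || q) in nats; terms with p_i == 0 contribute nothing."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def js_divergence(p, q):
    """Jensen-Shannon divergence in nats: 0 for identical topics, ln 2 for disjoint ones."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)

def closest_pair(topics):
    """Return (divergence, i, j) for the two most similar topics."""
    best = None
    for i in range(len(topics)):
        for j in range(i + 1, len(topics)):
            d = js_divergence(topics[i], topics[j])
            if best is None or d < best[0]:
                best = (d, i, j)
    return best

# Hypothetical topic-word distributions over a 4-word vocabulary (each row sums to 1).
topics = [
    [0.40, 0.40, 0.10, 0.10],
    [0.35, 0.45, 0.10, 0.10],
    [0.05, 0.05, 0.45, 0.45],
]
print(closest_pair(topics))
```

Pairs whose divergence is far below ln 2 (about 0.693) are the ones likely to appear as overlapping circles in the pyLDAvis map; comparing these values between the LDA and DTM runs would show whether DTM topics really are less distinct.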
Does this mean that there is a problem with my corpus?
P.S. There is also an error with DTM training when k=1: the process crashes with `Process finished with exit code -1073741819 (0xC0000005)`.
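The two numbers in that message are the same 32-bit value: -1073741819 is 0xC0000005 read as a signed integer, and 0xC0000005 is the Windows status code STATUS_ACCESS_VIOLATION. That suggests a crash inside native code (presumably tomotopy's C++ core) rather than a Python-level exception. The helper below is only an illustration of the conversion; its name is made up.

```python
def as_status_hex(exit_code: int) -> str:
    """Reinterpret a signed 32-bit exit code as the unsigned hex status Windows reports."""
    # Masking with 0xFFFFFFFF maps the negative value onto its unsigned 32-bit form.
    return f"0x{exit_code & 0xFFFFFFFF:08X}"

print(as_status_hex(-1073741819))  # prints 0xC0000005
```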


Garren87 · 2024-06-17T14:15:51Z

Well, I have tested `LDAModel`, and all the coherence measures work well. What's more, the pyLDAvis result also became clear and meaningful. Does this mean that my corpus is not suitable for DTM, or is there still some other problem?
