This is a great project which helps a lot.
I am using DTM on a set of abstracts of English scientific papers (about 60,000, spanning 2000 to 2024), all on the same subject: Electrochemical Energy Storage. I am trying to choose the optimal number of topics K based on common indicators such as coherence and perplexity.
However, it seems that all the coherence scores (computed with tp.coherence.Coherence().get_score()) are negative, including c_npmi, c_uci and u_mass. c_v seems to work, but other users have mentioned that c_v has problems as well.
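For context on the sign of these scores: u_mass is the log of a smoothed conditional probability, so it is negative almost by construction, while NPMI lies in [-1, 1] and UCI is an averaged PMI, so both go negative when the top words co-occur less often than independence would predict. The following is a minimal sketch with simplified document-level formulas on a hypothetical toy corpus. It is not tomotopy's implementation; the names `docs`, `doc_freq`, `u_mass` and `npmi` are made up for illustration.

```python
import math
from itertools import combinations

# Hypothetical toy corpus: each document is a set of tokens.
docs = [
    {"lithium", "battery", "anode"},
    {"lithium", "battery", "cathode"},
    {"lithium", "electrolyte"},
    {"supercapacitor", "electrode", "carbon"},
    {"battery", "anode", "carbon"},
    {"supercapacitor", "carbon"},
]


def doc_freq(*words):
    """Number of documents that contain all of the given words."""
    return sum(1 for d in docs if all(w in d for w in words))


def u_mass(top_words, eps=1e-12):
    """Mean of log((D(wi, wj) + eps) / D(wj)) over pairs with wj ranked before wi."""
    scores = [
        math.log((doc_freq(wi, wj) + eps) / doc_freq(wj))
        for j, wj in enumerate(top_words)
        for wi in top_words[j + 1:]
    ]
    return sum(scores) / len(scores)


def npmi(top_words, eps=1e-12):
    """Mean normalized PMI over unordered word pairs (document co-occurrence)."""
    n = len(docs)
    scores = []
    for a, b in combinations(top_words, 2):
        p_ab = (doc_freq(a, b) + eps) / n  # smoothed joint probability
        p_a = doc_freq(a) / n
        p_b = doc_freq(b) / n
        scores.append(math.log(p_ab / (p_a * p_b)) / -math.log(p_ab))
    return sum(scores) / len(scores)


# Words that tend to share documents vs. words drawn from unrelated documents.
coherent_topic = ["lithium", "battery", "anode"]
mixed_topic = ["lithium", "supercapacitor", "carbon"]

for name, topic in [("coherent", coherent_topic), ("mixed", mixed_topic)]:
    print(name, "u_mass:", round(u_mass(topic), 3), "npmi:", round(npmi(topic), 3))
```

In this toy setting the coherent word list still gets a negative u_mass, and only its NPMI is positive. So a negative u_mass alone says little, whereas consistently negative c_npmi and c_uci suggest that the top words of each topic rarely appear together in the reference corpus.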
The results I got with pyLDAvis were not good either, with a large overlap between topics. I have tried many changes, including values of k from 2 to 100 and different settings for parameters such as timepoint, rm_top and min_df, but the results did not improve.
Does this mean that there is a problem with my corpus?
P.S. DTM training crashes when k=1 with the message: Process finished with exit code -1073741819 (0xC0000005)
Update: I have tested LDAModel on the same corpus, and all the coherence measurements work well there. The pyLDAvis result also becomes clear and meaningful. Does this mean that my corpus is not suitable for DTM, or is there still some other problem?