Hi
According to the paper:
HDBSCAN assigns a label to each dense cluster of document vectors and assigns a noise
label to all document vectors that are not in a dense cluster.
If a document was assigned to a noise label, will it be in Topic -1 or Topic 0? I cannot find it in the documentation.
I don't get Topic -1 in my experiments.
Thanks
I had this question too. I think that topic 0 is noise, but I'm not entirely sure. Maybe @ddangelov could weigh in. I've found that if you look closely, there are lots of other clusters that could be categorized as "noise" as well, based on their top words. In my pipeline I check each document against the top 5 words of its topic (from topic_words): if it contains fewer than 2 of those 5 words and its confidence is below 0.4, I call it an outlier. Then I look at the proportion of outliers in each cluster, and if a cluster is mostly outliers I call it a noise cluster. That works for my data. It might not work for yours.
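In case it helps, here is a rough sketch of that heuristic in plain Python. It is not part of Top2Vec: the function names, the input layout (a list of `(topic_id, tokens, confidence)` tuples plus a dict mapping each topic to its ranked words), and the 0.5 cutoff for "mostly outliers" are all assumptions made for illustration. Only the "fewer than 2 of the top 5 words" and "confidence below 0.4" thresholds come from the description above.

```python
from collections import defaultdict


def is_outlier(doc_tokens, top_words, confidence, min_hits=2, max_conf=0.4):
    """Flag a document that shares few top words with its topic AND has low confidence."""
    # Count how many of the topic's top 5 words appear in the document.
    hits = len(set(doc_tokens) & set(top_words[:5]))
    return hits < min_hits and confidence < max_conf


def noise_clusters(docs, topic_words, noise_ratio=0.5):
    """Return the topic ids whose documents are mostly outliers.

    docs: iterable of (topic_id, tokens, confidence) tuples (hypothetical layout).
    topic_words: dict mapping topic_id to its ranked list of top words.
    noise_ratio: assumed cutoff for "mostly outliers"; tune it for your data.
    """
    totals = defaultdict(int)
    outliers = defaultdict(int)
    for topic_id, tokens, confidence in docs:
        totals[topic_id] += 1
        if is_outlier(tokens, topic_words[topic_id], confidence):
            outliers[topic_id] += 1
    return {t for t in totals if outliers[t] / totals[t] > noise_ratio}
```

You would build `docs` from whatever your own pipeline produces for each document (assigned topic, tokenized text, and score), then treat the returned topic ids as candidate noise clusters to inspect by hand.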