LDAvis integration #136
Comments
@ddangelov any suggestions about this?
For the cosine similarities you could pass all of the values through a softmax; this will resolve the problem with negative values. For doc_topic_dists the …
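A minimal sketch of that normalisation, assuming the cosine similarities have already been collected into a (topics × terms) NumPy array, might look like:

```python
import numpy as np
from scipy.special import softmax

# topic_term_sims: (n_topics, n_terms) array of topic-term cosine similarities,
# possibly containing negative values
topic_term_sims = np.array([[0.8, -0.1, 0.3],
                            [-0.4, 0.6, 0.2]])

# Row-wise softmax: every row becomes non-negative and sums to 1,
# so it can be used as a probability distribution
topic_term_dists = softmax(topic_term_sims, axis=1)
print(topic_term_dists.sum(axis=1))  # [1. 1.]
```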
@paulthemagno Did you get anywhere with this? Correct me if I'm wrong, @ddangelov, but as far as I understand, we would need issue #141 to be resolved before we could use the …
Yes, you would benefit from issue #141 being resolved.
A useful feature would be an integration with LDAvis to see the clusters.
For example, I'm trying to use pyLDAvis by passing the required values to its `prepare` function, and I would like to understand which values to take from Top2Vec. It needs:

- `topic_term_dists`: matrix of topic-term probabilities, where n_terms is len(vocab).
- `doc_topic_dists`: matrix of document-topic probabilities.
- `doc_lengths`: the length of each document, i.e. the number of words in each document. The order of the numbers should be consistent with the ordering of the docs in doc_topic_dists.
- `vocab`: list of all the words in the corpus used to train the model.
- `term_frequency`: the count of each particular term over the entire corpus. The ordering of these counts should correspond with vocab and topic_term_dists.
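For reference, this is the call shape I'm aiming for; the tiny arrays below are only placeholders to show how the five inputs fit together, and all of them would need to come from the trained Top2Vec model:

```python
import numpy as np
import pyLDAvis

# Toy example with 2 topics, 3 terms and 2 documents, just to show the call shape
topic_term_dists = np.array([[0.6, 0.3, 0.1],
                             [0.2, 0.2, 0.6]])  # rows sum to 1
doc_topic_dists = np.array([[0.7, 0.3],
                            [0.4, 0.6]])        # rows sum to 1
doc_lengths = [50, 80]                          # tokens per document
vocab = ["apple", "banana", "cherry"]
term_frequency = [40, 50, 40]                   # corpus-wide count of each term

vis_data = pyLDAvis.prepare(topic_term_dists, doc_topic_dists,
                            doc_lengths, vocab, term_frequency)
pyLDAvis.display(vis_data)  # in a notebook; pyLDAvis.show(vis_data) opens a browser
```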
For `topic_term_dists` I thought to do something like this:
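(Roughly the following; `model` is the trained Top2Vec instance, and I'm assuming the vocabulary is available as `model.vocab`:)

```python
import numpy as np

# model is a trained Top2Vec instance
num_topics = model.get_num_topics()
vocab = model.vocab  # assumption: the model exposes its vocabulary here

topic_term_sims = np.zeros((num_topics, len(vocab)))
for j, term in enumerate(vocab):
    # topic_scores are the cosine similarities between the keyword vector
    # and each topic vector, ordered to match topic_nums
    topic_words, word_scores, topic_scores, topic_nums = model.search_topics(
        keywords=[term], num_topics=num_topics)
    topic_term_sims[topic_nums, j] = topic_scores
```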
So I pass each term of the vocab to the `search_topics` function as a single keyword to get the cosine similarity (as described in the method's docstring) for each term-topic pair. The problem is that some values are negative (isn't the cosine similarity a range between 0 and 1?), whereas I expected all positive values with a total sum of 1.

For the second parameter, `doc_topic_dists`, can I use the `search_documents_by_vector` or `_calculate_documents_topic` functions, passing each topic vector?