Let's assume rows are a person's answers and columns are questions, with values being ordinal Likert data. This roughly represents a bipartite adjacency matrix, where people connect to questions and questions connect back to people.
For metrics, I have considered Jaccard, correlation, and Manhattan. Manhattan seems to perform best (more distinct groupings). I have tried n_neighbors = {5, 15, 100}; the results seem largely invariant to it, so I'm sticking with the default.
Let's say each question comes from a general category (e.g. 5 questions are about "liking technology"). How could I recover embeddings for the questions within the same space, and then average them per category/theme?
I have tried:
Creating fake rows in which the theme columns are maximized and the non-theme columns are drawn from the same distribution as the real data, then averaging the embedded fakes. Some of the results are reasonable, some are ehh. Probably my best overall results.
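For concreteness, a minimal sketch of the fake-rows construction (the toy data, the 1–5 Likert range, and names like `theme_cols` are assumptions for illustration, not my actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 20)).astype(float)  # toy n_people x n_questions Likert matrix
theme_cols = [0, 1, 2, 3, 4]  # e.g. the five "liking technology" questions

# Fake respondents: theme columns pinned to that column's maximum answer,
# non-theme columns resampled column-wise from the observed answers.
n_fake = 50
fake = np.empty((n_fake, X.shape[1]))
for j in range(X.shape[1]):
    if j in theme_cols:
        fake[:, j] = X[:, j].max()
    else:
        fake[:, j] = rng.choice(X[:, j], size=n_fake)

# Then embed the fakes with the fitted reducer and average, e.g.:
# theme_point = reducer.transform(fake).mean(axis=0)
```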
Computing X.T @ E (X ~ the original matrix, E ~ the 2-dim embedding matrix) with different types of averaging over the weights. Probably a pretty suspect idea; it didn't really seem to preserve anything.
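Roughly what I mean by the X.T @ E idea, with one of the weighting schemes I tried (random stand-ins here; E would really be the fitted 2-d row embedding):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 20)).astype(float)  # n_people x n_questions
E = rng.normal(size=(200, 2))                         # stand-in for the 2-d row embedding

# Column-normalize so each question's point is a convex combination of the
# embeddings of the people who answered it, weighted by their answers.
W = X / X.sum(axis=0, keepdims=True)
Q = W.T @ E  # n_questions x 2 candidate "question embeddings"
```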
Building the full adjacency matrix, which would be [[dense_a, 0s], [0s, dense_b]]. It is easy to see that type_a and type_b sit in different column spaces, and the results showed an entirely separate cluster of questions. A friend has suggested A @ A to get the length-2 paths and remove this issue.
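As I understand the A @ A suggestion, it assumes the symmetric bipartite form of the adjacency rather than my block-diagonal stacking, so that squaring walks two steps and connects same-type nodes (a sketch with toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(8, 5)).astype(float)  # small n_people x n_questions example
n, m = X.shape

# Symmetric bipartite adjacency over the combined node set.
A = np.block([[np.zeros((n, n)), X],
              [X.T, np.zeros((m, m))]])

# Length-2 paths: people relate to people via shared questions, and
# questions to questions via shared people.
A2 = A @ A
# Block-wise, A2 = [[X @ X.T, 0], [0, X.T @ X]].
```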
Wanted to see if you had any other thoughts or comments on the main question (or other parts of the approach).
Thanks for all of the great dev you do!