Let's assume rows are a person's answers and columns are questions, with values being ordinal Likert data. This roughly represents a bipartite adjacency matrix, where people connect to questions and questions connect back to people.
For metrics, I have considered Jaccard, correlation, and Manhattan. Manhattan seems to perform best (more distinct groupings). I have tried n_neighbors = {5, 15, 100}; the results seem largely invariant to it, so I'm sticking with the default.
Let's say each question comes from a general category (e.g. 5 questions are about "liking technology"). How could I recover embeddings for the questions within the same space, and then average them per category/theme?
I have tried:
Creating fake rows in which the theme columns are maximized and the non-theme columns are drawn from the same distribution as the real data, then averaging the embedded fakes. Some of the results are reasonable, some are ehh. Probably my best overall results.
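For concreteness, a minimal sketch of the fake-rows construction (the toy data, the 1–5 Likert range, and names like `theme_cols` are assumptions for illustration, not my actual setup):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 20)).astype(float)  # toy n_people x n_questions Likert matrix
theme_cols = [0, 1, 2, 3, 4]  # e.g. the five "liking technology" questions

# Fake respondents: theme columns pinned to that column's maximum answer,
# non-theme columns resampled column-wise from the observed answers.
n_fake = 50
fake = np.empty((n_fake, X.shape[1]))
for j in range(X.shape[1]):
    if j in theme_cols:
        fake[:, j] = X[:, j].max()
    else:
        fake[:, j] = rng.choice(X[:, j], size=n_fake)

# Then embed the fakes with the fitted reducer and average, e.g.:
# theme_point = reducer.transform(fake).mean(axis=0)
```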
Computing X.T @ E (X ~ the original matrix, E ~ the 2-dim embedding matrix) with different types of averaging over the weights. Probably a pretty suspect idea; it didn't really seem to preserve anything.
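Roughly what I mean by the X.T @ E idea, with one of the weighting schemes I tried (random stand-ins here; E would really be the fitted 2-d row embedding):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(200, 20)).astype(float)  # n_people x n_questions
E = rng.normal(size=(200, 2))                         # stand-in for the 2-d row embedding

# Column-normalize so each question's point is a convex combination of the
# embeddings of the people who answered it, weighted by their answers.
W = X / X.sum(axis=0, keepdims=True)
Q = W.T @ E  # n_questions x 2 candidate "question embeddings"
```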
Building the full adjacency matrix, which would be [[dense_a, 0s], [0s, dense_b]]. It is easy to see that type_a and type_b sit in different column spaces, and the results showed an entirely separate cluster of questions. A friend has suggested A @ A to get the length-2 paths and remove this issue.
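As I understand the A @ A suggestion, it assumes the symmetric bipartite form of the adjacency rather than my block-diagonal stacking, so that squaring walks two steps and connects same-type nodes (a sketch with toy data):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.integers(1, 6, size=(8, 5)).astype(float)  # small n_people x n_questions example
n, m = X.shape

# Symmetric bipartite adjacency over the combined node set.
A = np.block([[np.zeros((n, n)), X],
              [X.T, np.zeros((m, m))]])

# Length-2 paths: people relate to people via shared questions, and
# questions to questions via shared people.
A2 = A @ A
# Block-wise, A2 = [[X @ X.T, 0], [0, X.T @ X]].
```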
Wanted to see if you had any other thoughts or comments on the main question (or other parts of the approach).
Thanks for all of the great dev you do!