[Question] Cosine Similarity #69

dhkim0225 · 2024-02-28T02:29:45Z

I counted 36 data points in the cosine similarity figure presented in the paper,
and based on this I have been trying to work out which model (small, base, or large) was used.

When I visualized the cosine similarity using the feature just before the residual connection, the trend I obtained differed significantly from the one presented in the paper.
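To make the two candidate definitions concrete, here is a minimal sketch (not the paper's code). It treats a block as a generic residual update out = x + f(x) on a single token vector and computes both cos(x, f(x)) (the feature just before the residual add) and cos(x, x + f(x)) (the final block output). The names `block_step`, `layerwise_similarities`, and the toy branch `shrink` are hypothetical illustrations, not part of the paper or any library.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = List[float]
# Hypothetical stand-in for a residual branch f(x), e.g. attention or MLP.
Branch = Callable[[Vector], Vector]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def block_step(x: Vector, branch: Branch) -> Tuple[float, float, Vector]:
    """Run one residual block out = x + f(x) and return both candidate similarities."""
    fx = branch(x)                        # feature just before the residual add
    out = [a + b for a, b in zip(x, fx)]  # final output of the block
    return cosine(x, fx), cosine(x, out), out


def layerwise_similarities(
    x: Vector, branches: List[Branch]
) -> Tuple[List[float], List[float]]:
    """Propagate x through the blocks, recording one value per block for each definition."""
    pre, post = [], []
    for branch in branches:
        p, q, x = block_step(x, branch)  # x becomes the block output for the next block
        pre.append(p)
        post.append(q)
    return pre, post


if __name__ == "__main__":
    # Hypothetical toy branch that shrinks and flips its input (illustration only).
    def shrink(v: Vector) -> Vector:
        return [-0.1 * t for t in v]

    pre, post = layerwise_similarities([1.0, 2.0, 3.0], [shrink] * 3)
    print([round(p, 3) for p in pre])   # [-1.0, -1.0, -1.0]
    print([round(q, 3) for q in post])  # [1.0, 1.0, 1.0]
```

Even in this toy case the two definitions give opposite signs, so the choice of which tensor to capture can change the plotted trend entirely. In a real transformer, one would capture x and f(x) per block (for example with forward hooks) and likely average the per-token cosine over tokens; that averaging is an assumption here, not something stated in the paper.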

My questions are as follows:

1. Which model was used for the visualization: small, base, or large?
2. Which feature, taken from which layer, does the paper's cosine similarity refer to? Is it the final output feature of the block, or the feature just before further operations such as the residual addition?
