The cosine similarity figure in the paper contains 36 data points in total.
Based on that count, I have been trying to infer which model (small, base, or large) was used to produce it.
When I ran the visualization on the feature taken just before the residual connection, the resulting trend differed significantly from the one shown in the paper.
My questions are as follows:
Which model was used for the visualization: small, base, or large?
Which feature exactly does the paper mean by the output feature, and from which layer was it taken? (The final output of the block, or the feature just before further operations?)
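
To make the second question concrete, here is a minimal sketch of the two candidate tap points. It is not the paper's code: the toy block `y = x + tanh(x @ w)`, the helper names, and the sizes are all assumptions chosen for illustration. It only shows that the cosine similarity between adjacent layers can look very different depending on whether the feature is taken before the residual addition or after it.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature maps, flattened to vectors."""
    a, b = a.ravel(), b.ravel()
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def run_toy_blocks(x, weights):
    """Run x through toy residual blocks and record both tap points.

    Hypothetical stand-in for a real model: each block computes
    y = x + f(x) with f(x) = tanh(x @ w).
    """
    pre_residual, block_outputs = [], []
    for w in weights:
        branch = np.tanh(x @ w)      # feature just before the residual addition
        x = x + branch               # final output of the block
        pre_residual.append(branch)
        block_outputs.append(x)
    return pre_residual, block_outputs

def adjacent_similarity(features):
    """Cosine similarity between each layer's feature and the next layer's."""
    return [cosine_similarity(features[i], features[i + 1])
            for i in range(len(features) - 1)]

# Assumed sizes, for illustration only.
rng = np.random.default_rng(0)
depth, tokens, dim = 6, 4, 8
x = rng.standard_normal((tokens, dim))
weights = [0.1 * rng.standard_normal((dim, dim)) for _ in range(depth)]

pre, post = run_toy_blocks(x, weights)
sim_pre = adjacent_similarity(pre)    # tap: before the residual connection
sim_post = adjacent_similarity(post)  # tap: block output

print("pre-residual :", np.round(sim_pre, 3))
print("block output :", np.round(sim_post, 3))
```

In a real model the same two taps could be captured with forward hooks. Which of them the figure uses, and at which layers, is exactly what the questions above ask.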