As discussed in A.3, training a and b does not seem to influence performance. One intuition is what you mentioned: "as the b_j values do not deviate significantly from their initial values" (what about the a_j?). Do you have any theoretical evidence for why this is true? To my knowledge, in language-processing tasks the network is also allowed to learn the embedding of each token, and that improves performance.
Is there a notebook to experiment with this (Figure 8)?
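For concreteness, here is a minimal sketch of the kind of ablation being asked about, assuming a_j and b_j are the amplitudes and frequencies of a Fourier feature mapping γ(v) = [a_j cos(2π b_j·v), a_j sin(2π b_j·v)]_j feeding a small MLP. This is not the repository's code; all names, shapes, and hyperparameters are illustrative.

```python
# Sketch: Fourier feature mapping with optionally trainable a_j and b_j (JAX).
# Everything here is an illustrative assumption, not the authors' implementation.
from functools import partial
import jax
import jax.numpy as jnp

def init_params(key, in_dim=2, num_feats=256, hidden=256, scale=10.0):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "a": jnp.ones(num_feats),                                 # amplitudes a_j
        "b": scale * jax.random.normal(k1, (num_feats, in_dim)),  # frequencies b_j
        "W1": jax.random.normal(k2, (2 * num_feats, hidden)) * 0.05,
        "b1": jnp.zeros(hidden),
        "W2": jax.random.normal(k3, (hidden, 1)) * 0.05,
        "b2": jnp.zeros(1),
    }

def fourier_features(params, v):
    # gamma(v) = [a_j cos(2*pi b_j . v), a_j sin(2*pi b_j . v)]_j
    proj = 2.0 * jnp.pi * v @ params["b"].T  # (batch, num_feats)
    return jnp.concatenate(
        [params["a"] * jnp.cos(proj), params["a"] * jnp.sin(proj)], axis=-1
    )

def model(params, v):
    h = jax.nn.relu(fourier_features(params, v) @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def loss_fn(params, v, y):
    return jnp.mean((model(params, v) - y) ** 2)

@partial(jax.jit, static_argnames=("lr", "train_ab"))
def train_step(params, v, y, lr=1e-3, train_ab=True):
    grads = jax.grad(loss_fn)(params, v, y)
    if not train_ab:
        # Freeze a_j and b_j so only the MLP weights are updated.
        grads = {**grads,
                 "a": jnp.zeros_like(grads["a"]),
                 "b": jnp.zeros_like(grads["b"])}
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

Running the same fit twice, once with `train_ab=True` and once with `train_ab=False`, and comparing the final error would be one way to mirror the kind of comparison discussed in A.3.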