As discussed in A.3, training a and b does not seem to influence performance. One intuition is what you mentioned: "as the b_j values do not deviate significantly from their initial values" (what about the a_j?). Do you have any theoretical evidence for why this is true? To my knowledge, in language-processing tasks the network is also allowed to learn the embedding of each token, and that improves performance.
Is there a notebook to experiment with this (Figure 8)?
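For concreteness, here is a minimal sketch of the kind of ablation being asked about, assuming a_j and b_j are the amplitudes and frequencies of a Fourier feature mapping γ(v) = [a_j cos(2π b_j·v), a_j sin(2π b_j·v)]_j feeding a small MLP. This is not the repository's code; all names, shapes, and hyperparameters are illustrative.

```python
# Sketch: Fourier feature mapping with optionally trainable a_j and b_j (JAX).
# Everything here is an illustrative assumption, not the authors' implementation.
from functools import partial
import jax
import jax.numpy as jnp

def init_params(key, in_dim=2, num_feats=256, hidden=256, scale=10.0):
    k1, k2, k3 = jax.random.split(key, 3)
    return {
        "a": jnp.ones(num_feats),                                 # amplitudes a_j
        "b": scale * jax.random.normal(k1, (num_feats, in_dim)),  # frequencies b_j
        "W1": jax.random.normal(k2, (2 * num_feats, hidden)) * 0.05,
        "b1": jnp.zeros(hidden),
        "W2": jax.random.normal(k3, (hidden, 1)) * 0.05,
        "b2": jnp.zeros(1),
    }

def fourier_features(params, v):
    # gamma(v) = [a_j cos(2*pi b_j . v), a_j sin(2*pi b_j . v)]_j
    proj = 2.0 * jnp.pi * v @ params["b"].T  # (batch, num_feats)
    return jnp.concatenate(
        [params["a"] * jnp.cos(proj), params["a"] * jnp.sin(proj)], axis=-1
    )

def model(params, v):
    h = jax.nn.relu(fourier_features(params, v) @ params["W1"] + params["b1"])
    return h @ params["W2"] + params["b2"]

def loss_fn(params, v, y):
    return jnp.mean((model(params, v) - y) ** 2)

@partial(jax.jit, static_argnames=("lr", "train_ab"))
def train_step(params, v, y, lr=1e-3, train_ab=True):
    grads = jax.grad(loss_fn)(params, v, y)
    if not train_ab:
        # Freeze a_j and b_j so only the MLP weights are updated.
        grads = {**grads,
                 "a": jnp.zeros_like(grads["a"]),
                 "b": jnp.zeros_like(grads["b"])}
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
```

Running the same fit twice, once with `train_ab=True` and once with `train_ab=False`, and comparing the final error would be one way to mirror the kind of comparison discussed in A.3.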