The training config parameter `ffn_hidden` (see e.g. here) currently has to be specified, as a multiple of 128, although it is only used when the activation is GeLU. With SwiGLU, the effective hidden dimension is instead derived automatically from `n_embd` (see here), so the configured `ffn_hidden` value has no effect.
This is misleading and should be fixed.
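To make the mismatch concrete, here is a minimal sketch of the behaviour described above. It is not the repository's code: the function name `effective_ffn_hidden`, the `activation` strings, and in particular the SwiGLU rule (roughly `8/3 * n_embd`, rounded up to a multiple of 128, as used in LLaMA-style models) are assumptions for illustration only.

```python
from typing import Optional


def effective_ffn_hidden(
    activation: str,
    n_embd: int,
    ffn_hidden: Optional[int] = None,
    multiple_of: int = 128,
) -> int:
    """Return the FFN hidden size that would actually be used (hypothetical helper)."""
    if activation == "gelu":
        # GeLU is the only case where the configured value matters.
        if ffn_hidden is None:
            raise ValueError("ffn_hidden is required when activation is 'gelu'")
        if ffn_hidden % multiple_of != 0:
            raise ValueError(f"ffn_hidden must be a multiple of {multiple_of}")
        return ffn_hidden
    if activation == "swiglu":
        # ASSUMPTION: LLaMA-style rule, 2/3 of 4 * n_embd, rounded up.
        # Any ffn_hidden passed in is ignored, which is the misleading part.
        hidden = (8 * n_embd) // 3
        return multiple_of * ((hidden + multiple_of - 1) // multiple_of)
    raise ValueError(f"unknown activation: {activation!r}")
```

With this sketch, `effective_ffn_hidden("swiglu", 768, ffn_hidden=3072)` and `effective_ffn_hidden("swiglu", 768)` return the same value, while the GeLU path returns whatever was configured. One possible fix along these lines would be to make `ffn_hidden` optional and validate it only for GeLU.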