Hi team, in fully_fused_mlp.cu, the following is not clear to me:
```cpp
// If the output width is larger than 16 dims, we use cutlass to backpropagate through the last layer
// rather than fusing it with our kernel.
if (m_output_width > 16) {
	fc_multiply<FullLayer>(stream, output_weight_matrix(use_inference_params).transposed(), tmp_dL_doutput, forward.hidden.at(tmp_idx), backward_tmp.at(backward_tmp_idx), m_activation, true);
}
```
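For context, here is my reading of that call written out as plain (unfused) math. This is only a CPU sketch of my understanding, not the actual fc_multiply/cutlass code; the names and dimensions (W_out, hidden_dim, out_dim, batch_size) are invented for illustration, and I'm assuming forward.hidden holds the activations saved during the forward pass while the final `true` argument selects the backward/transfer epilogue:

```cpp
#include <cstddef>
#include <vector>

// Sketch of what I think the fc_multiply<FullLayer>(..., m_activation, true) call computes:
// backward_tmp[i][b] = relu'(forward_hidden[i][b]) * sum_j W_out[j][i] * dL_doutput[j][b]
void last_layer_backward_sketch(
	const std::vector<std::vector<float>>& W_out,          // [out_dim][hidden_dim]
	const std::vector<std::vector<float>>& dL_doutput,     // [out_dim][batch_size]
	const std::vector<std::vector<float>>& forward_hidden, // [hidden_dim][batch_size], saved in the forward pass
	std::vector<std::vector<float>>& backward_tmp)         // [hidden_dim][batch_size], output
{
	const size_t out_dim = W_out.size();
	const size_t hidden_dim = forward_hidden.size();
	const size_t batch_size = dL_doutput.empty() ? 0 : dL_doutput[0].size();

	for (size_t i = 0; i < hidden_dim; ++i) {
		for (size_t b = 0; b < batch_size; ++b) {
			// GEMM part: W_out^T * dL_doutput
			float acc = 0.0f;
			for (size_t j = 0; j < out_dim; ++j) {
				acc += W_out[j][i] * dL_doutput[j][b];
			}
			// Epilogue part: multiply by the activation *derivative*, evaluated
			// using the values stored during the forward pass (assuming ReLU here).
			const float relu_deriv = forward_hidden[i][b] > 0.0f ? 1.0f : 0.0f;
			backward_tmp[i][b] = relu_deriv * acc;
		}
	}
}
```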
I suppose it's computing: forward.hidden.at(output_layer) = output_weight_matrix.T * tmp_dL_doutput
For a 2-hidden-layer MLP, it's something like this (see the sketch below):
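Roughly what I have in mind, in my own notation (assuming ReLU hidden layers, with h1 and h2 denoting the stored hidden-layer activations):

```
forward:  h1 = relu(W1 * x)
          h2 = relu(W2 * h1)
          y  = W_out * h2

backward: dL/dh2 = relu'(h2) ⊙ (W_out^T * dL/dy)   <- the fc_multiply call quoted above
          dL/dh1 = relu'(h1) ⊙ (W2^T * dL/dh2)
          dL/dx  = W1^T * dL/dh1
```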
So for fc_multiply, is the epilogue supposed to be the derivative of ReLU (the activation function), rather than ReLU itself?

Thanks for the guidance,
ZJ