Synced with @bghimireamd. CK already has CUBLASLT_EPILOGUE_GELU_AUX, CUBLASLT_EPILOGUE_BIAS, and CUBLASLT_EPILOGUE_GELU_AUX_BIAS. We can quickly add CUBLASLT_EPILOGUE_DGELU. For CUBLASLT_EPILOGUE_BGRADB, we need to double-check.
The ones we need for Transformer Engine are the following:
CUBLASLT_EPILOGUE_GELU_AUX
step 1 : matrix multiplication
step 2 : apply gelu
step 3 : store the GELU input (the step 1 result) to a separate auxiliary matrix (basically do a matrix copy)
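The three steps above can be sketched as a plain-Python reference. This is not CK or cuBLASLt code. The helper names (`matmul`, `gelu`, `gemm_gelu_aux`) are made up for illustration, the erf-based GELU formula is an assumption (the library may use the tanh approximation), and treating the aux matrix as the pre-GELU values is my reading of the cuBLASLt docs:

```python
import math

def matmul(a, b):
    """Row-major product of nested lists: (M x K) @ (K x N) -> (M x N)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def gelu(x):
    """Exact (erf-based) GELU. Assumption: the real kernel may use the tanh form."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gemm_gelu_aux(a, b):
    """Hypothetical reference for GELU_AUX: returns (d, aux)."""
    aux = matmul(a, b)                            # step 1: matrix multiplication
    d = [[gelu(v) for v in row] for row in aux]   # step 2: apply GELU
    return d, aux                                 # step 3: aux is kept as a separate matrix
```

Keeping `aux` around is what makes a later DGELU pass possible without redoing the forward GEMM.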
CUBLASLT_EPILOGUE_DGELU
step 1 : matrix multiplication
step 2 : apply derivative of gelu
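A rough reference for the DGELU steps, in plain Python. The issue only says "apply derivative of gelu". Evaluating the derivative at a saved aux input and multiplying it elementwise with the matmul output is my assumption based on the cuBLASLt docs, and the names `dgelu` and `gemm_dgelu` are hypothetical:

```python
import math

def matmul(a, b):
    """Row-major product of nested lists: (M x K) @ (K x N) -> (M x N)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def dgelu(x):
    """Derivative of the erf-based GELU: Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

def gemm_dgelu(a, b, aux):
    """Hypothetical reference for DGELU.

    Assumption: aux holds the GELU input saved by the forward pass and has
    the same (M x N) shape as the matmul output.
    """
    g = matmul(a, b)                                  # step 1: matrix multiplication
    return [[g[i][j] * dgelu(aux[i][j])               # step 2: scale by GELU'(aux)
             for j in range(len(g[0]))]
            for i in range(len(g))]
```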
CUBLASLT_EPILOGUE_BIAS
step 1 : matrix multiplication A(M, K) x B(K, N) = C(M, N)
step 2 : obtain the bias vector of length M (one element per row of C), eg: [0.1, 0.2, 0.3, 0.4]
step 3 : broadcast the bias vector :
we can simply replicate it N times along the columns,
resulting in a bias matrix with dimensions (M x N)
eg:
A(4, 2), B(2, 3), C(4, 3)
Broadcast Bias:
|0.1 0.1 0.1|
|0.2 0.2 0.2|
|0.3 0.3 0.3|
|0.4 0.4 0.4|
C = C + Broadcast Bias
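For reference, here is a minimal plain-Python sketch of the BIAS epilogue. It follows step 2 in taking a bias of length M, so element `i` is added to every column of row `i`. `matmul` and `gemm_bias` are illustrative names, not CK or cuBLASLt functions:

```python
def matmul(a, b):
    """Row-major product of nested lists: (M x K) @ (K x N) -> (M x N)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def gemm_bias(a, b, bias):
    """Hypothetical reference for BIAS: C = A @ B + broadcast(bias)."""
    c = matmul(a, b)                       # step 1: matrix multiplication
    if len(bias) != len(c):                # step 2: bias has one element per row of C
        raise ValueError("bias length must equal M, the number of rows of C")
    # step 3: broadcasting is implicit, bias[i] is reused for all N columns
    return [[c[i][j] + bias[i] for j in range(len(c[0]))]
            for i in range(len(c))]
```

Note that the broadcast M x N matrix is never materialized. A fused kernel would likewise add `bias[i]` while writing out each element.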
CUBLASLT_EPILOGUE_BGRADB
Apply Bias gradient to the input matrix B. The bias size corresponds to the number of columns of the matrix D.
The reduction happens over the GEMM’s “k” dimension. Store Bias gradient in the bias buffer
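Read literally, the description above says the bias gradient is a sum of B over the "k" dimension, giving one value per column of D. A small plain-Python sketch under that reading (`gemm_bgradb` is a made-up name, and returning the bias gradient alongside D stands in for writing it to the bias buffer):

```python
def matmul(a, b):
    """Row-major product of nested lists: (M x K) @ (K x N) -> (M x N)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def gemm_bgradb(a, b):
    """Hypothetical reference for BGRADB: returns (d, dbias)."""
    d = matmul(a, b)                       # the usual GEMM output, D = A @ B
    k_dim, n_dim = len(b), len(b[0])
    # Reduce B over k: dbias[j] = sum_k B[k][j], length N (columns of D).
    dbias = [sum(b[k][j] for k in range(k_dim)) for j in range(n_dim)]
    return d, dbias
```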
CUBLASLT_EPILOGUE_GELU_AUX_BIAS
* fusion of gelu and bias
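A sketch of what the fusion could look like as a plain-Python reference. Assumptions: bias is added before GELU, the bias has length M as in CUBLASLT_EPILOGUE_BIAS, the aux matrix holds the pre-GELU values, and GELU is the erf form. `gemm_gelu_aux_bias` is a hypothetical name:

```python
import math

def matmul(a, b):
    """Row-major product of nested lists: (M x K) @ (K x N) -> (M x N)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def gelu(x):
    """Exact (erf-based) GELU. Assumption: the real kernel may use the tanh form."""
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gemm_gelu_aux_bias(a, b, bias):
    """Hypothetical reference for GELU_AUX_BIAS: returns (d, aux)."""
    c = matmul(a, b)
    # Assumed order: add the per-row bias first, then apply GELU.
    aux = [[c[i][j] + bias[i] for j in range(len(c[0]))]
           for i in range(len(c))]
    d = [[gelu(v) for v in row] for row in aux]
    return d, aux
```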
CUBLASLT_EPILOGUE_DGELU_BGRAD
* fusion of dgelu and bgrad
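A tentative plain-Python sketch of this fusion. Everything about the bias gradient here is an assumption to verify against the cuBLASLt docs: I assume it is the DGELU output summed across columns, giving one value per row to mirror the forward bias of length M. `gemm_dgelu_bgrad` is a made-up name:

```python
import math

def matmul(a, b):
    """Row-major product of nested lists: (M x K) @ (K x N) -> (M x N)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def dgelu(x):
    """Derivative of the erf-based GELU: Phi(x) + x * phi(x)."""
    cdf = 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    pdf = math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)
    return cdf + x * pdf

def gemm_dgelu_bgrad(a, b, aux):
    """Hypothetical reference for DGELU_BGRAD: returns (d, dbias)."""
    g = matmul(a, b)
    # GELU gradient: matmul output scaled by GELU'(aux), elementwise.
    d = [[g[i][j] * dgelu(aux[i][j]) for j in range(len(g[0]))]
         for i in range(len(g))]
    # ASSUMPTION: bias gradient = row-wise sum of d (length M).
    dbias = [sum(row) for row in d]
    return d, dbias
```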