How to tune cutlass matmul kernels to approach cublasLt one? #700

Answered by hwu36
MARD1NO asked this question in Q&A
You are very good!

cuBLAS launches its kernel with 2 in the grid.z dimension, which means it uses split-K. To make an apples-to-apples comparison, you need to do the same in CUTLASS.
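For readers unfamiliar with split-K, here is a minimal sketch of the idea in plain Python. It is not CUTLASS or cuBLAS code. The function names `matmul_slice` and `split_k_matmul` and the parameter `split_k_slices` are hypothetical and chosen only for illustration. Each slice index plays the role that a grid.z index would play on the GPU: it computes a partial product over its own range of K, and the partial results are then summed.

```python
def matmul_slice(A, B, k_begin, k_end):
    """Partial product of A (M x K) and B (K x N) over k in [k_begin, k_end)."""
    M, N = len(A), len(B[0])
    C = [[0.0] * N for _ in range(M)]
    for i in range(M):
        for j in range(N):
            acc = 0.0
            for k in range(k_begin, k_end):
                acc += A[i][k] * B[k][j]
            C[i][j] = acc
    return C


def split_k_matmul(A, B, split_k_slices):
    """Split the K dimension into slices, then reduce the partial results.

    Hypothetical illustration: on a GPU each slice would be handled by a
    different grid.z index, so the slices run in parallel.
    """
    M, K, N = len(A), len(B), len(B[0])
    chunk = (K + split_k_slices - 1) // split_k_slices  # ceil(K / slices)
    out = [[0.0] * N for _ in range(M)]
    for s in range(split_k_slices):
        k_begin = min(s * chunk, K)
        k_end = min(k_begin + chunk, K)
        partial = matmul_slice(A, B, k_begin, k_end)
        # Reduction step: accumulate this slice's partial sums.
        for i in range(M):
            for j in range(N):
                out[i][j] += partial[i][j]
    return out


if __name__ == "__main__":
    A = [[1, 2, 3, 4], [5, 6, 7, 8]]
    B = [[1, 0], [0, 1], [1, 0], [0, 1]]
    # Two slices correspond to the grid.z value of 2 mentioned above.
    print(split_k_matmul(A, B, 2))
```

Split-K tends to help when M and N are small relative to K, because the output tiles alone do not provide enough thread blocks to fill the GPU. The price is the extra reduction over the partial results.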

Note that different versions of CUTLASS and of the compiler will also cause some performance differences.

Answer selected by MARD1NO