Use graph optimizer for GPU tensor prepack #19814
try to use graph transformer as prepack
01a6521 to 4b6097b
          bool column_wise_blocking,
          bool small_m,
          bool has_offsets>
Status blkq4_gemm_sm80(int m, int n, int k, cudaStream_t stream,
I assume this can also be supported on sm86 and sm89?
It should.
  }
  break;

case 64:
Could you also support case 128? It is widely used.
It's difficult for this kernel. I am working on another version that will hopefully support it.
d7bce4d to 65573be
63d5836 to 44a95f8
44a95f8 to 38d1851
Restarting in another PR.
Description
Use a graph optimizer for CUDA operator prepacking.
Motivation and Context
Our first sm80 quantized GEMM kernel requires prepacking. Using the current prepack infrastructure in GPU operators causes memory bloat: the old memory buffer is not released.
We therefore define the prepacking logic in a new graph optimizer. This solves the memory bloat problem, but it also introduces the following problems: