-
Notifications
You must be signed in to change notification settings - Fork 3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[CUDA] Special case for K==0 in CUDA MatMul #21525
Conversation
/azp run Windows GPU CI Pipeline |
No pipelines are associated with this pull request. |
@yuslepukhin @tianleiwu
|
We do not have AIX build, however, my hunch is that |
Hi @yuslepukhin |
It depends on the floating-point format used on AIX. I have not worked with AIX for nearly 20 years. |
### Description Replace `memset(0)` with `std::fill(T{})`. This would ensure that all the types are initialized in a portable way. ### Motivation and Context Some platforms exhibit intermittent failures with NaN results. Follow up to: #21525 Cc: @ranjitshs
Description
This change addresses a case where we multiply two matrices, and their inner dimension is 0.
numpy and Eigen which is being used in our CPU EP implementation correctly handle this case
and output a [M, N] matrix filled with zeros.
Motivation and Context
This is required to support GenAI empty input Lora implementation.
Addresses: #21483