Cmma major refactor #101

Merged: 72 commits into main from feat/reuse_out_smem, Sep 9, 2024

@louisfd (Member) commented Sep 6, 2024

CMMA matmul is now much more flexible, and it appears to be faster with some of the newly available configurations.

There still seemed to be a bug when b_k > 32, although b_k = 16 works well. [EDIT: solved. Bugs now appear only at very large values, such as 128x32 in f32 or 128x64 in f16, which is expected.]

Fix #12
Fix #15 (after some research, it seems that shared memory is the way to go, but at least it is no longer of size B_M * B_N; it is now of size tile size * number of coops)
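
To make the size difference concrete, here is a rough sketch that compares the two shared-memory element counts. The function names, the 128x128 block, the 16x16 tile, and the count of 8 coops are all hypothetical values chosen for illustration; they are not taken from this PR's code.

```rust
/// Element count of an output buffer that covers the whole block (B_M * B_N).
fn full_block_smem_len(b_m: usize, b_n: usize) -> usize {
    b_m * b_n
}

/// Element count of an output buffer that holds one tile per coop
/// (tile size * number of coops).
fn per_coop_smem_len(tile_m: usize, tile_n: usize, num_coops: usize) -> usize {
    tile_m * tile_n * num_coops
}

fn main() {
    // Hypothetical configuration, for illustration only.
    let (b_m, b_n) = (128, 128);
    let (tile_m, tile_n) = (16, 16);
    let num_coops = 8;

    let before = full_block_smem_len(b_m, b_n);
    let after = per_coop_smem_len(tile_m, tile_n, num_coops);

    println!("full block: {before} elements, per coop: {after} elements");
}
```

Under these assumed numbers the per-coop buffer is smaller than the full-block one; the actual saving depends on the real block, tile, and coop settings.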

@louisfd louisfd mentioned this pull request Sep 6, 2024
@louisfd louisfd merged commit d90d529 into main Sep 9, 2024
4 of 7 checks passed
@louisfd louisfd deleted the feat/reuse_out_smem branch September 9, 2024 20:21
@nathanielsimard nathanielsimard restored the feat/reuse_out_smem branch September 9, 2024 21:40
@nathanielsimard nathanielsimard deleted the feat/reuse_out_smem branch September 9, 2024 21:43
Development

Successfully merging this pull request may close these issues.

Matmul CMMA: use less shared memories
Matmul CMMA: support other vectorizations (or none)