Add loop unrolling for matmul #11
Wow, if that holds up, the performance jump is large. Side note: any idea why your M2 Pro is underperforming my M1 MacBook Pro for 1D tiling without unrolling (I get ~1.3 TFLOPS)? On one hand, if these performance improvements hold up, it's probably worth it. On the other hand:
My suggestion for now is:
Thank you for reviewing!
I also measured 1.3 TFLOPS, but that figure was an artifact of rounding the timer precision to whole seconds.
Unrolling should be a trade-off between instruction cache misses and pipeline stalls.
Since the unrolling pass does not inspect the AST, it is difficult to detect failure conditions.
c209132 to 0ae5bed
experimental/wgsl.h
Outdated
// Loop-unrolling optimization with regex
//
// Note: Be cautious, as it does not correctly recognize comments or lexical tokens.
std::string loopUnrolling(const std::string& code) {
  // This regex pattern matches a for loop with the following structure:
  // for (var <varName>: u32 = <start>; <varName> < <end>; <varName>++) { <loopBody> }
  std::regex forLoopPattern(R"(for\s*\(\s*var\s+(\w+):\s*u32\s*=\s*(\d+)\s*;\s*\1\s*<\s*(\d+)\s*;\s*\1\+\+\s*\)\s*\{\s*([^{}]*)\})");
  // Explanation of the regex:
  // for\s*\( : Matches 'for (' with optional whitespace
  // \s*var\s+ : Matches 'var ' with optional whitespace
  // (\w+) : Captures the variable name (alphanumeric characters and underscores)
  // :\s*u32\s*=\s* : Matches ': u32 = ' with optional whitespace
  // (\d+) : Captures the start value (one or more digits)
  // \s*;\s* : Matches ';' with optional whitespace
  // \1\s*<\s* : Matches the captured variable name followed by '<' with optional whitespace
  // (\d+) : Captures the end value (one or more digits)
  // \s*;\s* : Matches ';' with optional whitespace
  // \1\+\+\s* : Matches the captured variable name followed by '++' with optional whitespace
  // \)\s*\{\s* : Matches ')' followed by '{' with optional whitespace
  // ([^{}]*) : Captures the loop body (anything except '{' or '}')
  // \} : Matches the closing '}'

  // Example:
  //
  // Input code:
  // for (var i: u32 = 0; i < 3; i++) { std::cout << i << std::endl; }
  //
  // Matches:
  // varName = "i"
  // start = "0"
  // end = "3"
  // loopBody = "std::cout << i << std::endl;"
  //
  // Unrolled:
  // std::cout << 0 << std::endl;
  // std::cout << 1 << std::endl;
  // std::cout << 2 << std::endl;
Add some comments for the regex.
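For readers following the hunk above, here is a self-contained sketch of how the documented pattern could drive the unrolling. It is not the code merged in this PR: `unrollLoopsSketch` and `substituteVar` are hypothetical names, and the whole-word substitution via `\b` is an assumption about how the loop variable might be replaced. Like the original, it works on raw text, so it will also rewrite matches inside comments or string-like tokens.

```cpp
#include <cstddef>
#include <regex>
#include <string>

// Hypothetical helper: replace whole-word occurrences of `var` in `body`.
// `var` comes from a (\w+) capture, so it contains no regex metacharacters.
static std::string substituteVar(const std::string& body,
                                 const std::string& var,
                                 const std::string& value) {
  const std::regex word("\\b" + var + "\\b");
  return std::regex_replace(body, word, value);
}

// Sketch: expand every `for (var i: u32 = A; i < B; i++) { body }` whose body
// contains no braces into B - A copies of the body with `i` replaced.
std::string unrollLoopsSketch(const std::string& code) {
  static const std::regex forLoopPattern(
      R"(for\s*\(\s*var\s+(\w+):\s*u32\s*=\s*(\d+)\s*;\s*\1\s*<\s*(\d+)\s*;\s*\1\+\+\s*\)\s*\{\s*([^{}]*)\})");

  std::string result;
  std::size_t last = 0;  // end of the previously copied region
  const auto end = std::sregex_iterator();
  for (auto it = std::sregex_iterator(code.begin(), code.end(), forLoopPattern);
       it != end; ++it) {
    const std::smatch& m = *it;
    const std::size_t pos = static_cast<std::size_t>(m.position(0));
    result.append(code, last, pos - last);  // text before the loop, unchanged

    const std::string var = m[1].str();
    const unsigned long start = std::stoul(m[2].str());
    const unsigned long stop = std::stoul(m[3].str());
    const std::string body = m[4].str();

    // Emit one copy of the body per iteration, one per line.
    for (unsigned long i = start; i < stop; ++i) {
      result += substituteVar(body, var, std::to_string(i));
      result += '\n';
    }
    last = pos + static_cast<std::size_t>(m.length(0));
  }
  result.append(code, last, std::string::npos);  // text after the last loop
  return result;
}
```

Because the body pattern excludes `{` and `}`, a single pass only expands innermost loops; an enclosing loop would need another pass over the result. Input with no matching loop is returned unchanged.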
Thanks for the updates. I'll go ahead and merge this. I might further move some things around after merging to keep Great idea, clearly this makes a big difference. If you have other ideas, feel free to open a PR (BTW, one of my TODOs after the release tasks are taken care of is to implement some autotuning functionality).
The WGSL compiler in Dawn/Tint does not perform loop unrolling as an optimization.
This PR therefore implements loop unrolling for matmul to measure its effect on performance.
In my environment (M2 Pro), the results are as follows: