cranelift: Optimize __multi3-style multiplications #8653
LLVM's `__multi3` function works by splitting a wide multiplication into several narrower ones. This optimization recognizes the algebraic identities involved and merges them back into the original wide multiply.

This PR is not finished yet, but it illustrates how at least part of the optimization can work.

Currently, the lower half of the result is optimized into a single `imul` instruction. However, most of the intermediate values that are optimized away there are still used in computing the upper half, so elaboration brings them back later.

Fixes bytecodealliance#4077
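For readers unfamiliar with the pattern, here is a minimal Rust sketch of the kind of split multiplication described above. It is not the code from this PR or from compiler-rt: the function names `mul_wide_split` and `mul_wide_reference` are hypothetical, and the 32-bit limb size is an assumption chosen for illustration. It builds a 64x64 to 128-bit product from four narrower products, which is the shape the rewrite rules would need to recognize.

```rust
/// Schoolbook 64x64 -> 128-bit unsigned multiply built from four
/// 32x32 -> 64-bit products. Returns (low half, high half).
/// Hypothetical helper for illustration only.
fn mul_wide_split(a: u64, b: u64) -> (u64, u64) {
    const MASK: u64 = 0xffff_ffff;
    let (a_lo, a_hi) = (a & MASK, a >> 32);
    let (b_lo, b_hi) = (b & MASK, b >> 32);

    // Four partial products; each fits in 64 bits because the inputs are < 2^32.
    let ll = a_lo * b_lo;
    let lh = a_lo * b_hi;
    let hl = a_hi * b_lo;
    let hh = a_hi * b_hi;

    // Middle column: carry out of `ll` plus the low halves of the cross terms.
    let mid = (ll >> 32) + (lh & MASK) + (hl & MASK);

    // Low half: bits 0..32 from `ll`, bits 32..64 from the middle column.
    let lo = (ll & MASK) | (mid << 32);
    // High half: `hh` plus the high halves of the cross terms and the carry.
    let hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
    (lo, hi)
}

/// The "original wide multiply" the split form is equivalent to.
fn mul_wide_reference(a: u64, b: u64) -> (u64, u64) {
    let wide = (a as u128) * (b as u128);
    (wide as u64, (wide >> 64) as u64)
}
```

In this sketch the low half always equals `a.wrapping_mul(b)`, which corresponds to the single `imul` mentioned above. The high half reuses `ll`, `lh`, `hl`, and `mid`, which is why those intermediates stay live when only the low half is rewritten.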
This issue or pull request has been labeled: "cranelift", "isle"
I don't know if it'd help at all, but cc https://github.com/bytecodealliance/wasmtime/pull/7719/files#diff-2041f67049d5ac3d8f62ea91d3cb45cdb8608d5f5cdab988731ae2addf90ef01 which will convert larger |
Yeah, this definitely felt similar to that, so I placed my new rules next to those. Thinking about this comment, though, led me to realize that once we can do that, ISLE is capable enough today to transform a sequence like this:
into something like this:
If our mid-end rules supported producing instructions with multiple results, the On x86 at least, an |
It might not be an improvement by itself, but if we recognize some other i128 ops we can probably start applying some optimization rules at the i128 level which would be really neat. For example, we could now recognize |
Good point! Except we currently can't deal with 128-bit constants, so rules like The other thing we can't currently do is notice that this sequence has a redundant multiply:
Since there's no 128-bit |
Ooh, I hadn't looked at those carefully. That's awesome that |
For extend we don't need to wait on wasmtime/cranelift/codegen/src/prelude_opt.isle Lines 113 to 116 in f1fe2af
which is a special extractor added in #7710 for exactly the "there might be an (Though I'd love to have or patterns to stop needing custom extractors for things like this!) EDIT: landed in #8686