-
Notifications
You must be signed in to change notification settings - Fork 61
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
simd: implement missing intrinsics from simd/generic-arithmetic-pass.rs #382
simd: implement missing intrinsics from simd/generic-arithmetic-pass.rs #382
Conversation
1ae81e9
to
1d88c5b
Compare
I'll do the review soon, hopefully in the next few days. |
No worries, take all the time you need. |
e4b5828
to
bae699f
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few nitpicks.
src/intrinsic/simd.rs
Outdated
let shuffled = hi | lo; | ||
let cast_ty = bx.context.new_vector_type(elem_type, byte_vector_type_size / (elem_size_bytes as u64)); | ||
let loaded = bx.context.new_bitcast(None, shuffled, cast_ty); | ||
let elems: Vec<_> = (0..in_len) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you extract the individual elements instead of casting because you want to truncate the vector?
Please add a comment to explain.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, the intention here is to truncate the vector. Is there a more idiomatic way to do this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added some more documentation for how this particular algorithm is supposed to work in 8d42a82.
bae699f
to
cee07f5
Compare
Apologies on the delay on getting back to your review - last week's holidays made things busy for me, and I couldn't get back to this until now. |
Implements lane-local byte swapping through vector shuffles. While this is more setup than non-vector shuffles, this implementation can shuffle multiple integers concurrently. Signed-off-by: Andy Sadler <[email protected]>
Signed-off-by: Andy Sadler <[email protected]>
The simd intrinsic handler was delegating implementation of `simd_frem` to `Builder::frem`, which wasn't able to handle vector-typed inputs. To fix this, teach this method how to handle vector inputs. Signed-off-by: Andy Sadler <[email protected]>
If we're running against a patched libgccjit, use an algorithm similar to what LLVM uses for this intrinsic. Otherwise, fallback to a per-element bitreverse. Signed-off-by: Andy Sadler <[email protected]>
Signed-off-by: Andy Sadler <[email protected]>
gcc_not would panic upon encountering a vector type, which is not what we want here. Signed-off-by: Andy Sadler <[email protected]>
This test now passes when tested with a patched libgccjit. However, due to [some compiler bugs][1], we can't enable this for non-patched libgccjit yet. [1]: https://github.com/sadlerap/rustc_codegen_gcc/actions/runs/6820180639/job/18548672444#step:15:4375 Signed-off-by: Andy Sadler <[email protected]>
cee07f5
to
17b2c46
Compare
Thanks a lot for your contribution! |
This implements/fixes the following intrinsics:
simd_bswap
simd_frem
simd_bitreverse
simd_ctlz
simd_cttz
simd_neg
These were the remaining intrinsics from the
simd/generic-arithmetic-pass.rs
ui test, which should now pass for a patched libgccjit. However, it seems not to compile against a non-patched libgccjit, so leave it disabled there for now.