-
Notifications
You must be signed in to change notification settings - Fork 38
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[TTIG-TO-LLVM] Support row-vector broadcasts and make_range #2046
Conversation
…nce path Signed-off-by: Julian Oppermann <[email protected]>
@@ -573,8 +579,36 @@ class BroadcastOpConversion | |||
LogicalResult | |||
matchAndRewrite(triton::BroadcastOp op, OpAdaptor adaptor, | |||
ConversionPatternRewriter &rewriter) const override { | |||
rewriter.replaceOp(op, adaptor.getSrc()); | |||
return success(); | |||
constexpr unsigned subgroupSize = 16; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would it be better to get subgroup size from module?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm querying the triton_intel_gpu.min_sg_size
attribute now, is that the correct one?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would query triton::gpu::TritonGPUDialect::getThreadsPerWarp(mod)
, as the minimum one may not be the selected one, although it always is on the advanced path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the advanced path, threads-per-warp is always 1 IIRC.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not at this stage, I can see that multiple patterns query getThreadsPerWarp in this file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right of course. Fixed.
// FIXME: The real lowering has to take the layout into account. Here, we're | ||
// just emitting a sequence of ints. Use | ||
// `third_party/intel/lib/TritonIntelGPUToLLVM/MakeRangeOpToLLVM.cpp` | ||
// instead! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typically, we have a separate conversion for triton ops, that's why this file stands.
what's meaning here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've rephrased the comment to a bit to explain the lowering to a sequence of ints is the correct lowering for the advanced path, assuming dense layouts there.
Minimum-viable support for lowering broadcasts and
tt.make_range
ops on the advance path.See #1947 for more context / the complete PoC.