Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[TTIG-TO-LLVM] Support row-vector broadcasts and make_range #2046

Merged
merged 11 commits into from
Sep 12, 2024

Conversation

jopperm
Copy link
Contributor

@jopperm jopperm commented Aug 29, 2024

Minimum-viable support for lowering broadcasts and tt.make_range ops on the advance path.

See #1947 for more context / the complete PoC.

@jopperm jopperm self-assigned this Aug 29, 2024
@jopperm jopperm changed the title [[TTIG-TO-LLVM]] Support row-vector broadcasts and make_range [TTIG-TO-LLVM] Support row-vector broadcasts and make_range Aug 29, 2024
@@ -573,8 +579,36 @@ class BroadcastOpConversion
LogicalResult
matchAndRewrite(triton::BroadcastOp op, OpAdaptor adaptor,
ConversionPatternRewriter &rewriter) const override {
rewriter.replaceOp(op, adaptor.getSrc());
return success();
constexpr unsigned subgroupSize = 16;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would it be better to get subgroup size from module?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm querying the triton_intel_gpu.min_sg_size attribute now, is that the correct one?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would query triton::gpu::TritonGPUDialect::getThreadsPerWarp(mod), as the minimum one may not be the selected one, although it always is on the advanced path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the advanced path, threads-per-warp is always 1 IIRC.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not at this stage, I can see that multiple patterns query getThreadsPerWarp in this file.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right of course. Fixed.

// FIXME: The real lowering has to take the layout into account. Here, we're
// just emitting a sequence of ints. Use
// `third_party/intel/lib/TritonIntelGPUToLLVM/MakeRangeOpToLLVM.cpp`
// instead!
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

typically, we have a separate conversion for triton ops, that's why this file stands.
what's meaning here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've rephrased the comment to a bit to explain the lowering to a sequence of ints is the correct lowering for the advanced path, assuming dense layouts there.

@jopperm jopperm merged commit 389f5dd into llvm-target Sep 12, 2024
4 checks passed
@jopperm jopperm deleted the jopperm/broadcast-makerange-to-llvm branch September 12, 2024 15:55
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[#6 Attention Performance] extend attention support for Causal = True
5 participants