[BACKEND] Support convert_layout with num_ctas > 1 Using Linear Layout #4782
Conversation
Pinging @lezcano because he expressed interest in reviewing LL-related PRs.
Looks good, thanks!
@@ -926,6 +937,7 @@ std::optional<LinearLayout> chooseStMatrixLayoutNoLeadingOffset(
StringAttr kWarp = S("warp");
StringAttr kCol = S("dim1");
StringAttr kRow = S("dim0");
StringAttr kBlock = S("block");
bikeshedding: I hope the name is explicit enough. The term "block" is badly overloaded, unfortunately. We could name it CTA, but that's very NVIDIA-specific.
Sure, I tried changing it to CTA but found it a bit annoying, since it touches a lot of places. Let me sort it out in the future.
…ayout (triton-lang#4782)

Particularly, this PR implements layout conversion when a CGA contains more than one CTA. In such cases, a Triton tensor is split into multiple blocks, with each block being handled by a CTA.

```
block0 | block1
----------------
block2 | block3
```

If data transfer is required from block0 to block3, this PR cannot handle it, and we use `isCrossCTAConversion` to check this condition.