Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arc] Add VectorizeOp canonicalization #7146

Merged
merged 1 commit into from
Jul 2, 2024

Conversation

elhewaty
Copy link
Member

@elhewaty elhewaty commented Jun 8, 2024

No description provided.

@elhewaty elhewaty marked this pull request as draft June 8, 2024 16:38
@elhewaty elhewaty force-pushed the VectorizeOp-canonicalizer branch 2 times, most recently from 498daf9 to 3060013 Compare June 22, 2024 08:45
@elhewaty elhewaty marked this pull request as ready for review June 22, 2024 08:46
@elhewaty elhewaty requested a review from TaoBi22 June 22, 2024 08:46
Copy link
Member

@maerhart maerhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great job! So cool that this is working now.

test/Dialect/Arc/arc-canonicalizer.mlir Outdated Show resolved Hide resolved
lib/Dialect/Arc/Transforms/ArcCanonicalizer.cpp Outdated Show resolved Hide resolved
lib/Dialect/Arc/Transforms/ArcCanonicalizer.cpp Outdated Show resolved Hide resolved
Comment on lines +629 to +628
for (auto &op : otherBlock.without_terminator())
rewriter.clone(op, argMapping);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using rewriter.inlineBlockBefore would avoid cloning operations since the original ops are removed anyways.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@maerhart, I think using rewriter.inlineBlockBefore will make the code harder(just a thought):

if the source block is inserted somewhere in the middle (or beginning) of the dest block, the source block must have no successors. Otherwise, the resulting IR would have unreachable operations. [1]

here's what I understand(may be wrong), if I want to inline the otherVecOp block at the beginning of vecOp region then otherVecOp should have no uses, and otherVecOp has uses as its output is used as input in vecOp operands. so before inlining, I think I will need to maintain the index of the operands to be removed like we did with Arguments. I am not saying it's impossible but more work I think?

[1] MLIR docs

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The body blocks don't have any successors since they are the only blocks in the region.
You'd need to modify the block arguments of the otherVecOp instead of the current block and pass the adequate values as the last argument of inlineBlockBefore. I don't think the code will get more complex by doing so, but it requires a few changes to your code. I'm also fine with leaving it as is for now. We can come back and get rid of the unnecessary cloning if we observe performance issues.

lib/Dialect/Arc/Transforms/ArcCanonicalizer.cpp Outdated Show resolved Hide resolved
lib/Dialect/Arc/Transforms/ArcCanonicalizer.cpp Outdated Show resolved Hide resolved
test/Dialect/Arc/arc-canonicalizer.mlir Show resolved Hide resolved
@fabianschuiki
Copy link
Contributor

This is starting to look really nice! Any idea how well this works on the cores in the arc-tests repository? Are there any cases that don't work yet?

@elhewaty
Copy link
Member Author

This is starting to look really nice! Any idea how well this works on the cores in the arc-tests repository? Are there any cases that don't work yet?

It works well on the dual rocket core we vectorized before, but just a few merges!!
The problem here is that if we allow ClockTypes it crashes when at otherArg.getType() but after some work I noticed the problem is not that ClockTypes crashes but they may with some args that will make it crash as if we allow all args to be of the same type there's no crash happens!! so what can make otherArg.getType crash?!! that is the problem I can allow Clocks, Ints, and floats this may work with the tests but fail in some other cases.

@elhewaty elhewaty requested a review from maerhart July 1, 2024 17:38
Copy link
Contributor

@fabianschuiki fabianschuiki left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me! Nice job getting this up and running 🙂.

I liked @maerhart's suggestion of having a test where one vectorize op feeds into another, but with the operands shuffled, such that the pattern should not apply. That might be worth adding before merging.

@elhewaty
Copy link
Member Author

elhewaty commented Jul 2, 2024

I liked @maerhart's suggestion of having a test where one vectorize op feeds into another, but with the operands shuffled, such that the pattern should not apply. That might be worth adding before merging.

I think there is a test for this, there is a function called Needs_Shuffle at the end of the tests? Or do you mean something else?

@fabianschuiki
Copy link
Contributor

Oh you're totally right, I overlooked that. Very nice 😃

Copy link
Member

@maerhart maerhart left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice work! 🎉

// CHECK-NEXT: [[ADD:%.+]] = comb.add %arg0, %arg1 : i8
// CHECK-NEXT: arc.vectorize.return [[ADD]] : i8
// CHECK-NEXT: }
// CHECK-NEXT: [[VEC1:%.+]]:4 = arc.vectorize ([[VEC0]]#0, [[VEC0]]#1, [[VEC0]]#2, [[VEC0]]#3), (%n, %p, %r, %t), ([[VEC0]]#0, [[VEC0]]#1, [[VEC0]]#2, [[VEC0]]#3) : (i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8) -> (i8, i8, i8, i8) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could be another optimization pattern: if a vector is given multiple times as input to the vectorize op, we can remove one of them and replace usages with the other occurrence.
But that's something for another PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought of this too. I will

arc.output %arg0 : i3
}

hw.module private @Self_Use(in %clock : !seq.clock, in %reset : i1, in %io_in_a_ready : i1, in %io_in_a_valid : i1, in %io_in_a_bits_opcode : i3, in %io_in_a_bits_param : i3, in %io_in_a_bits_size : i3, in %io_in_a_bits_source : i3, in %io_in_a_bits_address : i17, in %io_in_a_bits_mask : i8, in %io_in_a_bits_corrupt : i1, in %io_in_d_ready : i1, in %io_in_d_valid : i1, in %io_in_d_bits_size : i3, in %io_in_d_bits_source : i3) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: there are many unused input ports, we could trim them a bit.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done! Please merge if it's ok

@fabianschuiki fabianschuiki merged commit dbb07f3 into llvm:main Jul 2, 2024
4 checks passed
@elhewaty elhewaty deleted the VectorizeOp-canonicalizer branch July 3, 2024 02:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants