[Arc] Add VectorizeOp canonicalization #7146
Force-pushed from 498daf9 to 3060013.
Great job! So cool that this is working now.
for (auto &op : otherBlock.without_terminator())
  rewriter.clone(op, argMapping);
Using rewriter.inlineBlockBefore would avoid cloning operations, since the original ops are removed anyway.
@maerhart, I think using rewriter.inlineBlockBefore would make the code harder to write (just a thought). The MLIR docs say:
if the source block is inserted somewhere in the middle (or beginning) of the dest block, the source block must have no successors. Otherwise, the resulting IR would have unreachable operations. [1]
Here is what I understand (I may be wrong): if I want to inline the otherVecOp block at the beginning of the vecOp region, then otherVecOp should have no uses. But otherVecOp does have uses, since its output is used as an input in the vecOp operands. So before inlining, I think I would need to track the indices of the operands to be removed, like we did with Arguments. I am not saying it's impossible, but I think it is more work.
[1] MLIR docs
The body blocks don't have any successors since they are the only blocks in the region.
You'd need to modify the block arguments of the otherVecOp instead of the current block and pass the adequate values as the last argument of inlineBlockBefore. I don't think the code will get more complex by doing so, but it requires a few changes to your code. I'm also fine with leaving it as is for now. We can come back and get rid of the unnecessary cloning if we observe performance issues.
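To make the trade-off in this exchange concrete, here is a minimal sketch in standard C++ that does not use MLIR at all. `Op`, `Block`, `cloneInto`, and `inlineBefore` are hypothetical stand-ins invented for illustration; they only mimic the idea that cloning copies ops through a value mapping, while inlining substitutes the source block's arguments with the supplied values and then moves the ops.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <string>
#include <vector>

// Toy stand-ins for MLIR concepts (hypothetical, not the MLIR API).
struct Op {
  std::string result;                // name of the value this op defines
  std::vector<std::string> operands; // names of the values it uses
};

struct Block {
  std::vector<std::string> args; // block argument names
  std::list<Op> ops;             // body ops, terminator excluded
};

// Strategy 1: copy every op, remapping operands through `mapping`.
// The source block keeps its ops and has to be erased separately.
void cloneInto(const Block &src, Block &dest, std::list<Op>::iterator before,
               const std::map<std::string, std::string> &mapping) {
  for (const Op &op : src.ops) {
    Op copy = op;
    for (std::string &operand : copy.operands) {
      auto it = mapping.find(operand);
      if (it != mapping.end())
        operand = it->second;
    }
    dest.ops.insert(before, copy);
  }
}

// Strategy 2: replace each source block argument with the supplied value,
// then move the ops themselves. Nothing is copied and `src` ends up empty.
void inlineBefore(Block &src, Block &dest, std::list<Op>::iterator before,
                  const std::vector<std::string> &argValues) {
  assert(argValues.size() == src.args.size() && "one value per block argument");
  for (Op &op : src.ops)
    for (std::string &operand : op.operands)
      for (std::size_t i = 0; i < src.args.size(); ++i)
        if (operand == src.args[i]) {
          operand = argValues[i];
          break;
        }
  dest.ops.splice(before, src.ops);
  src.args.clear();
}
```

In this toy model the extra work for the second strategy is building `argValues` in the order of the source block's arguments, which mirrors the point above about passing the adequate values as the last argument.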
This is starting to look really nice! Any idea how well this works on the cores in the arc-tests repository? Are there any cases that don't work yet?
It works well on the dual rocket core we vectorized before, but it only performs a few merges!
Force-pushed from 3060013 to a852579.
Looks good to me! Nice job getting this up and running 🙂.
I liked @maerhart's suggestion of having a test where one vectorize op feeds into another, but with the operands shuffled, such that the pattern should not apply. That might be worth adding before merging.
I think there is a test for this, there is a function called
Oh you're totally right, I overlooked that. Very nice 😃
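The negative test discussed here guards the precondition for merging. As a rough sketch in plain C++ (not the MLIR API), one way to state that precondition is that lane i of the consumer's input vector must be result i of the same producer. The `Value` struct and `feedsInLaneOrder` are hypothetical names, and the exact condition checked by the pattern in this PR is an assumption.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical description of one operand of the consuming vectorize op.
struct Value {
  int producer;      // id of the defining vectorize op, -1 for anything else
  std::size_t index; // which result of that op this value is
};

// Returns true only if the input vector is exactly results 0..width-1 of
// `producerId`, in order. A shuffled vector fails this check, so the merge
// would be skipped.
bool feedsInLaneOrder(const std::vector<Value> &inputVector, int producerId,
                      std::size_t producerWidth) {
  if (inputVector.size() != producerWidth)
    return false;
  for (std::size_t lane = 0; lane < inputVector.size(); ++lane) {
    if (inputVector[lane].producer != producerId ||
        inputVector[lane].index != lane)
      return false;
  }
  return true;
}
```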
Nice work! 🎉
// CHECK-NEXT: [[ADD:%.+]] = comb.add %arg0, %arg1 : i8
// CHECK-NEXT: arc.vectorize.return [[ADD]] : i8
// CHECK-NEXT: }
// CHECK-NEXT: [[VEC1:%.+]]:4 = arc.vectorize ([[VEC0]]#0, [[VEC0]]#1, [[VEC0]]#2, [[VEC0]]#3), (%n, %p, %r, %t), ([[VEC0]]#0, [[VEC0]]#1, [[VEC0]]#2, [[VEC0]]#3) : (i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8, i8) -> (i8, i8, i8, i8) {
This could be another optimization pattern: if a vector is given multiple times as input to the vectorize op, we can remove one of them and replace usages with the other occurrence.
But that's something for another PR.
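A rough sketch of that follow-up idea, written in plain C++ rather than against the MLIR API. Input vectors are modeled as lists of value names, and each input vector is assumed to correspond to one body block argument. `InputVector`, `Dedup`, and `dedupInputVectors` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using InputVector = std::vector<std::string>;

struct Dedup {
  std::vector<InputVector> kept;     // unique input vectors, first-seen order
  std::vector<std::size_t> argRemap; // old body argument index -> new index
};

// Keeps the first occurrence of every input vector and records, for each
// original body argument, which surviving argument should replace it.
Dedup dedupInputVectors(const std::vector<InputVector> &inputs) {
  Dedup out;
  std::map<InputVector, std::size_t> seen;
  for (const InputVector &vec : inputs) {
    auto [it, inserted] = seen.emplace(vec, out.kept.size());
    if (inserted)
      out.kept.push_back(vec);
    out.argRemap.push_back(it->second);
  }
  return out;
}
```

For the CHECK line above, the first and third input vectors are identical, so under this sketch the third body argument would be rewritten to use the first and the operand list would shrink from three vectors to two.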
I thought of this too. I will
arc.output %arg0 : i3
}

hw.module private @Self_Use(in %clock : !seq.clock, in %reset : i1, in %io_in_a_ready : i1, in %io_in_a_valid : i1, in %io_in_a_bits_opcode : i3, in %io_in_a_bits_param : i3, in %io_in_a_bits_size : i3, in %io_in_a_bits_source : i3, in %io_in_a_bits_address : i17, in %io_in_a_bits_mask : i8, in %io_in_a_bits_corrupt : i1, in %io_in_d_ready : i1, in %io_in_d_valid : i1, in %io_in_d_bits_size : i3, in %io_in_d_bits_source : i3) {
Nit: there are many unused input ports, we could trim them a bit.
Done! Please merge if it's ok
Force-pushed from a852579 to 03ce379.
No description provided.