-
Notifications
You must be signed in to change notification settings - Fork 289
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Seq] Add a new pass to sink clock_gate ops to the users #7474
base: main
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Awesome! This is a very useful transform 😍. Does the code also work if a module is instantiated multiple times? And if one of the instances has a clock gate in front and the other does not?
/// Combine two consecutive clock_gate ops, into one. | ||
void collapseConsecutiveClockGates(ClockGateOp driverGate, | ||
ClockGateOp userGate) { | ||
mlir::OpBuilder builder(context); | ||
builder.setInsertionPoint(userGate); | ||
Value enable = getEnable(driverGate); | ||
enable = builder.create<comb::AndOp>(userGate->getLoc(), enable, | ||
userGate.getEnable()); | ||
userGate.getInputMutable().assign(driverGate.getInput()); | ||
userGate.getEnableMutable().assign(enable); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is very neat! Might also make sense as a canonicalization on ClockGateOp
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good idea, will add another PR to add this pattern.
/// If the clock_gate op has `test_enable`, combine it with the `enable` and | ||
/// return the final enable. | ||
Value getEnable(ClockGateOp clkGate) { | ||
Value enable = clkGate.getEnable(); | ||
// If there is testEnable, or it. | ||
if (auto testEnable = clkGate.getTestEnable()) { | ||
OpBuilder builder(clkGate.getContext()); | ||
builder.setInsertionPointAfter(clkGate); | ||
enable = | ||
builder.create<comb::OrOp>(clkGate->getLoc(), enable, testEnable); | ||
} | ||
return enable; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm wondering if we'd need to keep the regular enable
and the test_enable
separate… probably not?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I asked David, and seems like its okay to combine them. Not sure of other use cases, if that's required then we need two additional ports for every gated clock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah right, good point, forgot about that. I was just wondering of how a user might expect the pass to behave in CIRCT -- someone could expect that the enable inputs would be preserved as separate signals. But on the other hand this is also very easy to adjust once the pattern of the pass is established, and this comes up as a concrete need. Combining them feels like a good start!
auto clkGate = dyn_cast_or_null<ClockGateOp>(in.getDefiningOp()); | ||
if (!clkGate) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems like a bit fragile since it could break analysis when there is wire between instance and clock.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you pointed out, this would require a fixed-point analysis to handle all the possible dataflow operations that can propagate clocks. I think such operations of passing a clock by a wire are very rare, and it might be fine to ignore them for this transformation. This transformation does not guarantee that all clock gates will be removed. I want to do some more experiments with this transformation enabled, and then based on the need, we can utilize a fixed-point analysis.
Yeah I think this transformation requires fixed-point analysis whose lattice is a pair of clock and enable. |
7768308
to
c7503c9
Compare
Thanks a lot @fabianschuiki and @uenoku for the feedback. Please take another look at it. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Cool! This is neat! 😁 🎉
Left some minor feedback.
c7503c9
to
28d5a67
Compare
void appendPorts( | ||
ArrayRef<std::tuple<StringAttr, Type, Location>> insertInputs, | ||
ArrayRef<std::pair<StringAttr, Value>> insertOutputs); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's difference from modifyPorts
which we are going to deprecate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, thanks for bringing it up, I will add comments.
Few reasons,
- Stop using the deprecated methods.
- Implement a simpler API, that only appends the input and then output ports, the most-common/only use case. Avoids the generic methods, that can insert/delete at random port indices.
- Avoid any performance foot-guns, the deprecated methods can have performance issues, implementing the fresh API that we discuss and properly inspect to ensure there are no obvious performance issues.
- At least rhodium can stop using the deprecated API, and switch to this, if its approved.
I will commit this separately, so if it makes sense to discuss this more broadly, I will create a new PR with this change.
What do you think ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for delayed replay. TBH I don't find much difference between deprecated functions. The intention for the deprecation was "it doesn't make sense to have an API which takes input and output ports separately" (
circt/lib/Dialect/HW/HWOps.cpp
Lines 675 to 679 in 5d8cf69
/// This is marked deprecated as it's only used from HandshakeToHW and | |
/// PortConverter and is likely broken and not currently tested. Users of this | |
/// are still written dealing with input and output ports separately, which is | |
/// an old and broken style. | |
[[deprecated]] static void |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your suggestions Hideto, I updated the interface, we need the output Values to update the OutputOp
also added the input values, that can be set as the block arguments are added. This avoids the need to know the argument index of the input args.
SmallVector<StringAttr> getInstanceArgNames(HWInstanceLike inst) { | ||
SmallVector<StringAttr> argNames; | ||
if (auto args = inst->getAttrOfType<ArrayAttr>("argNames")) { | ||
auto attr = args.getAsRange<StringAttr>(); | ||
argNames.insert(argNames.begin(), attr.begin(), attr.end()); | ||
} | ||
return argNames; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you add argNames
getter to HWInstanceLike. I guess all operations that implement HWInstanceLike have non-optional argNames
if (auto compReg = dyn_cast<CompRegOp>(gatedOp)) { | ||
Value clk = compReg.getClk(); | ||
auto gate = dyn_cast_or_null<ClockGateOp>(clk.getDefiningOp()); | ||
if (!gate) | ||
continue; | ||
clk = gate.getInput(); | ||
Value enable = getEnable(gate); | ||
OpBuilder builder(context); | ||
builder.setInsertionPoint(compReg); | ||
auto compEn = builder.create<CompRegClockEnabledOp>( | ||
compReg->getLoc(), compReg.getInput(), clk, enable, | ||
compReg.getNameAttr(), compReg.getReset(), compReg.getResetValue(), | ||
compReg.getPowerOnValue(), compReg.getInnerSymAttr()); | ||
compReg->replaceAllUsesWith(compEn); | ||
opsToErase.insert(compReg); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should go to canonicalization if it's always beneficial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I thought about that, but as you pointed out, I don't think this is "beneficial" in all cases. This should only happen if the user explicitly enables this transformation. So, the transformation of "sink clock-gates" implies trying to combine the clock gate's enable into the operation that is driven by the gated clock. This might not be desirable in general, and if we add it to the canonicalization we can't disable it in general.
But maybe this can be added to some op interface. A new op interface that any op can opt-in, and then we can invoke it from here.
My plan is to add the enable to FirRegOp next, similar transformation needs to happen to memory ops also, but its semantics are slightly non-trivial.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you think about decoupling canonicalization from this pass? I feel it looks more cleaner if SinkClockGates
only sinks clock gates and not perform comp reg simplification.
auto oldEnablePort = clockEnablePortIndices[index]; | ||
auto oldEnableVal = inst->getOperand(oldEnablePort); | ||
auto *def = oldEnableVal.getDefiningOp(); | ||
assert(llvm::isa_and_nonnull<ConstantOp>(def)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is it guaranteed to be constant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a constant, because this is another instance of a module that is already visited. When the first instance was visited, we added the enable port to all the module instances. The enable port for all the other instances were set to a constant true, except the one that is being visited, which is set to the clock gate op's enable. This ensures that we donot add multiple enable ports for the same clock gate port when visiting multiple instances of a module.
28d5a67
to
bdf3452
Compare
bdf3452
to
9d876f0
Compare
This is a new pass that sinks
seq.clock_gate
into its users and attempts to remove as manyseq.clock_gate
ops as possible.Any operation that can potentially encode the enable directly, will replace its gated clock with the un-gated clock and encode the clock gate enable as the operation's enable signal.
Currently only the
CompRegOp
operation is handled, but similar transformation needs to be attempted on other supported operations also.The first phase of the pass is a global transformation that continues to push the clock gates into the referenced modules, until none of the instance ops are driven by a gated clock.
Next, best effort is made to remove the gated clock and encode the enable directly into the operation that is being driven by the clock.