-
Notifications
You must be signed in to change notification settings - Fork 157
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[luci/pass] Introduce FuseMulWithFullyConnectedPass #13607
[luci/pass] Introduce FuseMulWithFullyConnectedPass #13607
Conversation
This commit introduce FuseMulWithFullyConnectedPass which will fuse Mul to previous FullyConnected if possible. ONE-DCO-1.0-Signed-off-by: Jan Iwaszkiewicz <[email protected]>
@jiwaszki , can you reopen the draft PR and update it so that current full changes are passing the test? |
return ((node->rank() == 1 || node->rank() == 0) && node->size<loco::DataType::FLOAT32>() == 1); | ||
} | ||
|
||
inline void update_with_scalar(luci::CircleConst *fused_node, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ditto
// Get Mul node: | ||
auto fc_output = loco::succs(fc); | ||
// Make sure that FullyConnected has only one child: | ||
RETURN_FALSE_UNLESS(fc_output.size() == 1); | ||
auto mul = dynamic_cast<luci::CircleMul *>(*fc_output.begin()); | ||
RETURN_FALSE_UNLESS(mul); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We usually go upward(to input).
For this case, begin with Mul
and check one of the input is FC
.
Can you please revise in this check order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When you traverse from Mul
to FC
, FC
input of Mul
should be created and replaced to new FC
as exiting FC
may have multiple successors.
Current check only works when there exist single connection of FC
-Mul
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see approach similar to mine on other pass which base on FC
as well (with Add
instead of Mul
):
bool fuse_add_with_fc(luci::CircleFullyConnected *fc) |
ONE/compiler/luci/pass/src/FuseAddWithFullyConnectedPass.cpp
Lines 58 to 61 in 719a6f7
// Get add node | |
auto fc_output = loco::succs(fc); | |
if (fc_output.size() != 1) | |
return false; |
I want to make sure I understand the motivation here. Here is my understanding of a graph and the approach to pass you described:
In case of such pattern of single connection of FC-Mul
, I understand the fusion of Mul
into FC
to eliminate node and reduce overall number of operations. However, if FC
has more operations connected to it's output, the pass is simply replacing one op for another -- which from "math view" is resulting in the same count of operations.
Is that a matter of existing optimizations in targeted HW and/or more performant kernel implementations in ONE that FC
is preferred here in place of Mul
?
EDIT: I would understand a case where we look for Mul
and check that connected FC
has only one child and it's Mul
itself. I wonder if that approach would work better as probably there are less Mul
nodes to perform a check on than FC
nodes (at least in popular models). I would need to have a better understanding of ONE method to applying passes to judge that. @seanshpark is that something that can be considered instead of adding multiple FC
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is that a matter of existing optimizations in targeted HW and/or more performant kernel implementations in ONE that FC is preferred here in place of Mul?
No. I don't think so.
is that something that can be considered instead of adding multiple FC?
I cannot understand your question point.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see approach similar to mine on other pass which base on FC as well (with Add instead of Mul):
FuseAddWithFullyConnected
fuses add with FC only when FC has a single successor (add). That is an intended behavior. If FC has multiple successors, FC layers are duplicated (as @jiwaszki said), which tends to increase model size and degrade model performance. I think FuseMulWithFullyConnected
can follow the same strategy with FuseAddWithFullyConnected
.
So, +1 for the current implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I DISAGREE. I DO NOT WANT MULTIPLE WAYS OF SEARCHING.
SOME DAY FuseAddWithFullyConnected SHOULD BE FIXED TOO.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@seanshpark How about the changes in 79a2213 ? It merges both ideas into one.
First, it follows the bottom-up approach, starting from mul
:
bool fuse_mul_with_fc(luci::CircleMul *mul) |
ONE/compiler/luci/pass/src/FuseMulWithFullyConnectedPass.cpp
Lines 116 to 120 in 79a2213
// Check if any FC node connects to Mul. | |
// Find the pattern of Mul(FC, CircleConst): | |
luci::CircleFullyConnected *fc = nullptr; | |
luci::CircleConst *multiplication = nullptr; | |
RETURN_FALSE_UNLESS(luci::fill(&fc, &multiplication).with_commutative_args_of(mul)); |
Second, still has the intended behavior of fusing only when there is only one successor to fc
:
ONE/compiler/luci/pass/src/FuseMulWithFullyConnectedPass.cpp
Lines 121 to 122 in 79a2213
// Make sure that FullyConnected has only one successor: | |
RETURN_FALSE_UNLESS(loco::succs(fc).size() == 1); |
@jinevening please also review the new commit when you have time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me. Please address @seanshpark 's comments.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
|
||
switch (mul->dtype()) | ||
{ | ||
case loco::DataType::FLOAT32: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
dtype
check is duplicate with fuse_mul_with_fc()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Q) why use switch case with dtype here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in fact redundant check, leftover from refactoring. Removed and further simplified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM thank you!
For future readers for reference; Q) Do we need to add Why am I leaving this comment? I do not want to add "loco::succs(fc).size() == 1" to all existing implementations and future ones without real models, and without proved benefit with limited viewpoint. |
@jinevening pinging you just for formal approval. Thanks in advance! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
This commit introduce FuseMulWithFullyConnectedPass which will fuse Mul to previous FullyConnected if possible.
ONE-DCO-1.0-Signed-off-by: Jan Iwaszkiewicz [email protected]