Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[luci/pass] Introduce FuseMulWithFullyConnectedPass #13607

Merged
merged 14 commits into from
Aug 20, 2024

Conversation

jiwaszki
Copy link
Contributor

@jiwaszki jiwaszki commented Aug 7, 2024

This commit introduce FuseMulWithFullyConnectedPass which will fuse Mul to previous FullyConnected if possible.

ONE-DCO-1.0-Signed-off-by: Jan Iwaszkiewicz [email protected]

This commit introduce FuseMulWithFullyConnectedPass which will fuse Mul to previous FullyConnected if possible.

ONE-DCO-1.0-Signed-off-by: Jan Iwaszkiewicz <[email protected]>
@jiwaszki
Copy link
Contributor Author

jiwaszki commented Aug 7, 2024

@seanshpark
Copy link
Contributor

@jiwaszki , can you reopen the draft PR and update it so that current full changes are passing the test?
with only this PR, it's not possible to know that end-to-end pass value test is working OK or not.

return ((node->rank() == 1 || node->rank() == 0) && node->size<loco::DataType::FLOAT32>() == 1);
}

inline void update_with_scalar(luci::CircleConst *fused_node,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto

Comment on lines 122 to 127
// Get Mul node:
auto fc_output = loco::succs(fc);
// Make sure that FullyConnected has only one child:
RETURN_FALSE_UNLESS(fc_output.size() == 1);
auto mul = dynamic_cast<luci::CircleMul *>(*fc_output.begin());
RETURN_FALSE_UNLESS(mul);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We usually go upward(to input).
For this case, begin with Mul and check one of the input is FC.
Can you please revise in this check order?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When you traverse from Mul to FC, FC input of Mul should be created and replaced to new FC as exiting FC may have multiple successors.

Current check only works when there exist single connection of FC-Mul.

Copy link
Contributor Author

@jiwaszki jiwaszki Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see approach similar to mine on other pass which base on FC as well (with Add instead of Mul):

bool fuse_add_with_fc(luci::CircleFullyConnected *fc)

// Get add node
auto fc_output = loco::succs(fc);
if (fc_output.size() != 1)
return false;

I want to make sure I understand the motivation here. Here is my understanding of a graph and the approach to pass you described:
image

In case of such pattern of single connection of FC-Mul, I understand the fusion of Mul into FC to eliminate node and reduce overall number of operations. However, if FC has more operations connected to it's output, the pass is simply replacing one op for another -- which from "math view" is resulting in the same count of operations.

Is that a matter of existing optimizations in targeted HW and/or more performant kernel implementations in ONE that FC is preferred here in place of Mul?

EDIT: I would understand a case where we look for Mul and check that connected FC has only one child and it's Mul itself. I wonder if that approach would work better as probably there are less Mul nodes to perform a check on than FC nodes (at least in popular models). I would need to have a better understanding of ONE method to applying passes to judge that. @seanshpark is that something that can be considered instead of adding multiple FC?

Copy link
Contributor

@seanshpark seanshpark Aug 13, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that a matter of existing optimizations in targeted HW and/or more performant kernel implementations in ONE that FC is preferred here in place of Mul?

No. I don't think so.

is that something that can be considered instead of adding multiple FC?

I cannot understand your question point.

Copy link
Contributor

@jinevening jinevening Aug 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see approach similar to mine on other pass which base on FC as well (with Add instead of Mul):

FuseAddWithFullyConnected fuses add with FC only when FC has a single successor (add). That is an intended behavior. If FC has multiple successors, FC layers are duplicated (as @jiwaszki said), which tends to increase model size and degrade model performance. I think FuseMulWithFullyConnected can follow the same strategy with FuseAddWithFullyConnected.

So, +1 for the current implementation.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I DISAGREE. I DO NOT WANT MULTIPLE WAYS OF SEARCHING.
SOME DAY FuseAddWithFullyConnected SHOULD BE FIXED TOO.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@seanshpark How about the changes in 79a2213 ? It merges both ideas into one.
First, it follows the bottom-up approach, starting from mul:

bool fuse_mul_with_fc(luci::CircleMul *mul)

// Check if any FC node connects to Mul.
// Find the pattern of Mul(FC, CircleConst):
luci::CircleFullyConnected *fc = nullptr;
luci::CircleConst *multiplication = nullptr;
RETURN_FALSE_UNLESS(luci::fill(&fc, &multiplication).with_commutative_args_of(mul));

Second, still has the intended behavior of fusing only when there is only one successor to fc:

// Make sure that FullyConnected has only one successor:
RETURN_FALSE_UNLESS(loco::succs(fc).size() == 1);

@jinevening please also review the new commit when you have time.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me. Please address @seanshpark 's comments.

jinevening
jinevening previously approved these changes Aug 14, 2024
Copy link
Contributor

@jinevening jinevening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM


switch (mul->dtype())
{
case loco::DataType::FLOAT32:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dtype check is duplicate with fuse_mul_with_fc().

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Q) why use switch case with dtype here?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's in fact redundant check, leftover from refactoring. Removed and further simplified.

Copy link
Contributor

@seanshpark seanshpark left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM thank you!

@seanshpark
Copy link
Contributor

seanshpark commented Aug 19, 2024

For future readers for reference;

Q) Do we need to add RETURN_FALSE_UNLESS(loco::succs(fc).size() == 1); ?
A) No. This is optional and IMHO can benefit network connection case-by-case. Adding this can produce better network in performance like this PR but this is not confirmed for all cases. As our current implementation didn't had this line for most passes and no particular issue was raised.

Why am I leaving this comment?
Current implementation is like "we add conditions what is necessary and options are added when it is needed, that is when real models appear so that we can check and test that real conditions. what we can imagine with corner cases is OK but that imagination can grow without bound and also to the implementation."

I do not want to add "loco::succs(fc).size() == 1" to all existing implementations and future ones without real models, and without proved benefit with limited viewpoint.

@jiwaszki
Copy link
Contributor Author

@jinevening pinging you just for formal approval. Thanks in advance!

Copy link
Contributor

@jinevening jinevening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@jinevening jinevening merged commit 7c60326 into Samsung:master Aug 20, 2024
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants