[Feature Request] Move graph compilation behind higher transformers (graph optimization) #20915
I believe L2 and L3 were intended for internal ORT EP-specific optimizations. Because they're EP-specific, they cannot run before partitioning. Unfortunately the separation is polluted a little by allowing some additional EPs with static kernels to use the L2 and L3 optimizers. That has historical reasons; it would be better if we abstracted out the more reusable optimization logic so any EP could leverage it during partitioning, but that's how it currently works. In theory you could delay the Compile until later, but that just encourages further pollution of core code with lists of EPs that the L2/L3 optimizations apply to, along with having to maintain a whole lot of state information from partitioning until after those optimizations in order to call Compile. The existing lists are already somewhat out of control. That said, an EP is free to implement its own optimizations during partitioning.
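As a rough illustration of that last point, "optimizations during partitioning" usually means the EP recognizes a fusable pattern in `GetCapability()` and claims the whole pattern as one partition, so it can lower it as a unit later in `Compile()`. The sketch below is written against the public `IExecutionProvider` interface; the exact signature varies across ORT versions, and `FindGeluPattern` is a hypothetical pattern matcher:

```cpp
// Sketch: optimization-aware partitioning. Instead of claiming nodes one by
// one, the EP matches a multi-node pattern (e.g. the Gelu decomposition
// Div -> Erf -> Add -> Mul -> Mul) and claims it as a single partition.
std::vector<std::unique_ptr<ComputeCapability>>
MyEp::GetCapability(const onnxruntime::GraphViewer& graph_viewer,
                    const IKernelLookup& /*kernel_lookup*/) const {
  std::vector<std::unique_ptr<ComputeCapability>> result;
  for (onnxruntime::NodeIndex index : graph_viewer.GetNodesInTopologicalOrder()) {
    const onnxruntime::Node* node = graph_viewer.GetNode(index);
    // Hypothetical helper: returns the node indices covered by a Gelu
    // pattern rooted at `node`, or an empty vector if there is no match.
    std::vector<onnxruntime::NodeIndex> pattern = FindGeluPattern(graph_viewer, *node);
    if (pattern.empty()) continue;

    auto sub_graph = std::make_unique<onnxruntime::IndexedSubGraph>();
    sub_graph->nodes = std::move(pattern);
    // A MetaDef (name, inputs, outputs) would be attached here; omitted.
    result.push_back(std::make_unique<ComputeCapability>(std::move(sub_graph)));
  }
  return result;
}
```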
Thanks for your response.
Thanks again for your answers. But I ran into some problems while implementing our own optimizations during partitioning, and this made me think further. We should note that the point of putting partitioning before the L2/L3 optimizations is to let each EP perform graph optimization within its supported subgraphs, while the point of graph compilation is to let each compiling EP compile a static subgraph into an execution unit. Based on this analysis, I think I have enough reason to separate partitioning from compilation and to delay compilation until the highest-level optimizations are complete, which would make the whole process clearer.
An optimizer generally has 2 parts. First is the logic to determine whether a set of nodes can be modified, which can be done with a GraphViewer. Existing optimizer logic could be factored to split this out for re-use. Second is the logic to make the modifications. In IExecutionProvider::Compile you should not be thinking about modifying the original graph at all. ORT is going to delete all of the nodes being compiled (which are provided via the `FusedNodeAndGraph` instances passed to Compile). The role of Compile is to set the function pointers in `NodeComputeInfo` that ORT will call to execute the compiled subgraph.
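For concreteness, here is a minimal sketch of the shape of a compiling EP's `Compile`, based on the public `IExecutionProvider` interface (exact types and signatures vary across ORT versions; `BuildBackendProgram`, `BackendProgram`, and `MyEpState` are hypothetical):

```cpp
#include <memory>
#include <vector>
// Assumes the ORT internal headers that declare IExecutionProvider,
// FusedNodeAndGraph, NodeComputeInfo, GraphViewer, etc.

// Sketch: Compile() builds a backend executable per fused subgraph and hands
// ORT the function pointers to create/run/release the per-node state. It never
// mutates the original Graph; ORT deletes the compiled nodes afterwards.
Status MyEp::Compile(const std::vector<FusedNodeAndGraph>& fused_nodes_and_graphs,
                     std::vector<NodeComputeInfo>& node_compute_funcs) {
  for (const FusedNodeAndGraph& fused : fused_nodes_and_graphs) {
    // Read-only view of the subgraph being compiled.
    const GraphViewer& graph_viewer = fused.filtered_graph;

    // Hypothetical helper: lower the subgraph to a backend program.
    std::shared_ptr<BackendProgram> program = BuildBackendProgram(graph_viewer);

    NodeComputeInfo info;
    info.create_state_func = [program](ComputeContext* /*ctx*/, FunctionState* state) {
      *state = new MyEpState{program};  // per-instance execution state
      return 0;
    };
    info.compute_func = [](FunctionState state, const OrtApi* /*api*/,
                           OrtKernelContext* context) {
      return static_cast<MyEpState*>(state)->Run(context);  // returns Status
    };
    info.release_state_func = [](FunctionState state) {
      delete static_cast<MyEpState*>(state);
    };
    node_compute_funcs.push_back(std::move(info));
  }
  return Status::OK();
}
```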
I completely agree with you: no optimizer should be applied at the compilation stage. We should give the compiling EPs a chance to do further optimization, so separating partitioning from compilation is necessary. So I wonder if ORT can give compiling EPs a second chance to compile the graph after the L2/L3 optimizations and let each EP decide for itself where to compile. I have made some attempts in my own workspace: peishenyan@5220a19. Not only the WebNN EP: I believe that all compiling EPs can benefit from this structure if they can decompose their original logic into partitioning and compiling.
Hi @skottmckay, splitting partitioning and compilation is the most general solution I can think of for this compiling-EP problem. I wonder what your opinion is on my experimental code peishenyan@5220a19, or do you have any other suggestions?
Sorry, not quite following. I wasn't suggesting an EP shouldn't do optimizations in the compilation phase. If anything, that's the place I would expect optimizations to happen, as the EP is free to do whatever it wants in Compile. Level 2 and 3 optimizers use internal ORT operators and not ONNX operators. We don't expect external EPs to take a dependency on our internal operators in general, as they're not defined in a public specification like ONNX operators are. Giving an EP direct access to the graph also means it can break it, and when it breaks, it's not unreasonable for someone to think ORT is to blame, given it owns the graph, and to create a GitHub Issue we need to deal with.

If I understand correctly, though, it's not clear why you REQUIRE the ability to modify the graph directly in the EP's Compile, given that as soon as Compile completes ORT will delete all the nodes that were in the subgraph being compiled. Is the real problem to solve that you want to somehow re-use existing L2 or L3 optimizer logic, and we can separate that from any requirement to directly modify the graph? Maybe a concrete example would help. What is a specific optimization you want to re-use during Compile?
That is what I wanted to express. To be honest, my most immediate goal is to enable Gelu and LayerNorm fusion for the WebNN EP. Gelu and LayerNorm are two ONNX operators whose fusion is implemented in the L2 optimizations. We want to reuse operator fusion optimizations like these two. BTW, since Gelu and LayerNorm are ONNX operators, why do GeluFusion and LayerNormFusion appear in the L2 optimizations instead of L1? 😂
@fdwr, do you have any insightful perspectives on this issue? |
Gelu and LayerNorm were only relatively recently added as ONNX operators. Now that they are, I would expect the optimizers that add them could move to L1. |
Sounds good! I think I can help to add them to L1 if needed.😸 |
If you want to put together a PR to move them to L1 that would be great. One caveat. Applying the fusion in L1 will need to be conditional on the ONNX opset the model imports to make sure the ONNX operator is valid. LayerNorm requires opset 16+ and Gelu requires opset 20+. Due to that they may also need to run in L2 for older models as the internal contrib op doesn't have a requirement for any specific ONNX opset. |
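As a sketch of what that opset condition could look like (the `Graph::DomainToVersionMap()` accessor and `kOnnxDomain` constant exist in ORT; the surrounding helper is illustrative):

```cpp
// Illustrative gate for applying a fusion that emits an ONNX operator:
// LayerNormalization requires ONNX opset 16+, Gelu requires opset 20+.
bool ModelSupportsOnnxOp(const onnxruntime::Graph& graph, int min_onnx_opset) {
  const auto& domain_to_version = graph.DomainToVersionMap();
  auto it = domain_to_version.find(onnxruntime::kOnnxDomain);
  return it != domain_to_version.end() && it->second >= min_onnx_opset;
}

// In L1, the fusion would only emit the ONNX op when this returns true, e.g.
//   if (ModelSupportsOnnxOp(graph, 16)) { /* fuse to ONNX LayerNormalization */ }
// Otherwise the model falls through to the L2 fusion, which targets the
// internal contrib op and has no ONNX opset requirement.
```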
@skottmckay Thanks. I'm working on adding the Gelu and LayerNorm fusion adaptively to the L1/L2 optimizations. One problem is that we may have two ways to do it.
This is so confusing.😢 |
According to #20915, we move the Gelu and LayerNorm fusion to L1, conditional on the ONNX opset the model imports (LayerNorm requires opset 16+ and Gelu requires opset 20+). If the opset version doesn't meet the requirements, the fusion is delayed to the L2 optimizations, since the internal contrib op doesn't have a requirement for any specific ONNX opset.
Solved by #21332. |
Describe the feature request
Hi developers, during my exploration of supporting L2 graph optimization for the WebNN EP, the logic in ORT confused me.

In `inference_session.cc`, lines 1233-1243, I find that the `partitioner.Partition()` function is called before the level 2 and level 3 optimizations. However, some execution providers such as NNAPI and WebNN will do `ep->Compile()` inside `partitioner.Partition()`, which builds the supported graph into a single node and loses the opportunity to perform further graph optimization. (Some other EPs, which only register per-op kernels instead of compiling a graph/subgraph, do not need to compile.) So I wonder: logically we should check the capability of each EP and partition the graph for the different EPs after making all changes to the graph, so why does ORT partition first and apply the high-level transforms later? Can we re-order this?
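For reference, the ordering in question looks roughly like this (a simplified, paraphrased sketch of the flow around `InferenceSession::Initialize()`; names and call sites are approximate, not copied from the actual source):

```cpp
// Paraphrased sketch of the initialization order being discussed; not the
// literal ORT code.

// 1. Level 1 (ONNX-level) graph transformers run on the full graph.
ORT_RETURN_IF_ERROR(TransformGraph(graph, TransformerLevel::Level1));

// 2. The graph is partitioned across EPs. For compiling EPs (NNAPI, WebNN, ...)
//    GraphPartitioner invokes IExecutionProvider::Compile() here, fusing each
//    supported subgraph into a single opaque node.
ORT_RETURN_IF_ERROR(partitioner.Partition(graph, /*...*/));

// 3. Level 2 and Level 3 transformers run afterwards; by now the compiled
//    subgraphs are opaque nodes that these optimizers cannot see into.
ORT_RETURN_IF_ERROR(TransformGraph(graph, TransformerLevel::Level2));
ORT_RETURN_IF_ERROR(TransformGraph(graph, TransformerLevel::Level3));
```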
For the DML EP, in order to achieve L2/L3-level graph optimization while preserving graph compilation, the developers compromised by disguising the whole-graph compilation as an L3-level optimization called `DmlGraphFusionTransformer`. This method takes a lot of effort, and it leads me to believe that the above problem is a general one rather than WebNN-specific.

Describe scenario use case
This will benefit all EPs that need to compile the supported graph/subgraph into a node while still taking advantage of the high-level graph optimizations.