Remove useless NodeProto serializations #18791
Conversation
Thanks for looking into this! Can you or somebody else tell me if there is any functional difference between calling |
The improvements seem more drastic when working on:

import onnx
import numpy as np
from spox import argument, build, Tensor, Var
from spox.opset.ai.onnx import v17 as op
from spox.opset.ai.onnx.ml.v3 import label_encoder

a = argument(Tensor(np.int64, ('N',)))
c = a
for x in range(100):
    # Each iteration adds an If node whose branches carry LabelEncoder nodes
    # with 100,000-entry attributes, producing a very large model.
    all_strings = list("random_string" + str(i) for i in range(100000))
    all_ints = list(range(len(all_strings)))
    c = op.if_(
        op.const(True),
        then_branch=lambda: [
            label_encoder(c, keys_int64s=all_ints, values_strings=all_strings)
        ],
        else_branch=lambda: [
            label_encoder(c, keys_int64s=all_ints, values_strings=all_strings)
        ],
    )[0]
    # Map the strings back to int64 so the next iteration's encoder applies again.
    c = label_encoder(c, keys_strings=all_strings, values_int64s=all_ints)

model: onnx.ModelProto = build(inputs={'a': a}, outputs={'c': c})
onnx.save(model, "big.onnx")

On my machine the provided |
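As a rough, machine-dependent way to see the effect end to end (this sketch is not part of the original comment), one can time how long ONNX Runtime takes to create an inference session for the generated model, which is the phase this PR optimizes:

import time
import onnxruntime as ort

# Session creation (graph load and resolve) is where the removed Node::ToProto
# calls used to show up in the profile.
start = time.perf_counter()
sess = ort.InferenceSession("big.onnx", providers=["CPUExecutionProvider"])
print(f"Session creation took {time.perf_counter() - start:.2f}s")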
Also can someone elaborate if |
Sorry for randomly tagging you, @snnn, but I saw your name in the git blame. Would you know the right person to review this?
Am I right to assume that in this case |
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-ortmodule-distributed, orttraining-linux-gpu-ci-pipeline, Linux QNN CI Pipeline, Windows ARM64 QNN CI Pipeline
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Linux OpenVINO CI Pipeline
Azure Pipelines successfully started running 7 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
/azp run Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
TLDR: This change now consists of creating |
/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-ortmodule-distributed, orttraining-linux-gpu-ci-pipeline, Linux QNN CI Pipeline, Windows ARM64 QNN CI Pipeline
/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Linux OpenVINO CI Pipeline
Azure Pipelines successfully started running 7 pipeline(s).
Azure Pipelines successfully started running 9 pipeline(s).
From afar I'm still wondering if we couldn't just call |
I think there's still a gap. Modifications to the graph due to optimizations can potentially invalidate nodes, as they can change the values that are available. E.g., if you fuse 2 nodes, the value between the two nodes becomes internal to the fused node (i.e. it is no longer produced in the graph), and any other node that was consuming that value would be broken (consumers can include nodes in subgraphs). Obviously this is an invalid fusion and the error is there, but if we don't run the checker after making these sorts of changes we don't discover the issue until graph execution time, making it far harder to debug. However, that's an existing issue, so I assume optimizer issues are obvious enough to be caught prior to check-in.
Potentially you could on the initial load, but following that we only call check_node for new nodes (e.g. created by optimizers). Graph::Resolve, which is where VerifyNodeAndOpMatch is called from, runs many times (too many, but that's a different issue), so it needs to be as efficient as possible. There may be other historical reasons as well.
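To make the fusion scenario described above concrete, here is a small illustrative sketch (not from the thread; the "fused" node is only a stand-in) showing how an invalid fusion leaves a dangling consumer that only an explicit checker run catches before execution time:

import onnx
from onnx import helper, TensorProto

# The original graph would be x -> Relu -> y -> Neg -> z, with a second consumer of 'y'.
x = helper.make_tensor_value_info("x", TensorProto.FLOAT, [1])
z = helper.make_tensor_value_info("z", TensorProto.FLOAT, [1])
w = helper.make_tensor_value_info("w", TensorProto.FLOAT, [1])
extra = helper.make_node("Identity", ["y"], ["w"])  # still consumes the intermediate 'y'

# Invalid "fusion": Relu+Neg collapsed into one node producing 'z' directly,
# so 'y' is no longer produced anywhere in the graph.
fused = helper.make_node("Neg", ["x"], ["z"])  # toy stand-in for a fused kernel
graph = helper.make_graph([fused, extra], "broken_fusion", [x], [z, w])
model = helper.make_model(graph, opset_imports=[helper.make_opsetid("", 17)])

# Without a checker run, the dangling reference to 'y' would only surface at
# execution time; an explicit check flags it immediately.
onnx.checker.check_model(model)  # raises ValidationError: 'y' is not produced by any node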
/azp run Windows x64 QNN CI Pipeline
Azure Pipelines successfully started running 1 pipeline(s).
Description
This pull request improves the efficiency of inference session creation by eliminating unnecessary Node::ToProto invocations. The current codebase presents opportunities for optimization, particularly in removing superfluous Node::ToProto calls along with the ~NodeProto destructor invocations that follow them.

Motivation and Context
This pull request targets low-hanging fruit in the inference session creation process. By removing unneeded Node::ToProto calls, we streamline the codebase and improve overall performance. The flame graphs illustrate the improvement achieved by reducing the percentage of time spent in Node::ToProto calls.

Code Snippet
big.onnx model creation

Testing in Release with perf yields:
Before: 3.3% spent in Node::ToProto
After: 1.6% spent in Node::ToProto
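For intuition on why these serializations matter, here is an illustrative sketch (not part of the PR) that builds a single LabelEncoder NodeProto with attributes of the same scale as the spox snippet above and times copying it; Node::ToProto on the C++ side likewise has to populate a fresh NodeProto, attributes included:

import time
from onnx import helper, NodeProto

# A LabelEncoder node with 100,000-entry attributes, mirroring the nodes in big.onnx.
all_strings = ["random_string" + str(i) for i in range(100_000)]
all_ints = list(range(len(all_strings)))
node = helper.make_node(
    "LabelEncoder", ["c_in"], ["c_out"], domain="ai.onnx.ml",
    keys_int64s=all_ints, values_strings=all_strings,
)

# Filling a fresh NodeProto copies every attribute value; doing this per node,
# repeatedly, during session creation is the kind of cost the PR removes.
start = time.perf_counter()
for _ in range(10):
    copy = NodeProto()
    copy.CopyFrom(node)
print(f"10 NodeProto copies took {time.perf_counter() - start:.3f}s")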