
Remove useless NodeProto serializations #18791

Merged: 5 commits into microsoft:main on Jan 4, 2024

Conversation

neNasko1
Contributor

Description

This pull request improves the efficiency of inference session creation by eliminating unnecessary Node::ToProto invocations, along with the ~NodeProto destructor cost that follows them.

Motivation and Context

This pull request targets low-hanging fruit in the inference session creation process: removing the unneeded Node::ToProto calls streamlines the code and improves overall performance. The flame graphs show the gain as a reduced share of time spent in Node::ToProto.

Code Snippet

TEST(InferenceSessionTests, Bench) {
  // Initialize logging manager
  auto logging_manager = std::make_unique<logging::LoggingManager>(
      std::unique_ptr<ISink>(new CLogSink()), logging::Severity::kVERBOSE, false,
      LoggingManager::InstanceType::Temporal);

  // Create environment
  std::unique_ptr<Environment> env;
  auto st = Environment::Create(std::move(logging_manager), env);
  ASSERT_TRUE(st.IsOK());

  // Configure session options
  SessionOptions so;
  so.execution_mode = ExecutionMode::ORT_SEQUENTIAL;
  so.graph_optimization_level = TransformerLevel::Level2;
  so.intra_op_param.thread_pool_size = 1;

  // Initialize and load the InferenceSession
  InferenceSessionTestGlobalThreadPools session1{so, *env};
  ASSERT_STATUS_OK(session1.Load("big.onnx"));
  ASSERT_STATUS_OK(session1.Initialize());
}

big.onnx model creation

import onnx
import numpy as np
from spox import argument, build, Tensor, Var
from spox.opset.ai.onnx import v17 as op
from spox.opset.ai.onnx.ml.v3 import label_encoder

a = argument(Tensor(np.int64, ('N',)))
c = a

for x in range(1000):
    c = op.mul(c, op.const(np.ones(10000, dtype=np.int64)))

for x in range(3000):
    all_strings = list("random_string" + str(i) for i in range(100))
    all_ints = list(range(len(all_strings)))
    c = label_encoder(
        c,
        keys_int64s=all_ints,
        values_strings=all_strings
    )
    c = label_encoder(c, keys_strings=all_strings, values_int64s=all_ints)

model: onnx.ModelProto = build(inputs={'a': a}, outputs={'c': c})
onnx.save(model, "big.onnx")

Testing in Release with perf yields:
Before: 3.3% spent in Node::ToProto
After: 1.6% spent in Node::ToProto

@cbourjau
Contributor

Thanks for looking into this! Can you or somebody else tell me if there is any functional difference between calling check_node here and calling check_graph earlier? The latter would have the advantage that every node is still in its NodeProto format.

@neNasko1
Contributor Author

The improvement is more pronounced when working on Nodes with large attributes:

import onnx
import numpy as np
from spox import argument, build, Tensor, Var
from spox.opset.ai.onnx import v17 as op
from spox.opset.ai.onnx.ml.v3 import label_encoder

a = argument(Tensor(np.int64, ('N',)))
c = a

for x in range(100):
    all_strings = list("random_string" + str(i) for i in range(100000))
    all_ints = list(range(len(all_strings)))
    c = op.if_(
        op.const(True),
        then_branch=lambda: [
            label_encoder(
                c,
                keys_int64s=all_ints,
                values_strings=all_strings
            )
        ],
        else_branch=lambda: [
            label_encoder(
                c,
                keys_int64s=all_ints,
                values_strings=all_strings
            )
        ]
    )[0]
    c = label_encoder(c, keys_strings=all_strings, values_int64s=all_ints)

model: onnx.ModelProto = build(inputs={'a': a}, outputs={'c': c})
onnx.save(model, "big.onnx")

On my machine the provided InferenceSessionTests.Bench goes down from 4800ms to 4000ms for the Release build.

@neNasko1 changed the title from "Remove useless nodeproto ser" to "Remove useless NodeProto serializations" on Dec 14, 2023
@neNasko1
Contributor Author

> Thanks for looking into this! Can you or somebody else tell me if there is any functional difference between calling check_node here and calling check_graph earlier? The latter would have the advantage that every node is still in its NodeProto format.

Also, can someone elaborate on whether check_graph is even needed in this context?

@cbourjau
Contributor

Sorry for randomly tagging you, @snnn, but I saw your name in the git blame. Would you know the right person to review this?

@snnn added the "core runtime" label (issues related to core runtime) on Dec 20, 2023
@neNasko1
Contributor Author

neNasko1 commented Jan 2, 2024

Am I right to assume that in this case check_graph can be skipped? If this is not the case, maybe some additional tests have to be added, since currently all are passing.

@justinchuby
Contributor

/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-ortmodule-distributed, orttraining-linux-gpu-ci-pipeline, Linux QNN CI Pipeline, Windows ARM64 QNN CI Pipeline

@justinchuby
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Linux OpenVINO CI Pipeline


Azure Pipelines successfully started running 7 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@justinchuby
Contributor

/azp run Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@justinchuby requested review from @snnn and @skottmckay on January 3, 2024
@neNasko1
Contributor Author

neNasko1 commented Jan 3, 2024

TLDR: this change now creates NodeProto objects only when they are actually needed, which is a significant improvement for nodes with large attributes.
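
For readers skimming the thread, the actual diff is not reproduced here, but the pattern it implements is roughly the following. This is a minimal self-contained sketch with made-up stand-in names (NodeProtoLike, NodeLike, CheckNode are illustrative, not onnxruntime or ONNX symbols): handing a factory callback to the checking path means the expensive proto is only materialized on the branch that actually needs it.

#include <functional>
#include <iostream>
#include <string>
#include <vector>

// Stand-ins for ONNX NodeProto and onnxruntime's in-memory Node (illustrative only).
struct NodeProtoLike {
  std::vector<std::string> attributes;
};

struct NodeLike {
  std::vector<std::string> big_attributes;
  // Expensive: serializing copies every attribute.
  NodeProtoLike ToProto() const { return NodeProtoLike{big_attributes}; }
};

// Before: callers serialized unconditionally, even when the proto went unused.
// After: pass a factory so the proto is built only where it is needed.
void CheckNode(bool needs_full_proto,
               const std::function<NodeProtoLike()>& make_proto) {
  if (!needs_full_proto) {
    return;  // cheap path: no serialization, no ~NodeProto cost
  }
  NodeProtoLike proto = make_proto();  // built only on this branch
  std::cout << "checked node with " << proto.attributes.size() << " attributes\n";
}

int main() {
  NodeLike node{std::vector<std::string>(100000, "x")};
  CheckNode(false, [&] { return node.ToProto(); });  // no serialization happens
  CheckNode(true, [&] { return node.ToProto(); });   // serialization happens once
  return 0;
}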

@justinchuby
Contributor

/azp run Windows GPU TensorRT CI Pipeline, onnxruntime-binary-size-checks-ci-pipeline, onnxruntime-python-checks-ci-pipeline, orttraining-linux-ci-pipeline, orttraining-ortmodule-distributed, orttraining-linux-gpu-ci-pipeline, Linux QNN CI Pipeline, Windows ARM64 QNN CI Pipeline

@justinchuby
Contributor

/azp run Linux CPU CI Pipeline, Linux CPU Minimal Build E2E CI Pipeline, Linux GPU CI Pipeline, Linux GPU TensorRT CI Pipeline, Linux Nuphar CI Pipeline, MacOS CI Pipeline, ONNX Runtime Web CI Pipeline, Windows CPU CI Pipeline, Windows GPU CI Pipeline, Linux OpenVINO CI Pipeline


Azure Pipelines successfully started running 7 pipeline(s).


Azure Pipelines successfully started running 9 pipeline(s).

@cbourjau
Contributor

cbourjau commented Jan 3, 2024

> Can you or somebody else tell me if there is any functional difference between calling check_node here and calling check_graph earlier? The latter would have the advantage that every node is still in its NodeProto format.

From afar I'm still wondering if we couldn't just call check_graph in the very beginning. Am I missing something obvious?

@skottmckay
Contributor

skottmckay commented Jan 3, 2024

I think there's still a gap. Modifications to the graph due to optimizations can potentially invalidate nodes as they can change the values that are available. e.g. if you fuse 2 nodes, the value between the two nodes becomes internal to the fused node (i.e. it is no longer produced in the graph), and any other node that was consuming that value would be broken (consumers can include nodes in subgraphs).

Obviously this is an invalid fusion and the error is in the fusion itself, but if we don't run the checker after making these sorts of changes we don't discover the issue until graph execution time, which makes it far harder to debug.

However, that's an existing issue, so I assume optimizer bugs are obvious enough to be caught prior to check-in.
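
To make that failure mode concrete, here is a small self-contained sketch (illustrative only; SimpleNode and Validate are made-up names, not onnxruntime code): an invalid fusion hides the intermediate value 't' inside the fused node, so 't' is no longer produced in the graph, and only a validation pass over the modified graph surfaces the broken consumer before execution.

#include <iostream>
#include <set>
#include <string>
#include <vector>

struct SimpleNode {  // illustrative, not onnxruntime::Node
  std::string name;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Every node input must be a graph input or produced as some node's output.
bool Validate(const std::vector<SimpleNode>& nodes,
              const std::set<std::string>& graph_inputs) {
  std::set<std::string> available = graph_inputs;
  for (const auto& n : nodes)
    for (const auto& o : n.outputs) available.insert(o);
  for (const auto& n : nodes)
    for (const auto& i : n.inputs)
      if (available.count(i) == 0) {
        std::cout << n.name << " consumes missing value '" << i << "'\n";
        return false;
      }
  return true;
}

int main() {
  // A -> (t) -> B, plus C which also consumes t.
  std::vector<SimpleNode> nodes = {
      {"A", {"x"}, {"t"}}, {"B", {"t"}, {"y"}}, {"C", {"t"}, {"z"}}};
  std::set<std::string> graph_inputs = {"x"};
  std::cout << "before fusion valid: " << Validate(nodes, graph_inputs) << "\n";

  // Invalid fusion: A and B are merged, 't' becomes internal to the fused
  // node and is no longer produced in the graph, so C is now broken.
  std::vector<SimpleNode> fused = {{"AB_fused", {"x"}, {"y"}}, {"C", {"t"}, {"z"}}};
  std::cout << "after fusion valid: " << Validate(fused, graph_inputs) << "\n";
  return 0;
}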

@skottmckay
Contributor

> From afar I'm still wondering if we couldn't just call check_graph in the very beginning. Am I missing something obvious?

Potentially you could on the initial load, but following that we only call check_node for new nodes (e.g. created by optimizers). Graph::Resolve, which is where VerifyNodeAndOpMatch is called from, runs many (too many but that's a different issue) times so it needs to be as efficient as possible.

There may be other historical reasons as well.
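
A rough sketch of that incremental pattern, assuming made-up types (GraphLike and GraphNode are illustrative; the real logic lives in Graph::Resolve / VerifyNodeAndOpMatch and is more involved): each resolve pass remembers which nodes it has already verified and only pays the per-node checking cost for nodes added since the previous pass.

#include <iostream>
#include <string>
#include <unordered_set>
#include <vector>

struct GraphNode {  // illustrative stand-in for onnxruntime::Node
  int index;
  std::string op_type;
};

class GraphLike {
 public:
  void AddNode(std::string op_type) {
    nodes_.push_back({next_index_++, std::move(op_type)});
  }

  // Resolve runs many times; only nodes not seen by a previous pass get the
  // (comparatively expensive) per-node check.
  void Resolve() {
    int checked = 0;
    for (const auto& n : nodes_) {
      if (checked_nodes_.count(n.index)) continue;  // already verified
      CheckNode(n);                                  // per-node check, e.g. a check_node equivalent
      checked_nodes_.insert(n.index);
      ++checked;
    }
    std::cout << "resolve pass checked " << checked << " node(s)\n";
  }

 private:
  void CheckNode(const GraphNode& n) {
    std::cout << "  checking " << n.op_type << " #" << n.index << "\n";
  }

  int next_index_ = 0;
  std::vector<GraphNode> nodes_;
  std::unordered_set<int> checked_nodes_;
};

int main() {
  GraphLike g;
  g.AddNode("Mul");
  g.AddNode("LabelEncoder");
  g.Resolve();                         // initial load: checks both nodes
  g.AddNode("FusedMulLabelEncoder");   // e.g. added by an optimizer
  g.Resolve();                         // later pass: checks only the new node
  return 0;
}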

@skottmckay
Contributor

/azp run Windows x64 QNN CI Pipeline


Azure Pipelines successfully started running 1 pipeline(s).

@skottmckay merged commit 4e2d88b into microsoft:main on Jan 4, 2024
62 of 70 checks passed