
Move some NodeProto checks to Graph::Graph #19469

Closed
wants to merge 4 commits

Conversation

neNasko1
Contributor

@neNasko1 neNasko1 commented Feb 8, 2024

Description

Perform some of the Node verification in the constructor of Graph. This is done in a context-free manner.

Motivation and Context

As seen from:
#18791
#19136

The ToProto function called in VerifyNodeAndOpMatch takes most of the startup time, while checker::check_node takes virtually none, so performing naive (context-free) checks while the Node is still in NodeProto form will reduce the number of ToProto invocations.
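The idea can be sketched as follows. This is a hypothetical, heavily simplified sketch — the types NodeProto, Graph, and CheckNodeContextFree below are stand-ins invented for illustration, not the real onnxruntime classes — but it shows the shape of the change: run cheap, context-free checks on each NodeProto inside the Graph constructor, before any Node object (and hence any ToProto round-trip) exists.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical, heavily simplified stand-ins for the ONNX types; the real
// onnxruntime classes (NodeProto, Graph, checker::check_node) are far richer.
struct NodeProto {
  std::string op_type;
  std::vector<std::string> inputs;
  std::vector<std::string> outputs;
};

// Context-free check: validates a single NodeProto in isolation, without
// graph-wide information such as outer-scope values.
void CheckNodeContextFree(const NodeProto& node) {
  if (node.op_type.empty())
    throw std::runtime_error("node has no op_type");
  if (node.outputs.empty())
    throw std::runtime_error("node produces no outputs");
}

struct Graph {
  // The constructor consumes NodeProtos directly, so the checks above can run
  // before any Node instance exists -- no Node::ToProto round-trip is needed.
  explicit Graph(const std::vector<NodeProto>& nodes) {
    for (const auto& n : nodes) CheckNodeContextFree(n);
    // ... build Node instances from the (already validated) protos ...
  }
};
```

Checks that need graph context (outer-scope values, resolved shapes) still have to wait for Graph::Resolve; only the context-free subset can move here.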

@neNasko1 neNasko1 force-pushed the move-node-verification branch from 9cd331c to 80fda71 Compare February 8, 2024 15:11
@neNasko1
Contributor Author

neNasko1 commented Feb 8, 2024

@skottmckay, can you please comment on this PR, as you seem close to the issue at hand? I want to bring the problem of ToProto taking too much time to a conclusion. This is a quick and dirty fix that works for my use case.

@cbourjau
Contributor

Is something missing from this PR, or has it been superseded, @neNasko1 @skottmckay?

@justinchuby
Contributor

/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,ONNX Runtime Web CI Pipeline,Windows ARM64 QNN CI Pipeline

@justinchuby
Contributor

/azp run Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline


Azure Pipelines successfully started running 9 pipeline(s).


Azure Pipelines successfully started running 8 pipeline(s).

@justinchuby justinchuby requested a review from skottmckay March 21, 2024 14:23
@justinchuby justinchuby added the core runtime issues related to core runtime label Mar 21, 2024
@skottmckay
Contributor

Moving the check won't work with subgraphs as you need the outer scope values to be available.

One approach would be to temporarily save a pointer to the original NodeProto in the Node instance when loading from an existing Model so the first Graph::Resolve call uses that. We know the Node instances have not been modified when that occurs.

https://github.com/microsoft/onnxruntime/compare/skottmckay/TestAvoidingNodeToProtoOnFirstGraphResolve
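The approach described above might look roughly like the following sketch. The types here (NodeProto, Node, GetProtoForCheck) are hypothetical simplifications invented for illustration, not the actual onnxruntime implementation linked above: the Node keeps a non-owning pointer to the original NodeProto it was loaded from, the first Graph::Resolve uses that cached proto instead of calling Node::ToProto, and any mutation of the Node invalidates the cache.

```cpp
#include <cassert>
#include <string>

// Hypothetical simplified stand-in for the ONNX NodeProto.
struct NodeProto {
  std::string op_type;
};

class Node {
 public:
  // When loading from an existing model, remember the source proto.
  explicit Node(const NodeProto* original)
      : original_proto_(original), op_type_(original->op_type) {}

  // Expensive in the real codebase: serializes the Node back into a proto.
  // The call counter exists only so the sketch can demonstrate the savings.
  NodeProto ToProto() const {
    ++to_proto_calls_;
    return NodeProto{op_type_};
  }

  // Returns the cached proto if it is still valid, nullptr otherwise.
  const NodeProto* GetOriginalProto() const { return original_proto_; }

  // Any mutation invalidates the cached proto.
  void SetOpType(const std::string& t) {
    op_type_ = t;
    original_proto_ = nullptr;
  }

  mutable int to_proto_calls_ = 0;

 private:
  const NodeProto* original_proto_;  // non-owning; cleared on modification
  std::string op_type_;
};

// First Resolve: prefer the cached proto; fall back to ToProto only when the
// node was modified after loading.
NodeProto GetProtoForCheck(const Node& node) {
  if (const NodeProto* p = node.GetOriginalProto()) return *p;
  return node.ToProto();
}
```

Because the proto pointer is only trusted on the first Resolve (when the Node is known to be unmodified), later Resolve calls on edited graphs still go through ToProto and stay correct.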

I tested this out with a couple of large models (tinyllama and phi-1.5) which have roughly 4000 nodes each but there was negligible difference in the overall load from doing so as the Node::ToProto call wasn't expensive.

checker::check_node(node_proto, ctx, lsc);
SetOpSchemaFromRegistryForNode(node);
}
ORT_CATCH(...) {}
Member

I would think we need to log any exceptions. That would be helpful even during testing, let alone at runtime.

@neNasko1
Contributor Author

neNasko1 commented Mar 26, 2024

Thank you for the reply, @skottmckay!

I see that the change you presented covers more cases and is better overall. It is also worth noting that it is in the spirit of other parts of the codebase.

I tested this out with a couple of large models (tinyllama and phi-1.5) which have roughly 4000 nodes each but there was negligible difference in the overall load from doing so as the Node::ToProto call wasn't expensive.

Benchmarking big-model and tinyllama yields:

Before:

$ for run in {1..10}; do time ./test big.onnx; done
./test big.onnx  4.35s user 0.64s system 99% cpu 5.008 total
./test big.onnx  4.57s user 0.66s system 99% cpu 5.250 total
./test big.onnx  4.49s user 0.63s system 99% cpu 5.150 total
./test big.onnx  4.32s user 0.61s system 99% cpu 4.967 total
./test big.onnx  4.54s user 0.64s system 99% cpu 5.187 total
./test big.onnx  4.30s user 0.64s system 99% cpu 4.962 total
./test big.onnx  4.42s user 0.64s system 99% cpu 5.082 total
./test big.onnx  4.54s user 0.68s system 99% cpu 5.234 total
./test big.onnx  4.38s user 0.68s system 99% cpu 5.067 total
./test big.onnx  4.34s user 0.64s system 99% cpu 4.991 total

$ for run in {1..10}; do time ./test ~/Downloads/model.onnx; done
./test ~/Downloads/model.onnx  0.57s user 0.92s system 17% cpu 8.405 total
./test ~/Downloads/model.onnx  0.50s user 0.65s system 93% cpu 1.233 total
./test ~/Downloads/model.onnx  0.50s user 0.49s system 94% cpu 1.045 total
./test ~/Downloads/model.onnx  0.49s user 0.57s system 93% cpu 1.138 total
./test ~/Downloads/model.onnx  0.49s user 0.52s system 93% cpu 1.077 total
./test ~/Downloads/model.onnx  0.50s user 0.51s system 94% cpu 1.074 total
./test ~/Downloads/model.onnx  0.50s user 0.55s system 92% cpu 1.129 total
./test ~/Downloads/model.onnx  0.49s user 0.51s system 92% cpu 1.079 total
./test ~/Downloads/model.onnx  0.49s user 0.56s system 90% cpu 1.156 total
./test ~/Downloads/model.onnx  0.49s user 0.50s system 96% cpu 1.027 total

After:

$ for run in {1..10}; do time ./test big.onnx; done              
./test big.onnx  3.60s user 0.59s system 93% cpu 4.497 total
./test big.onnx  3.49s user 0.56s system 99% cpu 4.059 total
./test big.onnx  3.36s user 0.56s system 99% cpu 3.931 total
./test big.onnx  3.49s user 0.57s system 99% cpu 4.072 total
./test big.onnx  3.36s user 0.57s system 99% cpu 3.947 total
./test big.onnx  3.51s user 0.63s system 99% cpu 4.147 total
./test big.onnx  3.44s user 0.60s system 99% cpu 4.060 total
./test big.onnx  3.45s user 0.58s system 99% cpu 4.062 total
./test big.onnx  3.53s user 0.58s system 99% cpu 4.128 total
./test big.onnx  3.49s user 0.58s system 99% cpu 4.089 total

$ for run in {1..10}; do time ./test ~/Downloads/model.onnx; done
./test ~/Downloads/model.onnx  0.54s user 0.56s system 93% cpu 1.183 total
./test ~/Downloads/model.onnx  0.49s user 0.60s system 92% cpu 1.180 total
./test ~/Downloads/model.onnx  0.49s user 0.51s system 92% cpu 1.084 total
./test ~/Downloads/model.onnx  0.49s user 0.50s system 92% cpu 1.079 total
./test ~/Downloads/model.onnx  0.49s user 0.52s system 89% cpu 1.126 total
./test ~/Downloads/model.onnx  0.50s user 0.55s system 93% cpu 1.125 total
./test ~/Downloads/model.onnx  0.49s user 0.48s system 94% cpu 1.024 total
./test ~/Downloads/model.onnx  0.49s user 0.47s system 91% cpu 1.057 total
./test ~/Downloads/model.onnx  0.49s user 0.51s system 89% cpu 1.127 total
./test ~/Downloads/model.onnx  0.49s user 0.56s system 90% cpu 1.161 total

You can see that there are no regressions for most models after the change, while models whose nodes carry big attributes parsed through protobuf load significantly faster.

So I propose we merge your change!

@neNasko1
Contributor Author

neNasko1 commented Apr 2, 2024

I see that the change you presented seems to cover more cases and is a better one overall. It is also worth to note that it is in the spirit of other parts of the codebase.

Is there any development on the matter?

@skottmckay
Contributor

I see that the change you presented seems to cover more cases and is a better one overall. It is also worth to note that it is in the spirit of other parts of the codebase.

Is there any development on the matter?

Sorry, I was out for the last couple of weeks. I have created a PR with the change.

skottmckay added a commit that referenced this pull request Apr 17, 2024
…n creation performance. (#20296)

### Description
The first call to Graph::Resolve occurs when creating the Graph instance
when loading an existing model from ModelProto. As the Node instance
will exactly match the source NodeProto there's no need to call
Node::ToProto in this case.

Add a temporary reference to the original NodeProto to avoid the call on
the first Graph::Resolve.

### Motivation and Context
Better alternative to #19469
@neNasko1 neNasko1 closed this Apr 27, 2024
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request May 7, 2024
…n creation performance. (microsoft#20296)
Labels: core runtime (issues related to core runtime)

5 participants