Move some NodeProto checks to Graph::Graph #19469

Conversation
Force-pushed from 9cd331c to 80fda71
@skottmckay, can you please comment on this PR, as you seem close to the issue at hand? I want to be able to bring the problem to a conclusion.
Is something missing from this PR, or has it been superseded, @neNasko1 @skottmckay?
/azp run Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,ONNX Runtime Web CI Pipeline,Windows ARM64 QNN CI Pipeline
/azp run Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,Windows x64 QNN CI Pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline
Azure Pipelines successfully started running 9 pipeline(s).
Azure Pipelines successfully started running 8 pipeline(s).
Moving the check won't work with subgraphs, as you need the outer scope values to be available. One approach would be to temporarily save a pointer to the original NodeProto in the Node instance when loading from an existing Model, so the first Graph::Resolve call uses that. We know the Node instances have not been modified when that occurs. I tested this out with a couple of large models (tinyllama and phi-1.5), which have roughly 4000 nodes each, but there was negligible difference in the overall load from doing so, as the Node::ToProto call wasn't expensive.
checker::check_node(node_proto, ctx, lsc);
SetOpSchemaFromRegistryForNode(node);
}
ORT_CATCH(...) {}
I would think we need to log any exceptions. This would be helpful even during testing, let alone at runtime.
Thank you for the reply, @skottmckay! I see that the change you presented covers more cases and is a better one overall. It is also worth noting that it is in the spirit of other parts of the codebase.
Benchmarking big-model and tinyllama yields:

Before:

$ for run in {1..10}; do time ./test big.onnx; done
./test big.onnx 4.35s user 0.64s system 99% cpu 5.008 total
./test big.onnx 4.57s user 0.66s system 99% cpu 5.250 total
./test big.onnx 4.49s user 0.63s system 99% cpu 5.150 total
./test big.onnx 4.32s user 0.61s system 99% cpu 4.967 total
./test big.onnx 4.54s user 0.64s system 99% cpu 5.187 total
./test big.onnx 4.30s user 0.64s system 99% cpu 4.962 total
./test big.onnx 4.42s user 0.64s system 99% cpu 5.082 total
./test big.onnx 4.54s user 0.68s system 99% cpu 5.234 total
./test big.onnx 4.38s user 0.68s system 99% cpu 5.067 total
./test big.onnx 4.34s user 0.64s system 99% cpu 4.991 total
$ for run in {1..10}; do time ./test ~/Downloads/model.onnx; done
./test ~/Downloads/model.onnx 0.57s user 0.92s system 17% cpu 8.405 total
./test ~/Downloads/model.onnx 0.50s user 0.65s system 93% cpu 1.233 total
./test ~/Downloads/model.onnx 0.50s user 0.49s system 94% cpu 1.045 total
./test ~/Downloads/model.onnx 0.49s user 0.57s system 93% cpu 1.138 total
./test ~/Downloads/model.onnx 0.49s user 0.52s system 93% cpu 1.077 total
./test ~/Downloads/model.onnx 0.50s user 0.51s system 94% cpu 1.074 total
./test ~/Downloads/model.onnx 0.50s user 0.55s system 92% cpu 1.129 total
./test ~/Downloads/model.onnx 0.49s user 0.51s system 92% cpu 1.079 total
./test ~/Downloads/model.onnx 0.49s user 0.56s system 90% cpu 1.156 total
./test ~/Downloads/model.onnx 0.49s user 0.50s system 96% cpu 1.027 total

After:

$ for run in {1..10}; do time ./test big.onnx; done
./test big.onnx 3.60s user 0.59s system 93% cpu 4.497 total
./test big.onnx 3.49s user 0.56s system 99% cpu 4.059 total
./test big.onnx 3.36s user 0.56s system 99% cpu 3.931 total
./test big.onnx 3.49s user 0.57s system 99% cpu 4.072 total
./test big.onnx 3.36s user 0.57s system 99% cpu 3.947 total
./test big.onnx 3.51s user 0.63s system 99% cpu 4.147 total
./test big.onnx 3.44s user 0.60s system 99% cpu 4.060 total
./test big.onnx 3.45s user 0.58s system 99% cpu 4.062 total
./test big.onnx 3.53s user 0.58s system 99% cpu 4.128 total
./test big.onnx 3.49s user 0.58s system 99% cpu 4.089 total
$ for run in {1..10}; do time ./test ~/Downloads/model.onnx; done
./test ~/Downloads/model.onnx 0.54s user 0.56s system 93% cpu 1.183 total
./test ~/Downloads/model.onnx 0.49s user 0.60s system 92% cpu 1.180 total
./test ~/Downloads/model.onnx 0.49s user 0.51s system 92% cpu 1.084 total
./test ~/Downloads/model.onnx 0.49s user 0.50s system 92% cpu 1.079 total
./test ~/Downloads/model.onnx 0.49s user 0.52s system 89% cpu 1.126 total
./test ~/Downloads/model.onnx 0.50s user 0.55s system 93% cpu 1.125 total
./test ~/Downloads/model.onnx 0.49s user 0.48s system 94% cpu 1.024 total
./test ~/Downloads/model.onnx 0.49s user 0.47s system 91% cpu 1.057 total
./test ~/Downloads/model.onnx 0.49s user 0.51s system 89% cpu 1.127 total
./test ~/Downloads/model.onnx 0.49s user 0.56s system 90% cpu 1.161 total

You can see that there are no regressions after the change for most models, and models sporting nodes with big attributes that are parsed through ToProto show a clear improvement.

So I propose we merge your change!
Is there any development on the matter?
Better alternative to #19469
Sorry - was out the last couple of weeks. Created PR with change.
…n creation performance. (#20296)

### Description
The first call to Graph::Resolve occurs when creating the Graph instance when loading an existing model from ModelProto. As the Node instance will exactly match the source NodeProto, there's no need to call Node::ToProto in this case. Add a temporary reference to the original NodeProto to avoid the call on the first Graph::Resolve.

### Motivation and Context
Better alternative to #19469
Description

Perform some of the Node verification in the constructor of Graph. This is done in a context-free manner.

Motivation and Context

As seen from:
#18791
#19136

The ToProto function called in VerifyNodeAndOpMatch is taking most of the startup time. checker::check_node takes no time, so performing naive (context-free) checks while the Node is still in NodeProto form will reduce the number of ToProto invocations.