
fix shape inference bug #19848

Merged
merged 3 commits into from
Mar 29, 2024

Conversation

@inisis
Contributor

inisis commented Mar 11, 2024

Description

For nodes like Add, the symbolic shapes of their inputs should be merged dynamically.

Motivation and Context

When doing shape inference, the input shapes of nodes like Add are currently generated from the previous node's output in _onnx_infer_single_node; instead, the symbolic shapes of both inputs should be merged.
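
The merge described above can be illustrated with a small, self-contained sketch. This is a hypothetical stand-in for the broadcasting logic in symbolic_shape_infer.py, not the actual patch; dims are ints or strings (symbolic names):

```python
def merge_broadcast_shapes(shape_a, shape_b, merged_symbols):
    """Merge two (possibly symbolic) shapes under numpy-style broadcasting.

    Dims are ints or strings (symbolic). A dim of 1 broadcasts; a concrete
    dim wins over a symbolic one; two distinct symbols are recorded as
    equivalent in `merged_symbols` (the "auto merge" behavior).
    """
    # Right-align the shorter shape, numpy-broadcast style.
    rank = max(len(shape_a), len(shape_b))
    a = [1] * (rank - len(shape_a)) + list(shape_a)
    b = [1] * (rank - len(shape_b)) + list(shape_b)
    out = []
    for da, db in zip(a, b):
        if da == db:
            out.append(da)
        elif da == 1:
            out.append(db)
        elif db == 1:
            out.append(da)
        elif isinstance(da, str) and isinstance(db, int):
            out.append(db)           # concrete dim wins over symbolic
        elif isinstance(da, int) and isinstance(db, str):
            out.append(da)
        elif isinstance(da, str) and isinstance(db, str):
            merged_symbols[db] = da  # treat the two symbols as equal
            out.append(da)
        else:
            raise ValueError(f"incompatible dims {da} vs {db}")
    return out

merged = {}
print(merge_broadcast_shapes(["batch", 8, 1], ["N", 1, 16], merged))
# -> ['batch', 8, 16], with merged == {'N': 'batch'}
```

The point of the PR is that an elementwise node such as Add must run a merge like this over both of its inputs, rather than copying the shape of a single predecessor.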

@inisis
Contributor Author

inisis commented Mar 11, 2024

@microsoft-github-policy-service agree
@microsoft-github-policy-service agree company="tsingmicro"

@inisis
Contributor Author

inisis commented Mar 12, 2024

@tianleiwu Could you please review it?

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@tianleiwu
Contributor

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Android CI Pipeline

@tianleiwu
Contributor

/azp run iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 2 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

1 similar comment

Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu
Contributor

Please add some unit tests in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python_symbolic_shape_infer.py
@inisis
Contributor Author

inisis commented Mar 12, 2024

> Please add some unit tests in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python_symbolic_shape_infer.py

Yes, there is a failing case. Where can I upload the ONNX model? It's about 400 MB, and it's a commonly known model.

@tianleiwu
Contributor

tianleiwu commented Mar 12, 2024

> > Please add some unit tests in https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/test/python/onnxruntime_test_python_symbolic_shape_infer.py
>
> Yes, there is a failing case. Where can I upload the ONNX model? It's about 400 MB, and it's a commonly known model.

We usually add a tiny model in testing, such as a model with only one MatMul node that triggers the code path.

400 MB is too large. You can create an issue, add a reproduction script, and link to the model there.
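
The tiny-model suggestion can be sketched without pulling in onnx at all. Below is a toy MatMul shape rule plus a unit test in the style of onnxruntime_test_python_symbolic_shape_infer.py; the function is a hypothetical stand-in for illustration, not the real SymbolicShapeInference code:

```python
import unittest

def matmul_output_shape(a, b):
    """Toy MatMul shape rule; dims are ints or strings (symbolic names)."""
    if len(a) < 2 or len(b) < 2:
        raise ValueError("1-D inputs are not handled in this sketch")
    inner_a, inner_b = a[-1], b[-2]
    if (isinstance(inner_a, int) and isinstance(inner_b, int)
            and inner_a != inner_b):
        raise ValueError(f"inner dims {inner_a} and {inner_b} do not match")
    # Take batch dims from the higher-rank input (no batch broadcasting here).
    batch = a[:-2] if len(a) >= len(b) else b[:-2]
    return list(batch) + [a[-2], b[-1]]

class TestTinyMatMulModel(unittest.TestCase):
    def test_single_matmul_node(self):
        # A "model" with a single MatMul node is enough to hit the rule:
        # ("batch", 4, 8) x (8, 16) -> ("batch", 4, 16)
        self.assertEqual(matmul_output_shape(["batch", 4, 8], [8, 16]),
                         ["batch", 4, 16])
```

Run with `python -m unittest`; the idea is that a one-node model exercising the new code path keeps the test artifact tiny.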

@inisis
Contributor Author

inisis commented Mar 12, 2024

> We usually add a tiny model in testing, such as a model with only one MatMul node that triggers the code path.
>
> 400 MB is too large. You can create an issue, add a reproduction script, and link to the model there.

Hi, I have raised an issue: #19870.

@tianleiwu
Contributor

tianleiwu commented Mar 12, 2024

> Hi, I have raised an issue: #19870.

See my comments in the issue. I can run shape inference with the --auto_merge flag successfully without this change. Could you double-check that?
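
For context, --auto_merge is an option of onnxruntime's symbolic shape inference script that merges conflicting symbolic dimensions into one symbol instead of aborting. A toy, stand-alone illustration of that behavior follows (an editor's sketch under that assumption, not the actual implementation):

```python
def unify(dim_a, dim_b, auto_merge=False, equivalences=None):
    """Unify two dims that broadcasting requires to be equal.

    Dims are ints or strings (symbolic names). With auto_merge=False a
    symbolic conflict is an error; with auto_merge=True the two symbols
    are recorded as equivalent and inference proceeds.
    """
    if equivalences is None:
        equivalences = {}
    if dim_a == dim_b:
        return dim_a, equivalences
    if isinstance(dim_a, str) and isinstance(dim_b, str):
        if not auto_merge:
            raise ValueError(f"symbolic dims {dim_a!r} and {dim_b!r} conflict")
        equivalences[dim_b] = dim_a  # merge: treat dim_b as an alias of dim_a
        return dim_a, equivalences
    # One symbolic, one concrete: assume the concrete value.
    if isinstance(dim_a, str):
        return dim_b, equivalences
    if isinstance(dim_b, str):
        return dim_a, equivalences
    raise ValueError(f"static dims {dim_a} and {dim_b} conflict")
```

This is why the flag matters for the repro: whether inference succeeds on the model can depend on whether conflicting symbols are merged or rejected.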

@inisis
Contributor Author

inisis commented Mar 13, 2024

> See my comments in the issue. I can run shape inference with the --auto_merge flag successfully without this change. Could you double-check that?

Hi, the uploaded model was incorrect; could you please re-download it? I have checked it myself.

@inisis
Contributor Author

inisis commented Mar 14, 2024

@tianleiwu Could you please review it?

@tianleiwu
Contributor

/azp run Windows GPU CI Pipeline,ONNX Runtime Web CI Pipeline,Linux GPU CI Pipeline,orttraining-amd-gpu-ci-pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 7 pipeline(s).

@inisis
Contributor Author

inisis commented Mar 20, 2024

Hi @tianleiwu, how can I fix the failing cases? The error is not clear to me.

@tianleiwu
Contributor

Let me re-run the failed pipeline.

@inisis
Contributor Author

inisis commented Mar 23, 2024

@tianleiwu Hi, can this be merged? I have many more commits to push.

@inisis
Contributor Author

inisis commented Mar 23, 2024

@tianleiwu Since shape inference is very important for model optimization, can I keep this code in my repo? I would prefer to maintain my own version of onnxruntime shape inference and commit my fixes gradually; there are lots of bugs.

@tianleiwu
Contributor

One required pipeline, "Windows GPU CI Pipeline", failed, so this cannot be merged. Let me try re-running it one more time. If it fails again, please try rebasing onto the latest main.

@inisis
Contributor Author

inisis commented Mar 24, 2024

> One required pipeline, "Windows GPU CI Pipeline", failed, so this cannot be merged. Let me try re-running it one more time. If it fails again, please try rebasing onto the latest main.

It failed again, and it seems to be a problem with CMake.

@tianleiwu
Contributor

/azp run Windows ARM64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux GPU CI Pipeline,Linux OpenVINO CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-ortmodule-distributed,Big Models


Azure Pipelines successfully started running 10 pipeline(s).

@inisis
Contributor Author

inisis commented Mar 28, 2024

@tianleiwu The failed case shows "Failed to download Python from the Github Actions python registry (https://github.com/actions/python-versions)"; it may be a network problem.

@tianleiwu
Contributor

/azp run Windows x64 QNN CI Pipeline,Windows GPU CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU TensorRT CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline

@tianleiwu
Contributor

/azp run onnxruntime-binary-size-checks-ci-pipeline,Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline


Azure Pipelines successfully started running 4 pipeline(s).


Azure Pipelines successfully started running 10 pipeline(s).

@tianleiwu tianleiwu merged commit 8396845 into microsoft:main Mar 29, 2024
77 of 81 checks passed
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request May 7, 2024