
Drop QDQ around more nodes #21376

Merged 45 commits on Aug 27, 2024

Conversation

mcollinswisc (Contributor)

Description

Extends the Drop QDQ optimization to remove DequantizeLinear and QuantizeLinear nodes from around operators:

  • Flatten
  • Expand
  • Tile
  • Slice
  • GatherElements
  • ReduceMin
  • ReduceMax

Motivation and Context

To reduce floating-point conversions in quantized inference. This is mainly motivated by the Flatten case, since Flatten shows up in graphs exported from PyTorch to ONNX, but for completeness the change extends the optimization to the larger set of ops for which it is valid.

#21375
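Dropping the pair is a no-op precisely when the DequantizeLinear and QuantizeLinear share the same scale and zero point, because ops like Flatten only rearrange values without changing them. A minimal NumPy sketch of the equivalence (the `dequantize`/`quantize` helpers below are simplified uint8 stand-ins for the ONNX operators, not ONNX Runtime code):

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # DequantizeLinear: x = (q - zero_point) * scale
    return (q.astype(np.int32) - zero_point) * scale

def quantize(x, scale, zero_point):
    # QuantizeLinear: q = saturate(round(x / scale) + zero_point)
    q = np.rint(x / scale).astype(np.int32) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

scale, zero_point = 0.05, 128
q = np.arange(24, dtype=np.uint8).reshape(2, 3, 4)

# DequantizeLinear -> Flatten -> QuantizeLinear with matching parameters...
dq_flatten_q = quantize(dequantize(q, scale, zero_point).reshape(2, -1),
                        scale, zero_point)
# ...produces the same integers as flattening the quantized tensor directly.
flatten_only = q.reshape(2, -1)
assert np.array_equal(dq_flatten_q, flatten_only)
```

With mismatched parameters on the two ends, the round trip is a real requantization, which is why the optimization only fires when the quantization parameters match.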

mcollinswisc and others added 27 commits June 25, 2024 17:20
With matching quantization parameters:

  DequantizeLinear ∘ Flatten ∘ QuantizeLinear

is equivalent to just the Flatten, and it saves some floating-
point computations. There's already support for a similar
optimization for an equivalent Reshape: this change extends the
existing optimization to also recognize Flatten.

microsoft#21167
Currently, the DropQDQNodesRules optimization removes QuantizeLinear and
DequantizeLinear nodes from DequantizeLinear∘MaxPool∘QuantizeLinear.
However, if the x_scale/y_scale values are non-positive, this changes
the ordering of the elements in the input value, so this optimization is
changing the results.

This change adds a check for whether the scale in the QuantizeLinear (or
DequantizeLinear) is a positive scalar, and a new selector to disallow
removing the QDQ around MaxPool if it is not.

microsoft#21176
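The ordering hazard is easy to demonstrate: with a negative scale, larger quantized integers map to smaller real values, so taking the max over the quantized data no longer corresponds to quantizing the max of the dequantized data. A small NumPy sketch (simplified uint8 helpers standing in for the ONNX operators, with a plain max standing in for MaxPool):

```python
import numpy as np

def dequantize(q, scale, zero_point):
    # DequantizeLinear: x = (q - zero_point) * scale
    return (q.astype(np.int32) - zero_point) * scale

def quantize(x, scale, zero_point):
    # QuantizeLinear: q = saturate(round(x / scale) + zero_point)
    q = np.rint(x / scale).astype(np.int32) + zero_point
    return np.clip(q, 0, 255).astype(np.uint8)

q = np.array([10, 200, 50], dtype=np.uint8)
scale, zero_point = -0.1, 128  # negative scale reverses the value ordering

# Reference: DequantizeLinear -> max -> QuantizeLinear
ref = quantize(dequantize(q, scale, zero_point).max(), scale, zero_point)
# "Optimized": take the max directly over the quantized integers
opt = q.max()
# ref is 10 while opt is 200: dropping the QDQ pair here changes the result
```

This is why the new selector checks that the scale initializer is a positive scalar before allowing the QDQ pair around MaxPool (and later ReduceMin/ReduceMax) to be removed.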
Only does so if the scale is positive
No integer implementations are present, so they need to stay in
floating-point.
microsoft#21287
Don't expect the drop qdq optimization to work for multiple inputs
for now.
Apparently the type constraints for these ops don't include 16-bit
integers.
Results don't appear to match
@mcollinswisc (Contributor, Author)

Changes are on top of #21182, since this PR also needs to check for a positive scale when dropping QDQ around ReduceMin and ReduceMax

@mcollinswisc mcollinswisc marked this pull request as ready for review August 1, 2024 16:57
@mcollinswisc mcollinswisc requested a review from skottmckay August 1, 2024 17:01
@skottmckay (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@skottmckay (Contributor)

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

@skottmckay (Contributor)

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 3 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

I guess these are no longer lined up anyway after moving some to
previous line.
@skottmckay (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@skottmckay (Contributor)

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

@skottmckay (Contributor)

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 3 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@mcollinswisc (Contributor, Author)

mcollinswisc commented Aug 15, 2024

I had missed the reason why the Windows builds were failing for the last few commits (I currently don't have a Windows system to try locally), sorry.

I guess since it's

onnxruntime\test\optimizer\qdq_transformer_test.cc(1,1): Error C1128: number of sections exceeded object file format limit: compile with /bigobj

I'll try adding the /bigobj flag similarly to here:

set_property(SOURCE "${TEST_SRC_DIR}/optimizer/graph_transform_test.cc"

Seeing:

fatal error C1128: number of sections exceeded object file format limit: compile with /bigobj

so apparently these additional tests are pushing this file over
the limit. Given there's already a statement setting /bigobj for
sibling graph_transform_test, simply copy-pasting that for
qdq_transformer_test
@skottmckay (Contributor)

/azp run Windows ARM64 QNN CI Pipeline,Windows x64 QNN CI Pipeline,Windows CPU CI Pipeline,Windows GPU CI Pipeline,Windows GPU TensorRT CI Pipeline,ONNX Runtime Web CI Pipeline,Linux CPU CI Pipeline,Linux CPU Minimal Build E2E CI Pipeline,Linux GPU CI Pipeline,Linux GPU TensorRT CI Pipeline

@skottmckay (Contributor)

/azp run Linux OpenVINO CI Pipeline,Linux QNN CI Pipeline,MacOS CI Pipeline,orttraining-amd-gpu-ci-pipeline,orttraining-linux-ci-pipeline,orttraining-linux-gpu-ci-pipeline,orttraining-ortmodule-distributed,onnxruntime-binary-size-checks-ci-pipeline,Big Models,Linux Android Emulator QNN CI Pipeline

@skottmckay (Contributor)

/azp run Android CI Pipeline,iOS CI Pipeline,ONNX Runtime React Native CI Pipeline

Azure Pipelines successfully started running 2 pipeline(s), but failed to run 1 pipeline(s).

Azure Pipelines successfully started running 9 pipeline(s).

Azure Pipelines successfully started running 10 pipeline(s).

@skottmckay (Contributor)

/azp run Windows GPU CUDA CI Pipeline,Windows GPU DML CI Pipeline,Windows GPU Doc Gen CI Pipeline

Azure Pipelines successfully started running 3 pipeline(s).

@skottmckay skottmckay merged commit 5d54dc1 into microsoft:main Aug 27, 2024
80 checks passed