Skip to content

Commit

Permalink
Update transpose optimization to be more QDQ aware (microsoft#18444)
Browse files Browse the repository at this point in the history
### Description
<!-- Describe your changes. -->
Rework some aspects of the transpose optimizer to ensure we have valid
QDQ node units when it is done.

Conceptually we need to let individual Transpose nodes move through the
graph when optimizing. That can invalidate existing QDQ node units or
require new ones. We can fix this after inserting new nodes, or when
transpose optimization finishes moving Transpose nodes.

Fix when inserting new node
- TransposeInputs can add an Unsqueeze (to broadcast) and Transpose to a
node's inputs
- if there was a DQ node providing the input, add a Q -> DQ after
inserting the Unsqueeze/Transpose to make a QDQ node unit for the new
node.
- Unsqueeze/Transpose don't change data, so we can copy the
type/scale/zero point from the existing DQ

Fixes when transpose optimization completes moving Transpose nodes
- Remove empty DQ -> Q pairs if the type/scale/zero point match
- Pushing a Transpose through may have resulted in an existing
Transpose/Reshape being cancelled and removed leaving an empty QDQ node
unit
  - the Transpose being moved may have started in a QDQ node unit
- Transpose that got blocked inside existing QDQ node unit
- e.g. if we hit a DQ -> MatMul -> Q node unit the Transpose gets
blocked after the DQ
- insert a Q -> DQ after the Transpose to put it in a QDQ node unit and
repair the original QDQ node unit
- Transpose moves past a DQ providing a graph output
  - insert a Q -> DQ so the Transpose is in a QDQ node unit

This replaces the existing phase 2 logic which flipped a DQ -> Transpose
to fix a broken QDQ node unit. The new approach should handle more
scenarios and hopefully produce a better graph.

Additionally the logic to handle updates to shared initializers that
feed DQ nodes was simplified (i.e. largely removed). When we update the
shared initializer a Squeeze (if broadcast) and Transpose is added
between the initializer and the DQ for other usages of it. We only need
to check for this pattern in EstimateTransposeValueCost by looking past
a DQ node. We do not need to track the individual DQ nodes leading to an
updated shared initializer.

### Motivation and Context
<!-- - Why is this change required? What problem does it solve?
- If it fixes an open issue, please link to the issue here. -->
Initially to fix QNN issue with non-const input being transpose and the
QDQ node units being broken.
  • Loading branch information
skottmckay authored and kleiti committed Mar 22, 2024
1 parent 39ad9c8 commit 23b69f3
Show file tree
Hide file tree
Showing 17 changed files with 959 additions and 494 deletions.
284 changes: 178 additions & 106 deletions onnxruntime/core/framework/kernel_type_str_resolver_utils.cc

Large diffs are not rendered by default.

Original file line number Diff line number Diff line change
Expand Up @@ -20,6 +20,10 @@ inline constexpr std::array kLayoutTransformationPotentiallyAddedOps = {
// @@region_begin(extended_minimal_build_required_kernels)@@

// kOnnxDomain ops
OpIdentifierWithStringViews{kOnnxDomain, "DequantizeLinear", 10},
OpIdentifierWithStringViews{kOnnxDomain, "DequantizeLinear", 13},
OpIdentifierWithStringViews{kOnnxDomain, "DequantizeLinear", 19},
// OpIdentifierWithStringViews{kOnnxDomain, "DequantizeLinear", 21}, pending CPU EP adding support
OpIdentifierWithStringViews{kOnnxDomain, "Gather", 1},
OpIdentifierWithStringViews{kOnnxDomain, "Gather", 11},
OpIdentifierWithStringViews{kOnnxDomain, "Gather", 13},
Expand All @@ -28,6 +32,10 @@ inline constexpr std::array kLayoutTransformationPotentiallyAddedOps = {
OpIdentifierWithStringViews{kOnnxDomain, "Identity", 14},
OpIdentifierWithStringViews{kOnnxDomain, "Identity", 16},
OpIdentifierWithStringViews{kOnnxDomain, "Identity", 19},
OpIdentifierWithStringViews{kOnnxDomain, "QuantizeLinear", 10},
OpIdentifierWithStringViews{kOnnxDomain, "QuantizeLinear", 13},
OpIdentifierWithStringViews{kOnnxDomain, "QuantizeLinear", 19},
// OpIdentifierWithStringViews{kOnnxDomain, "QuantizeLinear", 21}, pending CPU EP adding support
OpIdentifierWithStringViews{kOnnxDomain, "Squeeze", 1},
OpIdentifierWithStringViews{kOnnxDomain, "Squeeze", 11},
OpIdentifierWithStringViews{kOnnxDomain, "Squeeze", 13},
Expand All @@ -39,8 +47,10 @@ inline constexpr std::array kLayoutTransformationPotentiallyAddedOps = {

#if !defined(DISABLE_CONTRIB_OPS)
// kMSDomain ops
OpIdentifierWithStringViews{kMSDomain, "DequantizeLinear", 1},
OpIdentifierWithStringViews{kMSDomain, "NhwcMaxPool", 1},
OpIdentifierWithStringViews{kMSDomain, "QLinearConv", 1},
OpIdentifierWithStringViews{kMSDomain, "QuantizeLinear", 1},
#endif // !defined(DISABLE_CONTRIB_OPS)

// @@region_end(extended_minimal_build_required_kernels)@@
Expand Down
Loading

0 comments on commit 23b69f3

Please sign in to comment.