Update transpose optimization to be more QDQ aware (microsoft#18444)

### Description  Rework some aspects of the transpose optimizer to ensure we have valid QDQ node units when it is done. Conceptually we need to let individual Transpose nodes move through the graph when optimizing. That can invalidate existing QDQ node units or require new ones. We can fix this after inserting new nodes, or when transpose optimization finishes moving Transpose nodes. Fix when inserting new node - TransposeInputs can add an Unsqueeze (to broadcast) and Transpose to a node's inputs - if there was a DQ node providing the input, add a Q -> DQ after inserting the Unsqueeze/Transpose to make a QDQ node unit for the new node. - Unsqueeze/Transpose don't change data, so we can copy the type/scale/zero point from the existing DQ Fixes when transpose optimization completes moving Transpose nodes - Remove empty DQ -> Q pairs if the type/scale/zero point match - Pushing a Transpose through may have resulted in an existing Transpose/Reshape being cancelled and removed leaving an empty QDQ node unit - the Transpose being moved may have started in a QDQ node unit - Transpose that got blocked inside existing QDQ node unit - e.g. if we hit a DQ -> MatMul -> Q node unit the Transpose gets blocked after the DQ - insert a Q -> DQ after the Transpose to put it in a QDQ node unit and repair the original QDQ node unit - Transpose moves past a DQ providing a graph output - insert a Q -> DQ so the Transpose is in a QDQ node unit This replaces the existing phase 2 logic which flipped a DQ -> Transpose to fix a broken QDQ node unit. The new approach should handle more scenarios and hopefully produce a better graph. Additionally the logic to handle updates to shared initializers that feed DQ nodes was simplified (i.e. largely removed). When we update the shared initializer a Squeeze (if broadcast) and Transpose is added between the initializer and the DQ for other usages of it. We only need to check for this pattern in EstimateTransposeValueCost by looking past a DQ node. We do not need to track the individual DQ nodes leading to an updated shared initializer. ### Motivation and Context  Initially to fix QNN issue with non-const input being transpose and the QDQ node units being broken.
kleiti · Mar 22, 2024 · 23b69f3 · 23b69f3
1 parent 39ad9c8
commit 23b69f3
Show file tree

Hide file tree

Showing 17 changed files with 959 additions and 494 deletions.
diff --git a/onnxruntime/core/framework/kernel_type_str_resolver_utils.cc b/onnxruntime/core/framework/kernel_type_str_resolver_utils.cc
diff --git a/...untime/core/optimizer/layout_transformation/layout_transformation_potentially_added_ops.h b/...untime/core/optimizer/layout_transformation/layout_transformation_potentially_added_ops.h
@@ -20,6 +20,10 @@ inline constexpr std::array kLayoutTransformationPotentiallyAddedOps = {
     // @@region_begin(extended_minimal_build_required_kernels)@@
 
     // kOnnxDomain ops
+    OpIdentifierWithStringViews{kOnnxDomain, "DequantizeLinear", 10},
+    OpIdentifierWithStringViews{kOnnxDomain, "DequantizeLinear", 13},
+    OpIdentifierWithStringViews{kOnnxDomain, "DequantizeLinear", 19},
+    // OpIdentifierWithStringViews{kOnnxDomain, "DequantizeLinear", 21}, pending CPU EP adding support
     OpIdentifierWithStringViews{kOnnxDomain, "Gather", 1},
     OpIdentifierWithStringViews{kOnnxDomain, "Gather", 11},
     OpIdentifierWithStringViews{kOnnxDomain, "Gather", 13},
@@ -28,6 +32,10 @@ inline constexpr std::array kLayoutTransformationPotentiallyAddedOps = {
     OpIdentifierWithStringViews{kOnnxDomain, "Identity", 14},
     OpIdentifierWithStringViews{kOnnxDomain, "Identity", 16},
     OpIdentifierWithStringViews{kOnnxDomain, "Identity", 19},
+    OpIdentifierWithStringViews{kOnnxDomain, "QuantizeLinear", 10},
+    OpIdentifierWithStringViews{kOnnxDomain, "QuantizeLinear", 13},
+    OpIdentifierWithStringViews{kOnnxDomain, "QuantizeLinear", 19},
+    // OpIdentifierWithStringViews{kOnnxDomain, "QuantizeLinear", 21}, pending CPU EP adding support
     OpIdentifierWithStringViews{kOnnxDomain, "Squeeze", 1},
     OpIdentifierWithStringViews{kOnnxDomain, "Squeeze", 11},
     OpIdentifierWithStringViews{kOnnxDomain, "Squeeze", 13},
@@ -39,8 +47,10 @@ inline constexpr std::array kLayoutTransformationPotentiallyAddedOps = {
 
 #if !defined(DISABLE_CONTRIB_OPS)
     // kMSDomain ops
+    OpIdentifierWithStringViews{kMSDomain, "DequantizeLinear", 1},
     OpIdentifierWithStringViews{kMSDomain, "NhwcMaxPool", 1},
     OpIdentifierWithStringViews{kMSDomain, "QLinearConv", 1},
+    OpIdentifierWithStringViews{kMSDomain, "QuantizeLinear", 1},
 #endif  // !defined(DISABLE_CONTRIB_OPS)
 
     // @@region_end(extended_minimal_build_required_kernels)@@