Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Update transpose optimization to be more QDQ aware (microsoft#18444)
### Description <!-- Describe your changes. --> Rework some aspects of the transpose optimizer to ensure we have valid QDQ node units when it is done. Conceptually we need to let individual Transpose nodes move through the graph when optimizing. That can invalidate existing QDQ node units or require new ones. We can fix this after inserting new nodes, or when transpose optimization finishes moving Transpose nodes. Fix when inserting new node - TransposeInputs can add an Unsqueeze (to broadcast) and Transpose to a node's inputs - if there was a DQ node providing the input, add a Q -> DQ after inserting the Unsqueeze/Transpose to make a QDQ node unit for the new node. - Unsqueeze/Transpose don't change data, so we can copy the type/scale/zero point from the existing DQ Fixes when transpose optimization completes moving Transpose nodes - Remove empty DQ -> Q pairs if the type/scale/zero point match - Pushing a Transpose through may have resulted in an existing Transpose/Reshape being cancelled and removed leaving an empty QDQ node unit - the Transpose being moved may have started in a QDQ node unit - Transpose that got blocked inside existing QDQ node unit - e.g. if we hit a DQ -> MatMul -> Q node unit the Transpose gets blocked after the DQ - insert a Q -> DQ after the Transpose to put it in a QDQ node unit and repair the original QDQ node unit - Transpose moves past a DQ providing a graph output - insert a Q -> DQ so the Transpose is in a QDQ node unit This replaces the existing phase 2 logic which flipped a DQ -> Transpose to fix a broken QDQ node unit. The new approach should handle more scenarios and hopefully produce a better graph. Additionally the logic to handle updates to shared initializers that feed DQ nodes was simplified (i.e. largely removed). When we update the shared initializer a Squeeze (if broadcast) and Transpose is added between the initializer and the DQ for other usages of it. We only need to check for this pattern in EstimateTransposeValueCost by looking past a DQ node. We do not need to track the individual DQ nodes leading to an updated shared initializer. ### Motivation and Context <!-- - Why is this change required? What problem does it solve? - If it fixes an open issue, please link to the issue here. --> Initially to fix QNN issue with non-const input being transpose and the QDQ node units being broken.
- Loading branch information