
[Feature Request] Missing optimization of DequantizeLinear ∘ Flatten ∘ QuantizeLinear? #21375

Open
mcollinswisc opened this issue Jul 16, 2024 Discussed in #21167 · 1 comment
Labels: feature request (request for unsupported feature or enhancement), quantization (issues related to quantization)

Comments

@mcollinswisc (Contributor)

Discussed in #21167

Originally posted by mcollinswisc June 25, 2024
It looks like ONNX Runtime will optimize DequantizeLinear ∘ Reshape ∘ QuantizeLinear down to just the Reshape, eliminating the quantize/de-quantize round trip, if the scales & zero points on both sides are the same.

However, an equivalent Flatten is not optimized. Is this likely to be just a missing optimization, or is there some reason the QDQ pair would be preserved in this case?

Tested out in:
https://gist.github.com/mcollinswisc/d1cd9d13b4e5fbad01c75dca5c9ca576
with ONNXRuntime 1.18.0

github-actions bot added the quantization label Jul 16, 2024
@sophies927 added the feature request label Jul 18, 2024
@skottmckay (Contributor)

Should be possible to add to this list given the ONNX spec for Flatten allows 8-bit integers:

```cpp
qdq_selector_action_registry.RegisterSelectorAndAction(drop_action_name,
                                                       {{"Gather", {}},
                                                        {"Reshape", {}},
                                                        {"Transpose", {}},
                                                        {"Squeeze", {}},
                                                        {"Unsqueeze", {}}},
```

skottmckay pushed a commit that referenced this issue Aug 27, 2024
### Description

Extends the Drop QDQ optimization to remove DequantizeLinear and
QuantizeLinear nodes from around operators:

- Flatten
- Expand
- Tile
- Slice
- GatherElements
- ReduceMin
- ReduceMax

### Motivation and Context

To reduce floating-point conversions in quantized inference. Mainly
motivated by the Flatten case, since that shows up in graphs exported
from PyTorch to ONNX; for completeness, the change extends to the
larger set of ops for which this optimization is valid.

#21375

---------

Co-authored-by: Edward Chen <[email protected]>