
[QNN QDQ Quant] Utils to generate mixed-precision quant overrides #20028

Merged Mar 25, 2024: 23 commits merged into main from adrianl/qdq-mixed-prec-overrides-utils

Conversation

@adrianlizarraga (Contributor) commented Mar 22, 2024
Description

  • Adds a utility to the QNN quantization scripts that "fixes" an initial set of tensor quantization overrides for mixed-precision QDQ models. Follow-up to [QDQ Quant] Support mixed-precision integer quantization via overrides #19925.
  • Moves the existing QNN-compatibility overrides (matmul, layernorm, sigmoid, tanh) into separate functions and adds the previously missing unit tests for them.
  • Adds a weight_symmetric=None parameter to the get_qnn_qdq_config() function so users can specify it explicitly instead of always relying on the default behavior; see the sketch after this list.
    • If weight_symmetric is None, it is set to weight_symmetric = weight_type in (QUInt8, QUInt16).
    • Otherwise, the user's value is used.
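
A minimal sketch of that default resolution, assuming only the behavior described in the bullets above (resolve_weight_symmetric is a hypothetical name used for illustration; the real logic lives inside get_qnn_qdq_config()):

```python
from onnxruntime.quantization import QuantType

# Hypothetical helper mirroring the default described above;
# it is not part of the onnxruntime API.
def resolve_weight_symmetric(weight_type: QuantType, weight_symmetric=None) -> bool:
    if weight_symmetric is None:
        # Default: derive symmetry from the weight type, per the PR description.
        return weight_type in (QuantType.QUInt8, QuantType.QUInt16)
    return weight_symmetric  # Otherwise, honor the user's explicit value.
```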

Example

Float model:

```
    input_0 --> Op1 --> Op3 --> Op5 --> Op6 --> output_0
                                 ^
                                 |
    input_1 --> Op2 -+-> Op4 ----+
                     |
                     +-> Op7 --> output_1
                     |
                     +-> Op8 --> output_2
```

If we want to quantize this model to uint8 precision but ensure that tensor "Op4_out" is quantized to 16 bits, we would specify the following initial tensor quantization overrides:

```python
# Op4_out could be an inaccurate tensor that should be upgraded to 16-bit
initial_overrides = {"Op4_out": [{"quant_type": QuantType.QUInt16}]}
```

These initial overrides alone may not produce a valid model, because Op4 and Op5 may require their input and output to have the same type (e.g., uint16). The new helper fixes the overrides so that input/output data types are valid:

```python
qnn_config = get_qnn_qdq_config(
    float_model_path,
    data_reader,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
    init_overrides=initial_overrides,  # These initial overrides will be "fixed"
)
```
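
The returned config can then be used like any other static quantization config. A short usage sketch, assuming the standard onnxruntime.quantization workflow (the output path "model.qdq.onnx" is a placeholder):

```python
from onnxruntime.quantization import quantize

# Produce the mixed-precision QDQ model; the fixed overrides travel inside
# qnn_config.extra_options["TensorQuantOverrides"]. Output path is a placeholder.
quantize(float_model_path, "model.qdq.onnx", qnn_config)
```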

The get_qnn_qdq_config() call above generates the following "fixed" overrides (retrieved via qnn_config.extra_options["TensorQuantOverrides"]):

```python
    {
      "Op2_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op4"}}}],
      "Op3_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op5"}}}],
      "Op4_out": [{"quant_type": QUInt16}],
      "Op5_out": [{"quant_type": QUInt16, "convert": {"quant_type": QUInt8, "recv_nodes": {"Op6"}}}]
    }
```

How to interpret the fixed overrides:

  • Op2's output is consumed by Op4, Op7, and Op8. Op4 consumes the converted u16 type, but Op7 and Op8 consume the original u8 type.
  • Op3's output is converted from u8 to u16. Op5 consumes the converted u16 type.
  • Op4's output is just u16 (not converted). All consumers of Op4_out get the u16 type.
  • Op5's output is converted from u16 to u8. Op6 consumes the u8 type.
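
To make the convert entries concrete, here is a small inspection sketch (illustrative only; not part of the PR or the onnxruntime API) that walks a fixed-overrides dict of the shape shown above and reports each tensor's produced type and which consumers receive the converted type:

```python
from onnxruntime.quantization import QuantType

# Illustrative only: walk a fixed-overrides dict of the shape shown above
# and report each tensor's produced type plus any per-consumer conversion.
fixed_overrides = {
    "Op2_out": [{"quant_type": QuantType.QUInt8,
                 "convert": {"quant_type": QuantType.QUInt16, "recv_nodes": {"Op4"}}}],
    "Op3_out": [{"quant_type": QuantType.QUInt8,
                 "convert": {"quant_type": QuantType.QUInt16, "recv_nodes": {"Op5"}}}],
    "Op4_out": [{"quant_type": QuantType.QUInt16}],
    "Op5_out": [{"quant_type": QuantType.QUInt16,
                 "convert": {"quant_type": QuantType.QUInt8, "recv_nodes": {"Op6"}}}],
}

for tensor, (entry,) in fixed_overrides.items():
    convert = entry.get("convert")
    if convert is None:
        print(f"{tensor}: {entry['quant_type'].name} for all consumers")
    else:
        receivers = ", ".join(sorted(convert["recv_nodes"]))
        print(f"{tensor}: {entry['quant_type'].name} -> "
              f"{convert['quant_type'].name} for {receivers}")
```

Running this on the dict above prints, for example, "Op2_out: QUInt8 -> QUInt16 for Op4", matching the bullet-point interpretation.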

Motivation and Context

Generating mixed-precision quantization overrides is currently a manual process. This PR adds a utility that helps generate valid overrides.

Base automatically changed from adrianl/qdq-quant-mixed-prec-overrides to main March 23, 2024 18:05
@adrianlizarraga adrianlizarraga marked this pull request as ready for review March 23, 2024 18:34
@jywu-msft (Member) left a comment:


thanks!

@adrianlizarraga adrianlizarraga merged commit 7d976cf into main Mar 25, 2024
93 of 95 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/qdq-mixed-prec-overrides-utils branch March 25, 2024 21:41
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request May 7, 2024