
[QNN QDQ Quant] Utils to generate mixed-precision quant overrides #20028

Merged Mar 25, 2024: 23 commits merged into main from adrianl/qdq-mixed-prec-overrides-utils

Conversation

@adrianlizarraga (Contributor) commented Mar 22, 2024
Description

  • Adds a utility to the QNN quantization scripts that "fixes" an initial set of tensor quantization overrides for mixed-precision QDQ models. Follow-up to [QDQ Quant] Support mixed-precision integer quantization via overrides #19925.
  • Moves the existing QNN-compatibility overrides (matmul, layernorm, sigmoid, tanh) into separate functions and adds the previously missing unit tests for them.
  • Adds a weight_symmetric=None parameter to the get_qnn_qdq_config() function so users can specify it explicitly instead of always relying on the default behavior; see the sketch after this list.
    • If weight_symmetric is None, it is set to weight_symmetric = weight_type in (QUInt8, QUInt16).
    • Otherwise, the user's value is used.
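
A minimal sketch of that default resolution, assuming only the behavior described in the bullets above (resolve_weight_symmetric is a hypothetical name used for illustration; the real logic lives inside get_qnn_qdq_config()):

```python
from onnxruntime.quantization import QuantType

# Hypothetical helper mirroring the default described above;
# it is not part of the onnxruntime API.
def resolve_weight_symmetric(weight_type: QuantType, weight_symmetric=None) -> bool:
    if weight_symmetric is None:
        # Default: derive symmetry from the weight type, per the PR description.
        return weight_type in (QuantType.QUInt8, QuantType.QUInt16)
    return weight_symmetric  # Otherwise, honor the user's explicit value.
```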

Example

Float model:

```
    input_0 --> Op1 --> Op3 --> Op5 --> Op6 --> output_0
                                 ^
                                 |
    input_1 --> Op2 -+-> Op4 ----+
                     |
                     +-> Op7 --> output_1
                     |
                     +-> Op8 --> output_2
```

If we want to quantize this model to uint8 precision but ensure that tensor "Op4_out" is quantized to 16 bits, we would specify the following initial tensor quantization overrides:

```python
# Op4_out could be an inaccurate tensor that should be upgraded to 16-bit
initial_overrides = {"Op4_out": [{"quant_type": QuantType.QUInt16}]}
```

These initial overrides alone may not produce a valid model, because Op4 and Op5 may require their input and output to have the same type (e.g., uint16). The new helper fixes the overrides so that input/output data types are valid:

```python
qnn_config = get_qnn_qdq_config(
    float_model_path,
    data_reader,
    activation_type=QuantType.QUInt8,
    weight_type=QuantType.QUInt8,
    init_overrides=initial_overrides,  # These initial overrides will be "fixed"
)
```
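
The returned config can then be used like any other static quantization config. A short usage sketch, assuming the standard onnxruntime.quantization workflow (the output path "model.qdq.onnx" is a placeholder):

```python
from onnxruntime.quantization import quantize

# Produce the mixed-precision QDQ model; the fixed overrides travel inside
# qnn_config.extra_options["TensorQuantOverrides"]. Output path is a placeholder.
quantize(float_model_path, "model.qdq.onnx", qnn_config)
```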

The get_qnn_qdq_config() call above generates the following "fixed" overrides (retrieved via qnn_config.extra_options["TensorQuantOverrides"]):

```python
    {
      "Op2_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op4"}}}],
      "Op3_out": [{"quant_type": QUInt8, "convert": {"quant_type": QUInt16, "recv_nodes": {"Op5"}}}],
      "Op4_out": [{"quant_type": QUInt16}],
      "Op5_out": [{"quant_type": QUInt16, "convert": {"quant_type": QUInt8, "recv_nodes": {"Op6"}}}]
    }
```

How to interpret the fixed overrides:

  • Op2's output is consumed by Op4, Op7, and Op8. Op4 consumes the converted u16 type, but Op7 and Op8 consume the original u8 type.
  • Op3's output is converted from u8 to u16. Op5 consumes the converted u16 type.
  • Op4's output is just u16 (not converted). All consumers of Op4_out get the u16 type.
  • Op5's output is converted from u16 to u8. Op6 consumes the u8 type.
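
To make the convert entries concrete, here is a small inspection sketch (illustrative only; not part of the PR or the onnxruntime API) that walks a fixed-overrides dict of the shape shown above and reports each tensor's produced type and which consumers receive the converted type:

```python
from onnxruntime.quantization import QuantType

# Illustrative only: walk a fixed-overrides dict of the shape shown above
# and report each tensor's produced type plus any per-consumer conversion.
fixed_overrides = {
    "Op2_out": [{"quant_type": QuantType.QUInt8,
                 "convert": {"quant_type": QuantType.QUInt16, "recv_nodes": {"Op4"}}}],
    "Op3_out": [{"quant_type": QuantType.QUInt8,
                 "convert": {"quant_type": QuantType.QUInt16, "recv_nodes": {"Op5"}}}],
    "Op4_out": [{"quant_type": QuantType.QUInt16}],
    "Op5_out": [{"quant_type": QuantType.QUInt16,
                 "convert": {"quant_type": QuantType.QUInt8, "recv_nodes": {"Op6"}}}],
}

for tensor, (entry,) in fixed_overrides.items():
    convert = entry.get("convert")
    if convert is None:
        print(f"{tensor}: {entry['quant_type'].name} for all consumers")
    else:
        receivers = ", ".join(sorted(convert["recv_nodes"]))
        print(f"{tensor}: {entry['quant_type'].name} -> "
              f"{convert['quant_type'].name} for {receivers}")
```

Running this on the dict above prints, for example, "Op2_out: QUInt8 -> QUInt16 for Op4", matching the bullet-point interpretation.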

Motivation and Context

Generating mixed-precision quantization overrides is currently a manual process. This PR adds a utility that helps generate valid overrides.

Base automatically changed from adrianl/qdq-quant-mixed-prec-overrides to main March 23, 2024 18:05
@adrianlizarraga adrianlizarraga marked this pull request as ready for review March 23, 2024 18:34
@jywu-msft (Member) left a comment:


thanks!

@adrianlizarraga adrianlizarraga merged commit 7d976cf into main Mar 25, 2024
93 of 95 checks passed
@adrianlizarraga adrianlizarraga deleted the adrianl/qdq-mixed-prec-overrides-utils branch March 25, 2024 21:41
TedThemistokleous pushed a commit to TedThemistokleous/onnxruntime that referenced this pull request May 7, 2024