
[Bug] The per_tensor quantized weight type of matmul is wrong #21346

Closed
duanshengliu opened this issue Jul 13, 2024 · 1 comment
Labels
quantization issues related to quantization

Comments

@duanshengliu
Contributor

duanshengliu commented Jul 13, 2024

Describe the issue

I am using quantize_static for quantization, and I found that the weight type of MatMul is wrong when the activation type differs from the weight type in per_tensor mode. I have located the relevant lines of code here:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/operators/matmul.py#L225C1-L228C71

  if is_per_channel:
      self.quantizer.quantize_weight_tensor_per_channel(tensor_name, channel_axis)
  else:
      self.quantizer.quantize_activation_tensor(tensor_name)

Apparently, weights are not handled separately in per_tensor mode here, which results in the weight type being the same as the activation type for the per_tensor-quantized MatMul operator. A sketch of the missing handling is shown below.
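For illustration only, this is the kind of branch that appears to be missing. The names quantize_weight_tensor and is_input_a_initializer are assumptions about the quantizer API used here to make the idea concrete; they are not necessarily what the fix PR does.

  # Sketch only: weight initializers in per_tensor mode should be quantized
  # with the weight type rather than the activation type.
  if is_per_channel:
      self.quantizer.quantize_weight_tensor_per_channel(tensor_name, channel_axis)
  elif self.quantizer.is_input_a_initializer(tensor_name):  # assumed helper
      self.quantizer.quantize_weight_tensor(tensor_name)    # assumed method
  else:
      self.quantizer.quantize_activation_tensor(tensor_name)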

To reproduce

The issue can be reproduced using the files in demo.zip. The reproduction commands are as follows:

python run.py --weight_type int8 --activation_type int16 --input_model demo.onnx --output_model demo_quant.onnx --calibrate_dataset ./test_images/

which produces a quantized model whose MatMul weights are 16-bit. ❌

python run.py --weight_type int16 --activation_type int8 --input_model demo.onnx --output_model demo_quant.onnx --calibrate_dataset ./test_images/

which produces a quantized model whose MatMul weights are 8-bit. ❌
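For reference, one way to check the MatMul weight type in the quantized output is to inspect the initializers consumed by DequantizeLinear nodes. This is only a sketch and assumes the output model demo_quant.onnx is in QDQ format:

  # Sketch: print the element type of each initializer consumed by a
  # DequantizeLinear node, which includes the quantized MatMul weight.
  import onnx

  model = onnx.load("demo_quant.onnx")
  inits = {init.name: init for init in model.graph.initializer}
  for node in model.graph.node:
      if node.op_type == "DequantizeLinear" and node.input[0] in inits:
          tensor = inits[node.input[0]]
          print(node.input[0], onnx.TensorProto.DataType.Name(tensor.data_type))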

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@duanshengliu
Contributor Author

A fix PR is here: #21347

adrianlizarraga pushed a commit that referenced this issue Aug 9, 2024
### Description
Fix wrong per-tensor quantized weight type for matmul.


### Motivation and Context
Fixes the related bug described in #21346.