
[Bug] The per_tensor quantized weight type of matmul is wrong #21346

Closed
duanshengliu opened this issue Jul 13, 2024 · 1 comment
Labels
quantization issues related to quantization

Comments

@duanshengliu
Contributor

duanshengliu commented Jul 13, 2024

Describe the issue

I am using quantize_static for quantization, and I found that the weight type of MatMul is wrong when the activation type differs from the weight type in per_tensor mode. I have located the relevant lines of code here:
https://github.com/microsoft/onnxruntime/blob/main/onnxruntime/python/tools/quantization/operators/matmul.py#L225C1-L228C71

  if is_per_channel:
      self.quantizer.quantize_weight_tensor_per_channel(tensor_name, channel_axis)
  else:
      self.quantizer.quantize_activation_tensor(tensor_name)

Apparently, weights are not handled separately in per_tensor mode here, which results in the weight type being the same as the activation type for the per_tensor-quantized MatMul operator. A sketch of the missing handling is shown below.
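For illustration only, this is the kind of branch that appears to be missing. The names quantize_weight_tensor and is_input_a_initializer are assumptions about the quantizer API used here to make the idea concrete; they are not necessarily what the fix PR does.

  # Sketch only: weight initializers in per_tensor mode should be quantized
  # with the weight type rather than the activation type.
  if is_per_channel:
      self.quantizer.quantize_weight_tensor_per_channel(tensor_name, channel_axis)
  elif self.quantizer.is_input_a_initializer(tensor_name):  # assumed helper
      self.quantizer.quantize_weight_tensor(tensor_name)    # assumed method
  else:
      self.quantizer.quantize_activation_tensor(tensor_name)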

To reproduce

The issue can be reproduced using the files in demo.zip. The reproduction commands are as follows:

python run.py --weight_type int8 --activation_type int16 --input_model demo.onnx --output_model demo_quant.onnx --calibrate_dataset ./test_images/

which produces a quantized model whose MatMul weights are 16-bit. ❌

python run.py --weight_type int16 --activation_type int8 --input_model demo.onnx --output_model demo_quant.onnx --calibrate_dataset ./test_images/

which produces a quantized model whose MatMul weights are 8-bit. ❌
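For reference, one way to check the MatMul weight type in the quantized output is to inspect the initializers consumed by DequantizeLinear nodes. This is only a sketch and assumes the output model demo_quant.onnx is in QDQ format:

  # Sketch: print the element type of each initializer consumed by a
  # DequantizeLinear node, which includes the quantized MatMul weight.
  import onnx

  model = onnx.load("demo_quant.onnx")
  inits = {init.name: init for init in model.graph.initializer}
  for node in model.graph.node:
      if node.op_type == "DequantizeLinear" and node.input[0] in inits:
          tensor = inits[node.input[0]]
          print(node.input[0], onnx.TensorProto.DataType.Name(tensor.data_type))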

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@duanshengliu
Contributor Author

A fix PR is here: #21347

adrianlizarraga pushed a commit that referenced this issue Aug 9, 2024
### Description
Fix wrong per-tensor quantized weight type for matmul.


### Motivation and Context
Fixes the related bug described in #21346.