How to convert quantized ONNX model from Tensor-Oriented format to Operator-Oriented format? #21137

Closed
hoangtv2000 opened this issue Jun 21, 2024 · 5 comments
Labels
quantization (issues related to quantization) · stale (issues that have not been addressed in a while; categorized by a bot)

Comments

@hoangtv2000 commented Jun 21, 2024

Describe the issue

I have a quantized model, represented by the graph below, and I want to convert all of the QDQ operators in this model to QOperator operators. What should I do?
[screenshot: graph of the quantized model in QDQ format]

To reproduce

Have not reproduced yet.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@carzh added the quantization label Jun 24, 2024
github-actions bot commented Jul 25, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label Jul 25, 2024
@hoangtv2000 closed this as not planned Jul 29, 2024
@UsingtcNower commented Aug 15, 2024

I have the same question. Hi @hoangtv2000, did you find a solution?

@yufenglee reopened this Aug 15, 2024
@yufenglee (Member) commented

@UsingtcNower and @hoangtv2000, you can run onnxruntime in offline mode to optimize the model into operator-oriented format: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#offline-mode. BTW, why do you want to do this?
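
A minimal sketch of that offline-mode flow in Python (file names are placeholders; on the default CPU provider, the extended optimization level is where QDQ pairs are typically fused into QLinear*, i.e. operator-oriented, nodes):

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Extended optimizations include the fusions that rewrite QDQ patterns
# into QLinear* (operator-oriented) nodes on the CPU execution provider.
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
# Offline mode: serialize the optimized graph to disk instead of
# re-optimizing in memory on every session creation.
sess_options.optimized_model_filepath = "model_qoperator.onnx"  # placeholder output path

# Creating the session runs the optimizer and writes the optimized model.
ort.InferenceSession("model_qdq.onnx", sess_options)  # placeholder input path
```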

@hoangtv2000 (Author) commented

> @UsingtcNower and @hoangtv2000, you can run onnxruntime in offline mode to optimize the model into operator-oriented format: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#offline-mode. BTW, why do you want to do this?

I want to use your method to run the model on our self-designed chip, which only runs efficiently in an integer-arithmetic-only format.

@hoangtv2000 (Author) commented

> I have the same question. Hi @hoangtv2000, did you find a solution?

You should use quantize_static, the built-in static post-training quantization function of the onnxruntime library. Remember to set quant_format to QuantFormat.QOperator if you want the output model to contain only QOperator ops, set per_channel to False if you want per-tensor quantization, and set activation_type and weight_type to QuantType.QUInt8 if you want your weight and activation tensors in uint8 format.
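
A minimal sketch of that call, assuming a float32 source model; the paths, input name, input shape, and the random-data calibration reader are placeholders to swap for your real model and calibration samples:

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class RandomDataReader(CalibrationDataReader):
    """Feeds a few random batches for calibration (stand-in for real data)."""
    def __init__(self, input_name, shape, num_batches=8):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._batches, None)  # None tells the calibrator to stop

quantize_static(
    model_input="model_fp32.onnx",        # placeholder: float32 source model
    model_output="model_qoperator.onnx",  # placeholder: quantized output
    calibration_data_reader=RandomDataReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QOperator,   # emit QLinear* ops instead of QDQ pairs
    per_channel=False,                    # per-tensor quantization
    activation_type=QuantType.QUInt8,     # uint8 activations
    weight_type=QuantType.QUInt8,         # uint8 weights
)
```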
