How to convert quantized ONNX model from Tensor-Oriented format to Operator-Oriented format? #21137

Closed
hoangtv2000 opened this issue Jun 21, 2024 · 5 comments
Labels
quantization (issues related to quantization) · stale (issues that have not been addressed in a while; categorized by a bot)

Comments

@hoangtv2000 commented Jun 21, 2024

Describe the issue

I have a quantized model, represented by the graph below, and I want to convert all of the QDQ operators in this model to QOperator operators. What should I do?
[screenshot: graph of the quantized model in QDQ format]

To reproduce

Have not reproduced yet.

Urgency

No response

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@carzh added the quantization label Jun 24, 2024
github-actions bot commented Jul 25, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label Jul 25, 2024
@hoangtv2000 closed this as not planned Jul 29, 2024
@UsingtcNower commented Aug 15, 2024

I have the same question. Hi @hoangtv2000, did you find a solution?

@yufenglee reopened this Aug 15, 2024
@yufenglee (Member) commented

@UsingtcNower and @hoangtv2000, you can run onnxruntime in offline mode to optimize the model into operator-oriented format: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#offline-mode. BTW, why do you want to do this?
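
A minimal sketch of that offline-mode flow in Python (file names are placeholders; on the default CPU provider, the extended optimization level is where QDQ pairs are typically fused into QLinear*, i.e. operator-oriented, nodes):

```python
import onnxruntime as ort

sess_options = ort.SessionOptions()
# Extended optimizations include the fusions that rewrite QDQ patterns
# into QLinear* (operator-oriented) nodes on the CPU execution provider.
sess_options.graph_optimization_level = ort.GraphOptimizationLevel.ORT_ENABLE_EXTENDED
# Offline mode: serialize the optimized graph to disk instead of
# re-optimizing in memory on every session creation.
sess_options.optimized_model_filepath = "model_qoperator.onnx"  # placeholder output path

# Creating the session runs the optimizer and writes the optimized model.
ort.InferenceSession("model_qdq.onnx", sess_options)  # placeholder input path
```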

@hoangtv2000 (Author) commented

> @UsingtcNower and @hoangtv2000, you can run onnxruntime in offline mode to optimize the model into operator-oriented format: https://onnxruntime.ai/docs/performance/model-optimizations/graph-optimizations.html#offline-mode. BTW, why do you want to do this?

I want to use your method to run the model on our self-designed chip, which only runs efficiently in an integer-arithmetic-only format.

@hoangtv2000 (Author) commented

> I have the same question. Hi @hoangtv2000, did you find a solution?

You should use quantize_static, the built-in static post-training quantization function of the onnxruntime library. Remember to set quant_format to QuantFormat.QOperator if you want the output model to contain only QOperator ops, set per_channel to False if you want per-tensor quantization, and set activation_type and weight_type to QuantType.QUInt8 if you want your weight and activation tensors in uint8 format.
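
A minimal sketch of that call, assuming a float32 source model; the paths, input name, input shape, and the random-data calibration reader are placeholders to swap for your real model and calibration samples:

```python
import numpy as np
from onnxruntime.quantization import (
    CalibrationDataReader, QuantFormat, QuantType, quantize_static,
)

class RandomDataReader(CalibrationDataReader):
    """Feeds a few random batches for calibration (stand-in for real data)."""
    def __init__(self, input_name, shape, num_batches=8):
        self._batches = iter(
            [{input_name: np.random.rand(*shape).astype(np.float32)}
             for _ in range(num_batches)]
        )

    def get_next(self):
        return next(self._batches, None)  # None tells the calibrator to stop

quantize_static(
    model_input="model_fp32.onnx",        # placeholder: float32 source model
    model_output="model_qoperator.onnx",  # placeholder: quantized output
    calibration_data_reader=RandomDataReader("input", (1, 3, 224, 224)),
    quant_format=QuantFormat.QOperator,   # emit QLinear* ops instead of QDQ pairs
    per_channel=False,                    # per-tensor quantization
    activation_type=QuantType.QUInt8,     # uint8 activations
    weight_type=QuantType.QUInt8,         # uint8 weights
)
```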
