
Is there any way to retrieve Quantization type and Quantization parameters using onnxruntime? #19916

Open
OAHLSTM opened this issue Mar 14, 2024 · 4 comments
Labels
quantization (issues related to quantization), stale (issues that have not been addressed in a while; categorized by a bot)

Comments


OAHLSTM commented Mar 14, 2024

Describe the issue

Hello,
I'm trying to get quantization parameters from an input tensor, such as the quantization type (static linear per-tensor / static linear per-channel / dynamic) and the associated quantization parameters (scales & zero_points).
In tensorflow-lite, we are able to check whether the model is quantized statically per-tensor or per-channel by simply doing:

const TfLiteQuantizationType tflite_qtype = tensor->quantization.type;
switch (tflite_qtype) {
    case TfLiteQuantizationType::kTfLiteAffineQuantization: {
        const auto* quant_params = reinterpret_cast<const TfLiteAffineQuantization*>(tensor->quantization.params);
        if (quant_params->scale && quant_params->scale->size > 1) {
            // per-channel quantization along the specified dimension
            int32_t quant_dim = quant_params->quantized_dimension;
            float* scales = quant_params->scale->data;
            int32_t* zero_points = quant_params->zero_point->data;
        } else {
            // per-tensor quantization: a single scale/zero_point pair
            float scale = tensor->params.scale;
            int32_t zero_point = tensor->params.zero_point;
        }
        break;
    }
    case TfLiteQuantizationType::kTfLiteNoQuantization:
    default:
        std::cout << "stai_map_qtype: float or non supported quant type" << std::endl;
}

I was wondering whether there is any way to retrieve quantization parameters similarly using onnxruntime.
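
For reference, here is a sketch of the kind of lookup I mean on the ONNX side, done offline against the model protobuf rather than through onnxruntime. It assumes a QDQ-format model whose scales and zero points are stored as initializers feeding DequantizeLinear nodes, and it uses only the generated onnx.pb.h accessors:

#include <fstream>
#include <iostream>
#include <string>
#include <unordered_map>
#include "onnx/onnx_pb.h"  // generated protobuf classes for the ONNX model format

int main(int argc, char** argv) {
    if (argc < 2) return 1;
    onnx::ModelProto model;
    std::ifstream in(argv[1], std::ios::binary);
    if (!model.ParseFromIstream(&in)) return 1;

    // Index initializers by name so scale/zero_point inputs can be resolved.
    std::unordered_map<std::string, const onnx::TensorProto*> initializers;
    for (const auto& t : model.graph().initializer()) initializers[t.name()] = &t;

    for (const auto& node : model.graph().node()) {
        if (node.op_type() != "DequantizeLinear") continue;
        auto it = initializers.find(node.input(1));  // input 1 is x_scale
        if (it == initializers.end()) continue;      // scale is not a constant initializer
        const onnx::TensorProto& scale = *it->second;
        int64_t num_scales = 1;
        for (int64_t d : scale.dims()) num_scales *= d;
        if (num_scales > 1) {
            // A 1-D scale tensor means per-channel ("per-axis") quantization;
            // the channel dimension is the node's optional "axis" attribute (default 1).
            int64_t axis = 1;
            for (const auto& attr : node.attribute())
                if (attr.name() == "axis") axis = attr.i();
            std::cout << node.input(0) << ": per-channel, axis=" << axis
                      << ", " << num_scales << " scales\n";
        } else {
            std::cout << node.input(0) << ": per-tensor\n";
        }
    }
    return 0;
}

This works as an offline pass over the model file, but it does not help once tensors are flowing through an InferenceSession, which is what I am after.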
Thank you for your help.

To reproduce

Not applicable

Urgency

This is urgent: we are migrating from tensorflow-lite to onnxruntime, and this feature is crucial for our implementation.

Platform

Linux

OS Version

Ubuntu 22.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

C++

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions bot added the quantization (issues related to quantization) label on Mar 14, 2024
hariharans29 (Member) commented

AFAIK our Tensor interface provides no way to query such metadata. As for whether it can be ascertained at the model level, I'm tagging @yufenglee, as I am not sure about that.
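
For example (a minimal illustration using the public C++ API), an Ort::Value exposes the element type and shape of a tensor, but nothing quantization-related:

#include <onnxruntime_cxx_api.h>
#include <vector>

// From an Ort::Value you can learn that a tensor is, say, int8,
// but there is no accessor for its scale or zero_point.
void InspectTensor(const Ort::Value& value) {
    auto info = value.GetTensorTypeAndShapeInfo();
    ONNXTensorElementDataType type = info.GetElementType();  // e.g. ONNX_TENSOR_ELEMENT_DATA_TYPE_INT8
    std::vector<int64_t> shape = info.GetShape();
}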


OAHLSTM commented Mar 21, 2024

Hello @yufenglee,
Any update on this topic?

Thank you for your support,

github-actions bot commented

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale (issues that have not been addressed in a while; categorized by a bot) label on Apr 22, 2024

OAHLSTM commented Oct 28, 2024

Hello guys,
Is there any update on this topic? I'm really looking to get the quantization parameters for my input and output tensors. I noticed in release 1.19.2 that the VSINPU and XNNPACK execution providers support retrieving quantization parameters through their functions:

void GetQuantizationScaleAndZeroPoint(
    const GraphViewer& graph_viewer, const NodeUnitIODef& io_def, const std::filesystem::path& model_path,
    float& scale, int32_t& zero_point, std::optional<std::vector<float>>& pcq_scales,
    std::optional<std::vector<int32_t>>& pcq_zps) 

and

std::pair<const onnx::TensorProto*, const onnx::TensorProto*>
GetQuantizationZeroPointAndScale(const GraphViewer& graphview,
                                 const NodeUnitIODef& io_def) 
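
For illustration, this is the kind of call I would like to be able to make from application code. It is a hypothetical usage of the internal helper quoted above: graph_viewer, io_def and model_path would have to come from inside an execution provider, since GraphViewer and NodeUnitIODef are not reachable from the public API today.

// Hypothetical usage of the internal VSINPU/XNNPACK helper quoted above.
float scale = 0.0f;
int32_t zero_point = 0;
std::optional<std::vector<float>> pcq_scales;    // set only for per-channel quantization
std::optional<std::vector<int32_t>> pcq_zps;
GetQuantizationScaleAndZeroPoint(graph_viewer, io_def, model_path,
                                 scale, zero_point, pcq_scales, pcq_zps);
if (pcq_scales.has_value()) {
    // per-channel: one scale/zero_point per channel along the quantized axis
} else {
    // per-tensor: the single scale/zero_point pair
}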

Is there any way to access the parameters loaded by these functions from the Tensor interface?
Thank you for your help.
