From cadd6b236e093f910c9e7cea623c81846cab3506 Mon Sep 17 00:00:00 2001
From: Yaman Umuroglu
Date: Mon, 23 Oct 2023 23:16:31 +0200
Subject: [PATCH] [Spec] clarifications to Quant op spec

* scale, zeropt can be either scalar or tensor with matching number of dimensions for e.g. channel-wise quantization.
* bitwidth may be specified as float32 for convenience, but must still represent a positive integer.
---
 docs/qonnx-custom-ops/quant_op.md | 16 +++++++++-------
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/docs/qonnx-custom-ops/quant_op.md b/docs/qonnx-custom-ops/quant_op.md
index 003be341..02d115fb 100644
--- a/docs/qonnx-custom-ops/quant_op.md
+++ b/docs/qonnx-custom-ops/quant_op.md
@@ -1,7 +1,9 @@
 ### **Quant**

 Calculates the quantized values of one input data (Tensor) and produces one output data (Tensor).
-Additionally, takes three floats as input, which define the scale, zero-point and bit-width of the quantization.
+Additionally, takes three floats as input, which define the scale, zero-point and bit-width of the quantization,
+which may be scalars or tensors with number of dimensions equal to the input data tensor, for e.g. tensor-wise
+or channel-wise quantization.
 The attributes narrow and signed define how the bits of the quantization are interpreted, while the attribute
 rounding_mode defines how quantized values are rounded.

@@ -27,12 +29,12 @@ This operator is not part of the ONNX standard and is not currently versioned.
 X (differentiable) : tensor(float32)
 input tensor to quantize
-scale : float32
-The scale factor
-zeropt : float32
-The zero-point
-bitwidth : int32
-The number of bits used by the quantization
+scale : float32, tensor(float32)
+The scale factor, either as a global scalar or with a shape matching the number of dimensions of the X tensor
+zeropt : float32, tensor(float32)
+The zero-point, either as a global scalar or with a shape matching the number of dimensions of the X tensor
+bitwidth : int32, float32
+The number of bits used by the quantization, must be a positive integer. If float32 dtype is used for convenience, it must still represent a positive integer number of bits.
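
The input constraints introduced by this patch can be illustrated with a small validation sketch. It is not part of the QONNX code base: the helper names `check_bitwidth` and `check_param_shape` are hypothetical, shapes are passed as plain tuples, and treating the empty shape `()` as "scalar" is an assumption, since the spec text does not say how a scalar is encoded.

```python
import math
from typing import Sequence, Union

Number = Union[int, float]


def check_bitwidth(bitwidth: Number) -> int:
    """Return the bit width as an int, or raise if it is not a positive integer value.

    Hypothetical helper: a float such as 4.0 is accepted, while 4.5, 0 or -2 are rejected.
    """
    value = float(bitwidth)
    if not math.isfinite(value) or value != math.floor(value) or value <= 0:
        raise ValueError(f"bitwidth must represent a positive integer, got {bitwidth!r}")
    return int(value)


def check_param_shape(name: str, param_shape: Sequence[int], x_shape: Sequence[int]) -> None:
    """Accept a scalar or a tensor with the same number of dimensions as X.

    Assumption: a scalar is represented by the empty shape ().
    Only the rank is checked, because that is all the spec text requires.
    """
    if len(param_shape) not in (0, len(x_shape)):
        raise ValueError(
            f"{name} must be a scalar or have {len(x_shape)} dimensions, "
            f"got shape {tuple(param_shape)}"
        )


# Example: an NCHW activation tensor with 8 channels.
x_shape = (1, 8, 32, 32)
check_param_shape("scale", (), x_shape)             # tensor-wise: one global scale
check_param_shape("scale", (1, 8, 1, 1), x_shape)   # channel-wise: one scale per channel
check_param_shape("zeropt", (1, 8, 1, 1), x_shape)  # same rule applies to the zero-point
bits = check_bitwidth(4.0)                          # float input, integer-valued
```

A shape such as `(8,)` would be rejected here because its rank differs from that of `X`. Whether an implementation should additionally require each dimension to be broadcast-compatible with `X` is not stated in the spec text and is left out of this sketch.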