[Spec] clarifications to Quant op spec
* scale, zeropt can be either scalars or tensors with a matching number of dimensions, e.g. for channel-wise quantization.
* bitwidth may be specified as float32 for convenience, but must still represent a positive integer.
maltanar authored Oct 23, 2023
1 parent c966b46 commit cadd6b2
Showing 1 changed file with 9 additions and 7 deletions.
16 changes: 9 additions & 7 deletions docs/qonnx-custom-ops/quant_op.md
@@ -1,7 +1,9 @@
### <a name="Quant"></a><a name="abs">**Quant**</a>

Calculates the quantized values of one input data (Tensor<T>) and produces one output data (Tensor<T>).
-Additionally, takes three floats as input, which define the scale, zero-point and bit-width of the quantization.
+Additionally, takes three floats as input, which define the scale, zero-point and bit-width of the quantization,
+which may be scalars or tensors with number of dimensions equal to the input data tensor, for e.g. tensor-wise
+or channel-wise quantization.
The attributes narrow and signed define how the bits of the quantization are interpreted, while the attribute
rounding_mode defines how quantized values are rounded.
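
As an aside on the new wording, the rule that `scale` and `zeropt` are either scalars or tensors with the same number of dimensions as the input can be sketched as a shape check. The helper name `is_valid_quant_param_shape` is hypothetical and not part of QONNX. The extra requirement that each dimension be 1 or equal to the matching input dimension is an assumption (NumPy-style broadcasting); the spec text only constrains the number of dimensions.

```python
from typing import Sequence


def is_valid_quant_param_shape(param_shape: Sequence[int], x_shape: Sequence[int]) -> bool:
    """Check a scale/zeropt shape against the input tensor shape.

    Hypothetical helper for illustration, not part of QONNX.
    """
    # A scalar (empty shape) is always accepted: tensor-wise quantization.
    if len(param_shape) == 0:
        return True
    # From the spec text: the number of dimensions must match the input.
    if len(param_shape) != len(x_shape):
        return False
    # Assumption (not stated in the spec): each dimension is either 1 or
    # equal to the corresponding input dimension, so the parameter broadcasts.
    return all(p == 1 or p == x for p, x in zip(param_shape, x_shape))


x_shape = (1, 8, 32, 32)  # e.g. an NCHW activation tensor
print(is_valid_quant_param_shape((), x_shape))            # tensor-wise scalar
print(is_valid_quant_param_shape((1, 8, 1, 1), x_shape))  # channel-wise
print(is_valid_quant_param_shape((8,), x_shape))          # rank mismatch
```

Under these assumptions a per-channel scale for an NCHW tensor would be written with shape `(1, C, 1, 1)` rather than `(C,)`, since only the former has the same number of dimensions as `X`.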

@@ -27,12 +29,12 @@ This operator is not part of the ONNX standard and is not currently versioned.
<dl>
<dt><tt>X</tt> (differentiable) : tensor(float32)</dt>
<dd>input tensor to quantize</dd>
-<dt><tt>scale</tt> : float32</dt>
-<dd>The scale factor</dd>
-<dt><tt>zeropt</tt> : float32</dt>
-<dd>The zero-point</dd>
-<dt><tt>bitwidth</tt> : int32</dt>
-<dd>The number of bits used by the quantization</dd>
+<dt><tt>scale</tt> : float32, tensor(float32)</dt>
+<dd>The scale factor, either as a global scalar or with a shape matching the number of dimensions of the X tensor</dd>
+<dt><tt>zeropt</tt> : float32, tensor(float32) </dt>
+<dd>The zero-point, either as a global scalar or with a shape matching the number of dimensions of the X tensor</dd>
+<dt><tt>bitwidth</tt> : int32, float32</dt>
+<dd>The number of bits used by the quantization, must be a positive integer. If float32 dtype is used for convenience, it must still represent a positive integer number of bits.</dd>
</dl>
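
The `bitwidth` constraint above (int32 or float32, but always a positive whole number) could be enforced by a consumer of the op roughly as follows. This is a minimal sketch: `normalize_bitwidth` is a hypothetical name, and raising `ValueError` is an illustrative choice rather than documented QONNX behavior.

```python
def normalize_bitwidth(bitwidth) -> int:
    """Return bitwidth as a Python int, or raise ValueError.

    Hypothetical helper for illustration, not part of QONNX.
    """
    # bool is a subclass of int in Python; reject it explicitly.
    if isinstance(bitwidth, bool):
        raise ValueError("bitwidth must be a number, not a bool")
    value = float(bitwidth)
    # A float-typed bitwidth is acceptable only if it holds a whole number.
    # NaN and infinity also fail this check.
    if not value.is_integer():
        raise ValueError(f"bitwidth must be an integer value, got {bitwidth!r}")
    if value <= 0:
        raise ValueError(f"bitwidth must be positive, got {bitwidth!r}")
    return int(value)


print(normalize_bitwidth(4.0))  # float input holding a whole number
print(normalize_bitwidth(8))    # plain integer input
```

Values such as `4.5`, `0`, or `-2.0` would be rejected by this sketch, matching the requirement that a float32 bitwidth still represent a positive integer number of bits.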

