
Commit cadd6b2

[Spec] clarifications to Quant op spec
* scale, zeropt can be either scalar or tensor with matching number of dimensions for e.g. channel-wise quantization.
* bitwidth may be specified as float32 for convenience, but must still represent a positive integer.
1 parent c966b46 commit cadd6b2
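The two rules in this commit message can be expressed as a small input check. The following is a minimal Python/NumPy sketch. `check_quant_params` is a hypothetical helper and not part of the qonnx API. It also assumes something the spec text does not state: a tensor-valued `scale` or `zeropt` must broadcast to the shape of `X` under NumPy rules without enlarging it.

```python
import numpy as np


def check_quant_params(x, scale, zeropt, bitwidth):
    """Check Quant inputs against the rules in this commit (hypothetical helper).

    Assumption beyond the spec text: a tensor-valued scale/zeropt must also
    broadcast to x's shape under NumPy rules without enlarging it.
    Returns the bit-width as a Python int.
    """
    x = np.asarray(x)
    for name, param in (("scale", scale), ("zeropt", zeropt)):
        param = np.asarray(param)
        if param.ndim == 0:
            continue  # global scalar: tensor-wise quantization
        if param.ndim != x.ndim:
            raise ValueError(
                f"{name} must be a scalar or have {x.ndim} dims, got {param.ndim}"
            )
        # np.broadcast_shapes raises ValueError for incompatible shapes.
        if np.broadcast_shapes(param.shape, x.shape) != x.shape:
            raise ValueError(f"{name} shape {param.shape} does not fit {x.shape}")
    # bitwidth may be int32 or float32, but must represent a positive integer.
    bw = float(np.asarray(bitwidth))
    if not (bw.is_integer() and bw > 0):
        raise ValueError(f"bitwidth must be a positive integer, got {bitwidth}")
    return int(bw)
```

Assuming an NCHW input of shape (1, 3, 4, 4), a scale of shape (1, 3, 1, 1) passes (one scale per channel). A 1-D scale of shape (3,) is rejected because its number of dimensions differs from `X`. A float32 bit-width of 8.0 is accepted, but 4.5 or 0 is not.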


1 file changed: +9 -7 lines changed


docs/qonnx-custom-ops/quant_op.md

Lines changed: 9 additions & 7 deletions
@@ -1,7 +1,9 @@
 ### <a name="Quant"></a><a name="abs">**Quant**</a>
 
 Calculates the quantized values of one input data (Tensor<T>) and produces one output data (Tensor<T>).
-Additionally, takes three floats as input, which define the scale, zero-point and bit-width of the quantization.
+Additionally, takes three floats as input, which define the scale, zero-point and bit-width of the quantization,
+which may be scalars or tensors with number of dimensions equal to the input data tensor, for e.g. tensor-wise
+or channel-wise quantization.
 The attributes narrow and signed define how the bits of the quantization are interpreted, while the attribute
 rounding_mode defines how quantized values are rounded.
 
@@ -27,12 +29,12 @@ This operator is not part of the ONNX standard and is not currently versioned.
 <dl>
 <dt><tt>X</tt> (differentiable) : tensor(float32)</dt>
 <dd>input tensor to quantize</dd>
-<dt><tt>scale</tt> : float32</dt>
-<dd>The scale factor</dd>
-<dt><tt>zeropt</tt> : float32</dt>
-<dd>The zero-point</dd>
-<dt><tt>bitwidth</tt> : int32</dt>
-<dd>The number of bits used by the quantization</dd>
+<dt><tt>scale</tt> : float32, tensor(float32)</dt>
+<dd>The scale factor, either as a global scalar or with a shape matching the number of dimensions of the X tensor</dd>
+<dt><tt>zeropt</tt> : float32, tensor(float32) </dt>
+<dd>The zero-point, either as a global scalar or with a shape matching the number of dimensions of the X tensor</dd>
+<dt><tt>bitwidth</tt> : int32, float32</dt>
+<dd>The number of bits used by the quantization, must be a positive integer. If float32 dtype is used for convenience, it must still represent an positive integer number of bits.</dd>
 </dl>
 
 
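To illustrate how a tensor-valued `scale` produces channel-wise quantization, here is a minimal NumPy sketch of the quantize/dequantize computation. It is not the qonnx reference implementation. The names `quant_sketch`, `min_int` and `max_int` are hypothetical. The integer range assumes the common Brevitas-style convention, where `narrow` drops the most negative code (signed) or the largest code (unsigned). `np.round` (round half to even) stands in for the `rounding_mode` attribute. None of these details are defined in the spec excerpt above.

```python
import numpy as np


def min_int(signed, narrow, bitwidth):
    """Smallest representable integer (assumed Brevitas-style convention)."""
    if signed:
        return -(2 ** (bitwidth - 1)) + (1 if narrow else 0)
    return 0


def max_int(signed, narrow, bitwidth):
    """Largest representable integer (assumed Brevitas-style convention)."""
    if signed:
        return 2 ** (bitwidth - 1) - 1
    return 2 ** bitwidth - 1 - (1 if narrow else 0)


def quant_sketch(x, scale, zeropt, bitwidth, signed=True, narrow=False):
    """Fake-quantize x: divide by scale, add zeropt, round, clamp, map back.

    scale/zeropt may be scalars or arrays with x.ndim dimensions; NumPy
    broadcasting then applies them per tensor or per channel. The clamp
    bounds are integers, so rounding before or after clamping is equivalent.
    """
    bw = int(bitwidth)  # a float32 bit-width is fine if it is integral
    q = np.round(np.asarray(x) / scale + zeropt)
    q = np.clip(q, min_int(signed, narrow, bw), max_int(signed, narrow, bw))
    return (q - zeropt) * scale


# Per-channel example: channel axis 0, one scale per row, ndim == x.ndim.
x = np.array([[1.2, -0.6, 10.0], [1.2, -0.6, 10.0]])
scale = np.array([[0.5], [0.25]])
y = quant_sketch(x, scale, zeropt=0.0, bitwidth=np.float32(4.0))
```

Here `y` is `[[1.0, -0.5, 3.5], [1.25, -0.5, 1.75]]`. The same inputs land on a different grid in each row, and the outlier 10.0 is clamped to the signed 4-bit maximum of 7 steps in both rows.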