# Quantization Support Matrix

This document provides the quantization support matrix for the following frameworks:

| Framework | Backend Library | Symmetric Quantization | Asymmetric Quantization |
| --------- | --------------- | ---------------------- | ----------------------- |
| TensorFlow | oneDNN | Activation (int8/uint8), Weight (int8) | - |
| PyTorch | FBGEMM | Activation (uint8), Weight (int8) | Activation (uint8) |
| PyTorch IPEX | oneDNN | Activation (int8/uint8), Weight (int8) | - |
| MXNet | oneDNN | Activation (int8/uint8), Weight (int8) | - |
| ONNX Runtime | MLAS | Weight (int8) | Activation (uint8) |

## TensorFlow

- Symmetric Quantization
  - int8: `scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)`
  - uint8: `scale = max(rmin, rmax) / (max(uint8) - min(uint8))`
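To make the arithmetic concrete, below is a minimal Python sketch of these two formulas. `onednn_symmetric_scale` is a hypothetical helper written for this document, not a TensorFlow or oneDNN API; the same formulas also appear in the PyTorch IPEX and MXNet sections below, and in ONNX Runtime's int8 case.

```python
# Hypothetical helper illustrating the oneDNN-style symmetric scale formulas
# above; not part of any framework API.

def onednn_symmetric_scale(rmin: float, rmax: float, dtype: str) -> float:
    """Compute the symmetric quantization scale for a tensor range [rmin, rmax]."""
    if dtype == "int8":
        # int8 range is [-128, 127], so max(int8) - min(int8) - 1 = 254.
        return 2 * max(abs(rmin), abs(rmax)) / (127 - (-128) - 1)
    if dtype == "uint8":
        # uint8 range is [0, 255], so max(uint8) - min(uint8) = 255.
        return max(rmin, rmax) / (255 - 0)
    raise ValueError(f"unsupported dtype: {dtype}")

print(onednn_symmetric_scale(-1.0, 1.0, "int8"))   # 2 * 1.0 / 254 ≈ 0.007874
print(onednn_symmetric_scale(0.0, 6.0, "uint8"))   # 6.0 / 255 ≈ 0.023529
```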

## PyTorch

- Symmetric Quantization
  - int8: `scale = max(abs(rmin), abs(rmax)) / (float(max(int8) - min(int8)) / 2)`
  - uint8: `scale = max(abs(rmin), abs(rmax)) / (float(max(int8) - min(int8)) / 2)`
- Asymmetric Quantization
  - uint8: `scale = (rmax - rmin) / (max(uint8) - min(uint8))`; `zero_point = min(uint8) - round(rmin / scale)`
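The sketch below spells out both PyTorch formulas in Python. The helper names are assumptions made for illustration, not the `torch` API. Note that the symmetric formula divides by half the int8 range even in the uint8 case, exactly as written above.

```python
# Hypothetical helpers illustrating the PyTorch formulas above; not the torch API.

def pytorch_symmetric_scale(rmin: float, rmax: float) -> float:
    """Symmetric scale; the divisor uses the int8 range for both int8 and uint8."""
    return max(abs(rmin), abs(rmax)) / (float(127 - (-128)) / 2)  # = amax / 127.5

def pytorch_asymmetric_qparams(rmin: float, rmax: float) -> tuple[float, int]:
    """Asymmetric uint8 scale and zero_point, with min(uint8) = 0."""
    scale = (rmax - rmin) / (255 - 0)
    zero_point = 0 - round(rmin / scale)
    return scale, zero_point

print(pytorch_symmetric_scale(-2.0, 1.5))     # 2.0 / 127.5 ≈ 0.015686
print(pytorch_asymmetric_qparams(-1.0, 3.0))  # (≈0.015686, 64)
```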

## PyTorch IPEX

- Symmetric Quantization
  - int8: `scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)`
  - uint8: `scale = max(rmin, rmax) / (max(uint8) - min(uint8))`

## MXNet

- Symmetric Quantization
  - int8: `scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)`
  - uint8: `scale = max(rmin, rmax) / (max(uint8) - min(uint8))`

## ONNX Runtime

- Symmetric Quantization
  - int8: `scale = 2 * max(abs(rmin), abs(rmax)) / (max(int8) - min(int8) - 1)`
- Asymmetric Quantization
  - uint8: `scale = (rmax - rmin) / (max(uint8) - min(uint8))`; `zero_point = min(uint8) - round(rmin / scale)`
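To show how the asymmetric uint8 parameters are actually used, here is a minimal round-trip sketch under the standard affine quantization scheme `q = round(x / scale) + zero_point`. The helper names are assumptions for this document, not the ONNX Runtime or MLAS API.

```python
# Hypothetical quantize/dequantize round-trip using the asymmetric uint8
# scale and zero_point formulas above; not an ONNX Runtime API.

def quantize_u8(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to uint8: q = clamp(round(x / scale) + zero_point, 0, 255)."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))

def dequantize_u8(q: int, scale: float, zero_point: int) -> float:
    """Recover an approximation of the real value: x ≈ (q - zero_point) * scale."""
    return (q - zero_point) * scale

rmin, rmax = -1.0, 3.0
scale = (rmax - rmin) / 255               # per the formula above
zero_point = 0 - round(rmin / scale)      # 64, since min(uint8) = 0

q = quantize_u8(0.5, scale, zero_point)        # 96
print(q, dequantize_u8(q, scale, zero_point))  # 96, ≈0.50196
```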
