Quantization, integer inference, and other topics related to mobile deployment

Table of Contents

PTQ

  • Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
    Yuexiao Ma, Huixia Li, Xiawu Zheng, Xuefeng Xiao, Rui Wang, Shilei Wen, Xin Pan, Fei Chao, Rongrong Ji
    [CVPR 2023]
    [MRECG]

  • Data Free Quantization Through Weight Equalization and Bias Correction
    Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling
    [ICCV 2019] [Project] [Pytorch-Code]
    [CLE/BC] [★★]

QAT

  • Distance-aware Quantization
    Junghyup Lee, Dohyung Kim, Bumsub Ham
    [ICCV 2021]
    [DAQ]

  • Fully Quantized Image Super-Resolution Networks
    Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen
    [MM 2021] [Pytorch-Code]
    [FQSR] [★☆]

  • Gradient ℓ1 Regularization for Quantization Robustness
    Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling
    [ICLR 2020]
    [★☆] From Qualcomm. Proposes adding an L1 regularizer on the gradients to reduce quantization error.

  • PAMS: Quantized Super-Resolution via Parameterized Max Scale
    Huixia Li, Chenqian Yan, Shaohui Lin, Xiawu Zheng, Yuchao Li, Baochang Zhang, Fan Yang, Rongrong Ji
    [ECCV 2020]
    [★]

  • LSQ+: Improving low-bit quantization through learnable offsets and better initialization
    Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak
    [CVPR 2020] [Unofficial-Pytorch-Code]
    [LSQ+] [★★] Extends LSQ to asymmetric quantization; both the scale and the offset are learnable.

  • Learned Step Size Quantization
    Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
    [arXiv 1902] [Unofficial-Pytorch-Code]
    [LSQ] [★★] Learns the quantization scale (step size).

  • ProxQuant: Quantized Neural Networks via Proximal Operators
    Yu Bai, Yu-Xiang Wang, Edo Liberty
    [ICLR 2019] [Pytorch-Code]

  • PACT: Parameterized Clipping Activation for Quantized Neural Networks
    Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
    [ICLR 2018] [Pytorch-Code]
    [★☆] From IBM. Adds a learnable clipping threshold to the upper bound of ReLU, so that the network can find a better clip range automatically during QAT.

  • On periodic functions as regularizers for quantization of neural networks
    Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch
    [arXiv 1811]
    [★☆] From Facebook.

  • Learning Sparse Low-Precision Neural Networks With Learnable Regularization
    Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
    [arXiv 1809]
    From Samsung.

  • LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
    Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua
    [ECCV 2018] [TF-Code]
    From MSRA.

  • Towards Effective Low-bitwidth Convolutional Neural Networks
    Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid
    [CVPR 2018] [Pytorch-Code]
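    [★] Proposes three tricks: 1) train in two stages, quantizing weights first and then activations; 2) lower the bit-width progressively, which may help in cases such as 2-bit quantization; 3) distill features from the float model into the quantized model.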
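
To make the PACT entry above concrete, here is a minimal sketch of a learnable-clip activation quantizer in plain Python. It is an illustration under stated assumptions, not the authors' code: the function names `pact_forward` and `pact_grad_alpha` are hypothetical, the sketch works on a single scalar, and the gradient shown is the usual straight-through approximation for the clipping threshold `alpha`.

```python
def pact_forward(x, alpha, bits):
    """Clip x to [0, alpha], then snap it to one of 2**bits - 1 uniform steps.

    alpha is the learnable clipping threshold (assumed > 0).
    Note: Python's round() rounds exact halves to the nearest even integer.
    """
    levels = 2 ** bits - 1
    clipped = min(max(x, 0.0), alpha)
    return round(clipped * levels / alpha) * alpha / levels


def pact_grad_alpha(x, alpha):
    """Straight-through gradient of the output w.r.t. alpha.

    Only inputs that hit the clip (x >= alpha) pass gradient to alpha.
    """
    return 1.0 if x >= alpha else 0.0


if __name__ == "__main__":
    for value in (-1.0, 3.1, 10.0):
        print(value, pact_forward(value, alpha=6.0, bits=2), pact_grad_alpha(value, 6.0))
```

In a real QAT setup `alpha` would be a trainable parameter per layer, usually with some regularization so it does not grow without bound; those training details are left out here.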

Inference

  • Towards Fully 8-bit Integer Inference for the Transformer Model
    Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, Jingbo Zhu
    [IJCAI 2020]

Survey

  • A White Paper on Neural Network Quantization
    Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort
    [arXiv 2106]
    [★★☆] Qualcomm's quantization white paper; offers practical advice on PTQ and QAT.

  • A Survey of Quantization Methods for Efficient Neural Network Inference
    Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
    [arXiv 2103]

  • Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
    Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius
    [arXiv 2004]

  • Quantizing deep convolutional networks for efficient inference: A whitepaper
    Raghuraman Krishnamoorthi
    [arXiv 1806]
    Google's quantization white paper.

  • Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
    Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko
    [CVPR 2018]
    Covers QAT and int8 inference in considerable detail.
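
As a quick reference for the int8 inference material in the surveys above, the following sketch shows the affine mapping `r = scale * (q - zero_point)` between real values and unsigned 8-bit integers. The helper names (`choose_qparams`, `quantize`, `dequantize`) are hypothetical and the code handles scalars only; it is a sketch of the general scheme, not any framework's API.

```python
def choose_qparams(rmin, rmax, qmin=0, qmax=255):
    """Pick scale and zero_point so that [rmin, rmax] maps onto [qmin, qmax]."""
    # Widen the range to include 0.0 so that real zero is exactly representable.
    rmin, rmax = min(rmin, 0.0), max(rmax, 0.0)
    if rmax == rmin:
        return 1.0, qmin  # degenerate all-zero range
    scale = (rmax - rmin) / (qmax - qmin)
    zero_point = int(round(qmin - rmin / scale))
    return scale, max(qmin, min(qmax, zero_point))


def quantize(r, scale, zero_point, qmin=0, qmax=255):
    """Map a real value to an integer, saturating at the ends of the range."""
    q = int(round(r / scale)) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q, scale, zero_point):
    """Recover the approximate real value: r = scale * (q - zero_point)."""
    return scale * (q - zero_point)


if __name__ == "__main__":
    scale, zp = choose_qparams(-1.0, 3.0)
    for r in (-1.0, 0.0, 1.0, 10.0):
        q = quantize(r, scale, zp)
        print(r, q, dequantize(q, scale, zp))
```

Values inside the calibrated range round-trip with an error of at most half a step, while values outside it saturate, which is why the choice of clip range matters so much for both PTQ and QAT.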

Resources

PyTorch implementations of QAT, PTQ, pruning, and related algorithms

Awesome Model Quantization

Frameworks

Qualcomm's quantization framework AIMET (TF1, PyTorch) [AIMET]

[Intel Distiller]

Facebook's on-device inference frameworks [QNNPACK] [FBGEMM] [Blog]

[NCNN]

[MNN]

[OpenVINO]