Quantization, integer inference, and other stuff related to mobile deployment
-
- Solving Oscillation Problem in Post-Training Quantization Through a Theoretical Perspective
  Yuexiao Ma, Huixia Li, Xiawu Zheng, Xuefeng Xiao, Rui Wang, Shilei Wen, Xin Pan, Fei Chao, Rongrong Ji
  [CVPR 2023]
  [MRECG]
- Data Free Quantization Through Weight Equalization and Bias Correction
  Markus Nagel, Mart van Baalen, Tijmen Blankevoort, Max Welling
  [ICCV 2019] [Project] [Pytorch-Code]
  [CLE/BC] [★★]
- Distance-aware Quantization
  Junghyup Lee, Dohyung Kim, Bumsub Ham
  [ICCV 2021]
  [DAQ]
- Fully Quantized Image Super-Resolution Networks
  Hu Wang, Peng Chen, Bohan Zhuang, Chunhua Shen
  [MM 2021] [Pytorch-Code]
  [FQSR] [★☆]
- Gradient ℓ1 Regularization for Quantization Robustness
  Milad Alizadeh, Arash Behboodi, Mart van Baalen, Christos Louizos, Tijmen Blankevoort, Max Welling
  [ICLR 2020]
  [★☆] From Qualcomm. Proposes adding an ℓ1 regularizer on the gradients, making the model more robust to quantization error; see the sketch below.
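  A minimal PyTorch sketch of the idea, assuming a standard training step; the function name and `lam` are illustrative, not from the paper's code:

  ```python
  import torch

  def loss_with_grad_l1(model, criterion, x, y, lam=0.05):
      # Task loss plus an l1 penalty on the gradients w.r.t. the weights.
      task_loss = criterion(model(x), y)
      params = [p for p in model.parameters() if p.requires_grad]
      # create_graph=True makes the penalty itself differentiable (double backprop).
      grads = torch.autograd.grad(task_loss, params, create_graph=True)
      penalty = sum(g.abs().sum() for g in grads)
      return task_loss + lam * penalty
  ```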
- PAMS: Quantized Super-Resolution via Parameterized Max Scale
  Huixia Li, Chenqian Yan, Shaohui Lin, Xiawu Zheng, Yuchao Li, Baochang Zhang, Fan Yang, Rongrong Ji
  [ECCV 2020]
  [★]
- LSQ+: Improving low-bit quantization through learnable offsets and better initialization
  Yash Bhalgat, Jinwon Lee, Markus Nagel, Tijmen Blankevoort, Nojun Kwak
  [CVPRW 2020] [Unofficial-Pytorch-Code]
  [LSQ+] [★★] Extends LSQ to asymmetric quantization; both the scale and the offset are learnable (see the sketch below).
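  A minimal sketch of an LSQ+-style asymmetric fake quantizer, assuming unsigned activation quantization; the paper's initialization heuristics and LSQ's gradient scaling are omitted for brevity:

  ```python
  import torch
  import torch.nn as nn

  class LSQPlusFakeQuant(nn.Module):
      # Asymmetric fake quantization with a learnable scale and offset.
      def __init__(self, n_bits=8, scale_init=1.0, offset_init=0.0):
          super().__init__()
          self.qmin, self.qmax = 0, 2 ** n_bits - 1
          self.scale = nn.Parameter(torch.tensor(scale_init))
          self.offset = nn.Parameter(torch.tensor(offset_init))

      def forward(self, x):
          q = ((x - self.offset) / self.scale).clamp(self.qmin, self.qmax)
          q = q + (q.round() - q).detach()  # straight-through estimator
          return q * self.scale + self.offset
  ```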
- Learned Step Size Quantization
  Steven K. Esser, Jeffrey L. McKinstry, Deepika Bablani, Rathinakumar Appuswamy, Dharmendra S. Modha
  [ICLR 2020] [Unofficial-Pytorch-Code]
  [LSQ] [★★] Learns the quantization step size (scale) jointly with the network weights; see the sketch below.
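  A sketch of the learnable step size, including the paper's gradient scaling factor g = 1/sqrt(N * Q_P); `step_init` is an assumed placeholder (the paper initializes it from the data):

  ```python
  import torch
  import torch.nn as nn

  def grad_scale(x, g):
      # Forward value is x; the backward gradient is scaled by g.
      return (x - x * g).detach() + x * g

  class LSQFakeQuant(nn.Module):
      # Symmetric fake quantization with a learnable step size.
      def __init__(self, n_bits=8, step_init=0.1):
          super().__init__()
          self.qmin = -(2 ** (n_bits - 1))
          self.qmax = 2 ** (n_bits - 1) - 1
          self.step = nn.Parameter(torch.tensor(step_init))

      def forward(self, x):
          g = 1.0 / (x.numel() * self.qmax) ** 0.5  # gradient scale from the paper
          s = grad_scale(self.step, g)
          q = (x / s).clamp(self.qmin, self.qmax)
          q = q + (q.round() - q).detach()  # straight-through estimator
          return q * s
  ```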
- ProxQuant: Quantized Neural Networks via Proximal Operators
  Yu Bai, Yu-Xiang Wang, Edo Liberty
  [ICLR 2019] [Pytorch-Code]
- PACT: Parameterized Clipping Activation for Quantized Neural Networks
  Jungwook Choi, Zhuo Wang, Swagath Venkataramani, Pierce I-Jen Chuang, Vijayalakshmi Srinivasan, Kailash Gopalakrishnan
  [ICLR 2018] [Pytorch-Code]
  [★☆] From IBM. Adds a learnable upper clipping bound to ReLU so that the network can automatically find a better clip range during QAT; a sketch follows below.
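  A minimal sketch of a PACT activation with k-bit quantization via the straight-through estimator; `alpha_init` is an assumed default, not from the paper's code:

  ```python
  import torch
  import torch.nn as nn

  class PACT(nn.Module):
      # y = clip(x, 0, alpha) with learnable alpha, then uniform k-bit quantization.
      def __init__(self, n_bits=4, alpha_init=6.0):
          super().__init__()
          self.levels = 2 ** n_bits - 1
          self.alpha = nn.Parameter(torch.tensor(alpha_init))

      def forward(self, x):
          y = torch.minimum(torch.relu(x), self.alpha)  # learnable clipping bound
          q = y * self.levels / self.alpha
          q = q + (q.round() - q).detach()              # straight-through estimator
          return q * self.alpha / self.levels
  ```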
- On periodic functions as regularizers for quantization of neural networks
  Maxim Naumov, Utku Diril, Jongsoo Park, Benjamin Ray, Jedrzej Jablonski, Andrew Tulloch
  [arXiv 1811]
  [★☆] From Facebook.
- Learning Sparse Low-Precision Neural Networks With Learnable Regularization
  Yoojin Choi, Mostafa El-Khamy, Jungwon Lee
  [arXiv 1809]
  From Samsung.
- LQ-Nets: Learned Quantization for Highly Accurate and Compact Deep Neural Networks
  Dongqing Zhang, Jiaolong Yang, Dongqiangzi Ye, Gang Hua
  [ECCV 2018] [TF-Code]
  From MSRA.
- Towards Effective Low-bitwidth Convolutional Neural Networks
  Bohan Zhuang, Chunhua Shen, Mingkui Tan, Lingqiao Liu, Ian Reid
  [CVPR 2018] [Pytorch-Code]
  [★] Proposes three tricks: 1) train in two stages, quantizing the weights first and the activations second; 2) progressively lower the bit-width, which may help in extreme cases such as 2-bit quantization; 3) distill features from the full-precision model into the quantized model (see the sketch below).
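  A minimal sketch of trick 3, assuming both models expose matching lists of intermediate feature maps; `beta` and the function name are illustrative, not the paper's exact formulation:

  ```python
  import torch.nn.functional as F

  def feature_distill_loss(task_loss, student_feats, teacher_feats, beta=1.0):
      # Match the quantized (student) features to the full-precision
      # (teacher) features; the teacher is frozen via detach().
      kd = sum(F.mse_loss(s, t.detach())
               for s, t in zip(student_feats, teacher_feats))
      return task_loss + beta * kd
  ```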
- Towards Fully 8-bit Integer Inference for the Transformer Model
  Ye Lin, Yanyang Li, Tengbo Liu, Tong Xiao, Tongran Liu, Jingbo Zhu
  [IJCAI 2020]
- A White Paper on Neural Network Quantization
  Markus Nagel, Marios Fournarakis, Rana Ali Amjad, Yelysei Bondarenko, Mart van Baalen, Tijmen Blankevoort
  [arXiv 2106]
  [★★☆] Qualcomm's quantization white paper; offers practical recommendations for both PTQ and QAT.
- A Survey of Quantization Methods for Efficient Neural Network Inference
  Amir Gholami, Sehoon Kim, Zhen Dong, Zhewei Yao, Michael W. Mahoney, Kurt Keutzer
  [arXiv 2103]
- Integer Quantization for Deep Learning Inference: Principles and Empirical Evaluation
  Hao Wu, Patrick Judd, Xiaojie Zhang, Mikhail Isaev, Paulius Micikevicius
  [arXiv 2004]
- Quantizing deep convolutional networks for efficient inference: A whitepaper
  Raghuraman Krishnamoorthi
  [arXiv 1806]
  Google's quantization white paper.
- Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
  Benoit Jacob, Skirmantas Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew Howard, Hartwig Adam, Dmitry Kalenichenko
  [CVPR 2018]
  A fairly detailed walkthrough of QAT and int8 inference; a sketch of the affine quantization scheme follows below.
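  A small NumPy sketch of the affine scheme from the paper, where a real value r is represented as r ≈ scale * (q - zero_point); the example range is made up:

  ```python
  import numpy as np

  def quantize(r, scale, zero_point, qmin=-128, qmax=127):
      # Map real values to int8 codes: q = round(r / scale) + zero_point.
      q = np.round(r / scale) + zero_point
      return np.clip(q, qmin, qmax).astype(np.int8)

  def dequantize(q, scale, zero_point):
      return scale * (q.astype(np.int32) - zero_point)

  # Derive scale / zero-point from an observed float range [rmin, rmax]
  # (the range must include 0 so that 0.0 is exactly representable):
  rmin, rmax = -1.0, 3.5
  scale = (rmax - rmin) / 255.0
  zero_point = int(round(-128 - rmin / scale))
  x = np.array([-0.5, 0.0, 1.7, 3.5])
  print(dequantize(quantize(x, scale, zero_point), scale, zero_point))
  ```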
- Qualcomm's quantization toolkit AIMET (TF1, PyTorch) [AIMET]
- Facebook's on-device inference libraries [QNNPACK] [FBGEMM] [Blog]
- [NCNN]
- [MNN]
- [OpenVINO]