Survey Paper on quantization

| TITLE | DESCRIPTION | YEAR | GitHub LINK |
| --- | --- | --- | --- |
| Awesome-Model-Quantization | This repo collects papers, documents, and code about model quantization for anyone who wants to research it. The project is continuously improved, and PRs adding works (papers, repositories) that the repo misses are welcome. | 2015-2024 | paper |

QuantizationDNN

| TITLE | DESCRIPTION | YEAR | CODE | PAPER LINK |
| --- | --- | --- | --- | --- |
| COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training | | 2024 | code | paper |
| Compressing Large Language Models using Low Rank and Low Precision Decomposition | | 2024 | code | paper |
| A Survey on Transformer Compression | | 2024 | - | paper |
| Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey | | 2024 | code | paper |
| A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification | | 2023 | code | paper |
| A Survey of Quantization Methods for Efficient Neural Network Inference | | 2021 | - | paper |
| Quantization and Deployment of Deep Neural Networks on Microcontrollers | | 2021 | - | paper |
| Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding | Pruning + Trained Quantization + Huffman coding | 2016 | | paper |

Other links of codes and theory

| Models | Description | Link |
| --- | --- | --- |
| K-means, Linear, Binary/Ternary quantization | Detailed theory and code available | code |
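
As a rough illustration of two of the schemes named in the table, the sketch below contrasts linear (uniform) quantization with 1-D k-means quantization of a flat list of weights. It is not taken from the linked repository: the function names `linear_quantize` and `kmeans_quantize`, their parameters, and the evenly spaced centroid initialisation are all assumptions made for this example.

```python
def linear_quantize(weights, num_bits=4):
    """Uniform quantization: snap each weight to one of 2**num_bits evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    steps = 2 ** num_bits - 1
    if hi == lo:
        return list(weights)  # constant input, nothing to quantize
    scale = (hi - lo) / steps
    return [lo + round((w - lo) / scale) * scale for w in weights]


def kmeans_quantize(weights, k=4, iters=20):
    """1-D k-means (Lloyd's algorithm) with k >= 2 learned, non-uniform levels."""
    lo, hi = min(weights), max(weights)
    # Assumed initialisation: centroids spread evenly over the weight range.
    centroids = [lo + (hi - lo) * i / (k - 1) for i in range(k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for w in weights:
            nearest = min(range(k), key=lambda j: abs(w - centroids[j]))
            clusters[nearest].append(w)
        # An empty cluster keeps its previous centroid.
        centroids = [sum(c) / len(c) if c else centroids[j]
                     for j, c in enumerate(clusters)]
    quantized = [min(centroids, key=lambda c: abs(w - c)) for w in weights]
    return quantized, centroids


weights = [0.0, 0.1, 0.2, 5.0, 5.1, 5.2]
uniform = linear_quantize(weights, num_bits=2)
clustered, centroids = kmeans_quantize(weights, k=2)
```

With clustered weights like these, the k-means levels settle near the two cluster means, whereas the uniform grid spends some of its levels on the empty range between the clusters.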

Non-uniform quantization

| TITLE | YEAR | DESCRIPTION | LINK |
| --- | --- | --- | --- |
| Efficient Weights Quantization of Convolutional Neural Networks Using Kernel Density Estimation based Non-uniform Quantizer | 2019 | Proposes a kernel density estimation-based non-uniform quantizer that quantizes weights efficiently using a smaller subset of sampled data, achieving performance comparable to traditional methods at reduced computational cost. | Paper |
| Flexible Quantization for Efficient Convolutional Neural Networks | 2024 | Combines the benefits of non-uniform quantization with the implementation efficiency of uniform quantization. The method, termed non-uniform uniform quantization (NUUQ), decouples quantization levels from bit-width, allowing flexible trade-offs between spatial and temporal complexity in hardware implementations. | Paper |
| IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means | 2023 | Proposes an implicit, differentiable k-means algorithm (IDKM) that reduces memory complexity, enabling the quantization of large neural networks such as ResNet-18 on hardware where traditional k-means methods are impractical. | Paper |
| Distributional Quantization of Large Language Models | 2023 | Stores LLM parameters in 4 bits while maintaining performance comparable to full-precision models. The weight matrices are split into blocks, and for each block the quantization bins are computed as quantiles of a probability distribution. Blocks following Gaussian, Beta, and Student's t distributions are analyzed, with intuition given for the parametric assumptions behind each. | Paper |
| CNQ: Compressor-Based Non-uniform Quantization of Deep Neural Networks | 2020 | Develops compressor-based non-uniform quantization (CNQ), which achieves non-uniform quantization of DNNs with few unlabeled samples. | Paper |
| Low-bit Quantization of Neural Networks for Efficient Inference | 2019 | Quantization formulated as a Minimum Mean Squared Error (MMSE) problem | Paper |
| UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks | 2018 | Targets the low computational budget regime | Paper |
| Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation | 2022 | Nonuniform-to-Uniform Quantization (N2UQ) | Paper |
| Learning both Weights and Connections for Efficient Neural Networks | 2015 | Pruning and Quantization Techniques | Paper |
| 1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs | 2014 | Gradient Quantization | Paper |
| Compressing deep convolutional networks using vector quantization | 2014 | Non-Uniform Quantization | Paper |
| Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation | 2013 | Discrete Optimization | Paper |
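
The block-wise, quantile-based idea described for Distributional Quantization of Large Language Models can be sketched as follows. This is a simplified illustration, not the paper's implementation: it assumes a Gaussian fit per block, places the levels at the midpoints of equal-probability bins, and uses hypothetical names (`quantile_levels`, `quantize_blockwise`, `dequantize_blockwise`) and a hypothetical default block size.

```python
from statistics import NormalDist, fmean, pstdev


def quantile_levels(block, num_bits=4):
    """Place 2**num_bits levels at evenly spaced quantiles of a Gaussian fitted to the block."""
    n = 2 ** num_bits
    mu, sigma = fmean(block), pstdev(block)
    if sigma == 0:
        return [mu] * n  # constant block: every level equals the single value
    dist = NormalDist(mu, sigma)
    # Assumption: use the midpoint quantile of each of the n equal-probability bins.
    return [dist.inv_cdf((i + 0.5) / n) for i in range(n)]


def quantize_blockwise(weights, block_size=64, num_bits=4):
    """Return one integer code per weight plus one codebook (list of levels) per block."""
    codes, codebooks = [], []
    for start in range(0, len(weights), block_size):
        block = weights[start:start + block_size]
        levels = quantile_levels(block, num_bits)
        codebooks.append(levels)
        for w in block:
            codes.append(min(range(len(levels)), key=lambda j: abs(w - levels[j])))
    return codes, codebooks


def dequantize_blockwise(codes, codebooks, block_size=64):
    """Look each code up in the codebook of the block it belongs to."""
    return [codebooks[i // block_size][c] for i, c in enumerate(codes)]


# Deterministic toy weights in [-1, 1], quantized to 4 bits in two blocks of 64.
weights = [((i * 37) % 101 - 50) / 50 for i in range(128)]
codes, codebooks = quantize_blockwise(weights, block_size=64, num_bits=4)
restored = dequantize_blockwise(codes, codebooks, block_size=64)
```

Because the levels follow the fitted distribution, they are denser where weights are more likely and sparser in the tails, which is the non-uniform property this section's papers share. Swapping the Gaussian for another distribution would only change `quantile_levels`.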