Survey Paper on quantization

TITLE	DESCRIPTION	YEAR	GitHub LINK
Awesome-Model-Quantization	This repo collects papers, documents, and code about model quantization for anyone who wants to research it. The project is continuously improved. PRs for works (papers, repositories) that the repo misses are welcome.	2015-2024	paper

QuantizationDNN

TITLE	DESCRIPTION	YEAR	CODE	PAPER LINK
COAT: Compressing Optimizer States and Activation for Memory-Efficient FP8 Training		2024	code	paper
Compressing Large Language Models using Low Rank and Low Precision Decomposition		2024	code	paper
A Survey on Transformer Compression		2024	-	paper
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey		2024	code	paper
A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification		2023	code	paper
A Survey of Quantization Methods for Efficient Neural Network Inference		2021	-	paper
Quantization and Deployment of Deep Neural Networks on Microcontrollers		2021	-	paper
Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding	Pruning + Trained Quantization + Huffman coding	2016	-	paper

Non-uniform quantization

TITLE	YEAR	DESCRIPTION	LINK
Efficient Weights Quantization of Convolutional Neural Networks Using Kernel Density Estimation based Non-uniform Quantizer	2019	Proposes a kernel density estimation-based non-uniform quantizer that quantizes weights using a small subset of sampled data, achieving performance comparable to traditional methods at reduced computational cost.	Paper
Flexible Quantization for Efficient Convolutional Neural Networks	2024	Combines the benefits of non-uniform quantization with the implementation efficiency of uniform quantization. This method, termed non-uniform uniform quantization (NUUQ), decouples quantization levels from bit-width, allowing for flexible trade-offs between spatial and temporal complexity in hardware implementations.	Paper
IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means	2023	Proposes an implicit, differentiable k-means algorithm (IDKM) that reduces memory complexity, enabling the quantization of large neural networks like ResNet-18 on hardware where traditional k-means methods are impractical.	Paper
Distributional Quantization of Large Language Models	2023	Proposes a method for quantizing LLMs that stores their parameters in 4 bits while maintaining performance comparable to full-precision models. The weight matrices are split into blocks, and for each block, quantization bins are computed as quantiles of a probability distribution. The paper analyzes blocks following Gaussian, Beta, and Student’s t distributions and gives intuition about the parametric assumptions made for each of them.	Paper
CNQ: Compressor-Based Non-uniform Quantization of Deep Neural Networks	2020	Develops compressor-based non-uniform quantization (CNQ), a method that achieves non-uniform quantization of DNNs with few unlabeled samples.	Paper
Low-bit Quantization of Neural Networks for Efficient Inference	2019	Minimum Mean Squared Error (MMSE)	Paper
UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks	2018	low computational budget regime	Paper
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation	2022	Nonuniform-to-Uniform Quantization (N2UQ)	Paper
Learning both Weights and Connections for Efficient Neural Networks	2015	Pruning and Quantization Techniques	Paper
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs	2014	Gradient Quantization	Paper
Compressing deep convolutional networks using vector quantization	2014	Non-Uniform Quantization	Paper
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation	2013	Discrete Optimization	Paper
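
As a rough illustration of the block-wise, quantile-based idea described for "Distributional Quantization of Large Language Models" above, here is a minimal sketch. It is not the paper's implementation: it assumes a Gaussian fit only, uses midpoint quantiles so the levels stay finite, and all function names and the block size of 64 are hypothetical choices made for this example.

```python
import random
from statistics import NormalDist, fmean, pstdev


def gaussian_codebook(mean, std, bits=4):
    """Place 2**bits levels at evenly spaced quantiles of N(mean, std)."""
    n = 2 ** bits
    dist = NormalDist(mean, std)
    # Assumption: midpoint quantiles (k + 0.5) / n, which avoid the
    # infinite values at the 0 and 1 quantiles.
    return [dist.inv_cdf((k + 0.5) / n) for k in range(n)]


def quantize_block(block, bits=4):
    """Quantize one block of weights; returns (indices, codebook)."""
    mean = fmean(block)
    std = pstdev(block) or 1e-12  # guard against constant blocks
    codebook = gaussian_codebook(mean, std, bits)
    # Map each weight to the index of its nearest codebook level.
    indices = [
        min(range(len(codebook)), key=lambda i: abs(codebook[i] - w))
        for w in block
    ]
    return indices, codebook


def dequantize_block(indices, codebook):
    """Look up the stored level for each index."""
    return [codebook[i] for i in indices]


def quantize_weights(weights, block_size=64, bits=4):
    """Split a flat weight list into blocks and quantize each one."""
    blocks = [weights[i:i + block_size]
              for i in range(0, len(weights), block_size)]
    return [quantize_block(b, bits) for b in blocks]


# Usage on synthetic Gaussian weights (illustrative data, not model weights).
random.seed(0)
weights = [random.gauss(0.0, 0.02) for _ in range(256)]
packed = quantize_weights(weights, block_size=64, bits=4)
restored = [w for idx, cb in packed for w in dequantize_block(idx, cb)]
mse = fmean((a - b) ** 2 for a, b in zip(weights, restored))
print(f"blocks: {len(packed)}, reconstruction MSE: {mse:.3e}")
```

Because the levels follow the fitted distribution, they are dense where weights are common and sparse in the tails, which is what makes the quantizer non-uniform. Each block stores only its 4-bit indices plus the mean and standard deviation needed to rebuild the codebook.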
