TITLE | DESCRIPTION | YEAR | GitHub LINK |
---|---|---|---|
Awesome-Model-Quantization | This repo collects papers, documents, and code on model quantization for anyone who wants to research it. The project is continuously improved; PRs adding works (papers, repositories) that the repo is missing are welcome. | 2015-2024 | paper |
TITLE | DESCRIPTION | YEAR | CODE | PAPER LINK |
---|---|---|---|---|
COAT: COMPRESSING OPTIMIZER STATES AND ACTIVATION FOR MEMORY-EFFICIENT FP8 TRAINING | - | 2024 | code | paper |
Compressing Large Language Models using Low Rank and Low Precision Decomposition | - | 2024 | code | paper |
A Survey on Transformer Compression | - | 2024 | - | paper |
Model Quantization and Hardware Acceleration for Vision Transformers: A Comprehensive Survey | - | 2024 | code | paper |
A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification | - | 2023 | code | paper |
A Survey of Quantization Methods for Efficient Neural Network Inference | - | 2021 | - | paper |
Quantization and Deployment of Deep Neural Networks on Microcontrollers | - | 2021 | - | paper |
DEEP COMPRESSION: COMPRESSING DEEP NEURAL NETWORKS WITH PRUNING, TRAINED QUANTIZATION AND HUFFMAN CODING | Pruning + Trained Quantization + Huffman coding | 2016 | - | paper |
Models | Description | Link |
---|---|---|
K-means, Linear, Binary/Ternary quantization | Detailed theory and code available | code |
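
As a rough illustration of the three families named in the row above, here is a minimal pure-Python sketch of linear (uniform), k-means, and ternary weight quantization. It is not the code from the linked repository: the function names, the 1-D list representation of weights, and the `threshold_ratio=0.7` heuristic for the ternary threshold are all assumptions made for this example.

```python
import random

def linear_quantize(weights, bits=4):
    """Uniform quantization: snap each weight to one of 2**bits evenly spaced levels."""
    lo, hi = min(weights), max(weights)
    steps = 2 ** bits - 1
    scale = (hi - lo) / steps if hi > lo else 1.0
    codes = [round((w - lo) / scale) for w in weights]   # integer codes in [0, steps]
    dequant = [lo + c * scale for c in codes]            # reconstructed values
    return codes, dequant

def kmeans_quantize(weights, k=4, iters=20, seed=0):
    """Non-uniform quantization: 1-D k-means (Lloyd's algorithm) over the weights."""
    rng = random.Random(seed)
    centroids = rng.sample(weights, k)                   # assumes len(weights) >= k

    def nearest(w):
        return min(range(k), key=lambda j: abs(w - centroids[j]))

    for _ in range(iters):
        assign = [nearest(w) for w in weights]
        for j in range(k):
            members = [w for w, a in zip(weights, assign) if a == j]
            if members:                                  # keep empty clusters unchanged
                centroids[j] = sum(members) / len(members)
    assign = [nearest(w) for w in weights]
    return assign, [centroids[a] for a in assign]

def ternary_quantize(weights, threshold_ratio=0.7):
    """Ternary quantization to {-alpha, 0, +alpha} using a magnitude threshold."""
    # Assumed heuristic: threshold is a fixed fraction of the mean absolute weight.
    delta = threshold_ratio * sum(abs(w) for w in weights) / len(weights)
    mask = [1 if w > delta else -1 if w < -delta else 0 for w in weights]
    kept = [abs(w) for w, m in zip(weights, mask) if m != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0       # shared scale for non-zero weights
    return mask, [alpha * m for m in mask]
```

Linear quantization only needs a scale and an offset per tensor, k-means needs a codebook of `k` centroids plus per-weight indices, and the ternary variant needs a single scale plus a 2-bit mask.
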
TITLE | YEAR | DESCRIPTION | LINK |
---|---|---|---|
Efficient Weights Quantization of Convolutional Neural Networks Using Kernel Density Estimation based Non-uniform Quantizer | 2019 | Proposes a kernel density estimation-based non-uniform quantizer that quantizes weights efficiently using a smaller subset of sampled data, achieving performance comparable to traditional methods at reduced computational cost. | Paper |
Flexible Quantization for Efficient Convolutional Neural Networks | 2024 | Combines the benefits of non-uniform quantization with the implementation efficiency of uniform quantization. This method, termed non-uniform uniform quantization (NUUQ), decouples quantization levels from bit-width, allowing for flexible trade-offs between spatial and temporal complexity in hardware implementations. | Paper |
IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means | 2023 | Proposes an implicit, differentiable k-means algorithm (IDKM) that reduces memory complexity, enabling the quantization of large neural networks such as ResNet-18 on hardware where traditional k-means methods are impractical. | Paper |
Distributional Quantization of Large Language Models | 2023 | A method for quantizing LLMs that stores their parameters in 4 bits while maintaining performance comparable to full-precision models. The weight matrices are split into blocks, and for each block the quantization bins are computed as quantiles of a probability distribution. Blocks following Gaussian, Beta, and Student’s t distributions are analyzed, with intuition given for the parametric assumptions behind each. | Paper |
CNQ: Compressor-Based Non-uniform Quantization of Deep Neural Networks | 2020 | Develops a compressor-based non-uniform quantization (CNQ) method that achieves non-uniform quantization of DNNs using only a few unlabeled samples. | Paper |
Low-bit Quantization of Neural Networks for Efficient Inference | 2019 | Minimum Mean Squared Error (MMSE) | Paper |
UNIQ: UNIFORM NOISE INJECTION FOR NON-UNIFORM QUANTIZATION OF NEURAL NETWORKS | 2018 | low computational budget regime | Paper |
Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation | 2022 | Nonuniform-to-Uniform Quantization (N2UQ) | Paper |
Learning both Weights and Connections for Efficient Neural Networks | 2015 | Pruning and Quantization Techniques | Paper |
1-bit stochastic gradient descent and its application to data-parallel distributed training of speech DNNs | 2014 | Gradient Quantization | Paper |
Compressing deep convolutional networks using vector quantization | 2014 | Non-Uniform Quantization | Paper |
Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation | 2013 | Discrete Optimization | Paper |
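
The block-wise quantile scheme summarized in the "Distributional Quantization of Large Language Models" entry above can be sketched roughly as follows. This is an illustrative sketch under stated assumptions, not the paper's implementation: it covers only the Gaussian case, fits the mean and standard deviation per block, places the 16 levels at the mid-bin quantiles `(i + 0.5) / 16`, and stores the full level table per block for simplicity. The names `quantize_block`, `quantize_blockwise`, and `dequantize_blockwise` are hypothetical.

```python
from statistics import NormalDist, mean, pstdev

def quantize_block(block, bits=4):
    """Quantize one block to 2**bits levels taken from a fitted Gaussian's quantiles."""
    mu, sigma = mean(block), pstdev(block)
    n_levels = 2 ** bits
    if sigma == 0:
        # Constant block: every level collapses to the single value.
        levels = [mu] * n_levels
    else:
        dist = NormalDist(mu, sigma)
        # Assumed placement: one level at the probability midpoint of each bin.
        levels = [dist.inv_cdf((i + 0.5) / n_levels) for i in range(n_levels)]
    # Map each weight to the index of its nearest level.
    codes = [min(range(n_levels), key=lambda j: abs(w - levels[j])) for w in block]
    return codes, levels

def quantize_blockwise(weights, block_size=64, bits=4):
    """Split a flat weight list into blocks and quantize each block independently."""
    return [
        quantize_block(weights[start:start + block_size], bits)
        for start in range(0, len(weights), block_size)
    ]

def dequantize_blockwise(quantized):
    """Reconstruct approximate weights from per-block (codes, levels) pairs."""
    return [levels[c] for codes, levels in quantized for c in codes]
```

In this Gaussian variant the levels are fully determined by the block's mean and standard deviation, so a storage-conscious version would keep only those two numbers per block alongside the 4-bit codes.
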