| Title | Year | Summary | Link |
|---|---|---|---|
| Efficient Weights Quantization of Convolutional Neural Networks Using Kernel Density Estimation based Non-uniform Quantizer | 2019 | Proposes a kernel density estimation (KDE)-based non-uniform quantizer. It quantizes weights efficiently using only a small sampled subset of the data and performs comparably to conventional methods at a lower computational cost. | Paper |
| Flexible Quantization for Efficient Convolutional Neural Networks | 2024 | Combines the benefits of non-uniform quantization with the implementation efficiency of uniform quantization. The method, called non-uniform uniform quantization (NUUQ), decouples the number of quantization levels from the bit-width. This allows flexible trade-offs between spatial and temporal complexity in hardware implementations. | Paper |
| IDKM: Memory Efficient Neural Network Quantization via Implicit, Differentiable k-Means | 2023 | Proposes an implicit, differentiable k-means algorithm (IDKM) that reduces the memory cost of differentiable k-means quantization. This makes it possible to quantize large networks such as ResNet-18 on hardware where earlier differentiable k-means methods are impractical. | Paper |
| Distributional Quantization of Large Language Models | 2023 | Stores LLM parameters in 4 bits while keeping performance comparable to full-precision models. Weight matrices are split into blocks, and the quantization bins for each block are computed as quantiles of a probability distribution. The authors analyze blocks following Gaussian, Beta, and Student’s t distributions and explain the parametric assumptions made for each. | Paper |
| CNQ: Compressor-Based Non-uniform Quantization of Deep Neural Networks | 2020 | Develops a compressor-based non-uniform quantization (CNQ) method that quantizes DNNs non-uniformly using only a few unlabeled samples. | Paper |
| Low-bit Quantization of Neural Networks for Efficient Inference | 2019 | Formulates low-bit linear quantization of weights and activations as a Minimum Mean Squared Error (MMSE) problem, which enables low-precision inference without full network retraining. | Paper |
| UNIQ: Uniform Noise Injection for Non-Uniform Quantization of Neural Networks | 2018 | Emulates a non-uniform k-quantile quantizer during training by injecting uniform noise, and targets the low computational budget regime. | Paper |
| Nonuniform-to-Uniform Quantization: Towards Accurate Quantization via Generalized Straight-Through Estimation | 2022 | Nonuniform-to-Uniform Quantization (N2UQ) learns non-uniform input thresholds but maps inputs to equidistant output levels. This retains the hardware efficiency of uniform quantization. It is trained with a generalized straight-through estimator (G-STE). | Paper |
| Learning both Weights and Connections for Efficient Neural Networks | 2015 | Pruning: trains the network, removes low-magnitude connections, and retrains the remaining sparse network. | Paper |
| 1-bit Stochastic Gradient Descent and its Application to Data-Parallel Distributed Training of Speech DNNs | 2014 | Gradient quantization: quantizes gradients to 1 bit per value for data-parallel training and carries the quantization error over to the next minibatch (error feedback). | Paper |
| Compressing Deep Convolutional Networks using Vector Quantization | 2014 | Non-uniform quantization of dense-layer weights using vector quantization methods such as k-means clustering, product quantization, and residual quantization. | Paper |
| Estimating or Propagating Gradients Through Stochastic Neurons for Conditional Computation | 2013 | Discrete optimization: studies gradient estimators for stochastic and discrete neurons, including the straight-through estimator (STE) that is now widely used to train quantized networks. | Paper |
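The Distributional Quantization entry above splits weights into blocks and places each block's quantization bins at quantiles of a probability distribution. The following is a minimal sketch of that idea, not the paper's implementation. It makes several assumptions: a Gaussian fitted by the sample mean and standard deviation (the paper also studies Beta and Student’s t), levels at the midpoint probabilities `(i + 0.5) / k`, and the hypothetical helper names `gaussian_quantile_levels` and `blockwise_quantile_quantize`.

```python
import numpy as np
from statistics import NormalDist

def gaussian_quantile_levels(block: np.ndarray, bits: int = 4) -> np.ndarray:
    """Return 2**bits reconstruction levels at quantiles of a Gaussian fitted to `block`."""
    k = 2 ** bits
    # Assumption: fit the Gaussian by sample mean/std (guarding against a zero std).
    dist = NormalDist(mu=float(block.mean()), sigma=max(float(block.std()), 1e-12))
    # Assumption: midpoint probabilities (i + 0.5) / k avoid the infinite
    # quantiles at p = 0 and p = 1.
    probs = (np.arange(k) + 0.5) / k
    return np.array([dist.inv_cdf(float(p)) for p in probs])

def blockwise_quantile_quantize(w: np.ndarray, block_size: int = 64, bits: int = 4):
    """Quantize `w` block by block; returns (codes, per-block levels, dequantized weights)."""
    flat = w.ravel()
    codes = np.empty(flat.size, dtype=np.uint8)  # assumes bits <= 8
    deq = np.empty_like(flat)
    all_levels = []
    for start in range(0, flat.size, block_size):
        block = flat[start:start + block_size]
        levels = gaussian_quantile_levels(block, bits)
        # Map each weight to the index of its nearest level.
        idx = np.abs(block[:, None] - levels[None, :]).argmin(axis=1)
        codes[start:start + block_size] = idx
        deq[start:start + block_size] = levels[idx]
        all_levels.append(levels)
    return codes.reshape(w.shape), all_levels, deq.reshape(w.shape)
```

With `bits=4`, each weight is stored as a 4-bit code. Each block also needs enough information to rebuild its levels; in this sketch that is just the fitted mean and standard deviation.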
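The MMSE entry above chooses quantization parameters that minimize the mean squared error between a tensor and its quantized version. As a rough illustration, the sketch below shows one simple way to do this: a grid search over the clipping value of a symmetric uniform quantizer. The grid search, the candidate count, and the names `uniform_quantize` and `mmse_clip` are assumptions for illustration, not the paper's solver.

```python
import numpy as np

def uniform_quantize(x: np.ndarray, clip: float, bits: int) -> np.ndarray:
    """Symmetric signed uniform quantizer whose levels span [-clip, clip]."""
    qmax = 2 ** (bits - 1) - 1
    scale = clip / qmax
    return np.clip(np.round(x / scale), -qmax, qmax) * scale

def mmse_clip(x: np.ndarray, bits: int = 4, num_candidates: int = 200):
    """Grid-search the clipping value that minimizes MSE(x, quantized x)."""
    max_abs = float(np.abs(x).max())
    # Candidates range from a tight clip up to "no clipping" (max |x|).
    candidates = np.linspace(max_abs / num_candidates, max_abs, num_candidates)
    errors = np.array([np.mean((x - uniform_quantize(x, c, bits)) ** 2) for c in candidates])
    best = int(errors.argmin())
    return float(candidates[best]), float(errors[best])
```

At low bit-widths, clipping the rare large values usually shrinks the step size enough to lower the total error. The search includes the unclipped case, so it can never do worse than no clipping.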
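The UNIQ entry above emulates a non-uniform k-quantile quantizer by injecting uniform noise during training. A k-quantile quantizer is a uniform quantizer applied after mapping weights through their CDF. For that reason, the training-time surrogate below adds uniform noise of one bin width in that CDF domain and maps the result back. This is a simplified sketch, not the paper's code. It assumes the weights are approximately Gaussian, and the names `k_quantile_quantize` and `noisy_training_weights` are hypothetical.

```python
import numpy as np
from statistics import NormalDist

def _fit_gaussian(w: np.ndarray) -> NormalDist:
    # Assumption: model the weight distribution as a Gaussian fitted by mean/std.
    return NormalDist(float(w.mean()), max(float(w.std()), 1e-12))

def k_quantile_quantize(w: np.ndarray, bits: int) -> np.ndarray:
    """Inference-time k-quantile quantizer: equal-probability bins in the CDF domain."""
    k = 2 ** bits
    dist = _fit_gaussian(w)
    u = np.vectorize(dist.cdf)(w)                        # map weights into [0, 1]
    bins = np.minimum(np.floor(u * k), k - 1)            # uniform bins in CDF space
    return np.vectorize(dist.inv_cdf)((bins + 0.5) / k)  # bin midpoints mapped back

def noisy_training_weights(w: np.ndarray, bits: int, rng: np.random.Generator) -> np.ndarray:
    """Training-time surrogate: uniform noise of one bin width in the CDF domain."""
    k = 2 ** bits
    dist = _fit_gaussian(w)
    u = np.vectorize(dist.cdf)(w)
    u_noisy = u + rng.uniform(-0.5 / k, 0.5 / k, size=w.shape)
    # Keep probabilities strictly inside (0, 1) so inv_cdf is defined.
    u_noisy = np.clip(u_noisy, 1e-6, 1 - 1e-6)
    return np.vectorize(dist.inv_cdf)(u_noisy)
```

Unlike `np.round`, the noisy surrogate is smooth in `w`. That is the point of the substitution: gradients can flow during training, and the actual quantizer is applied only at inference.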
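The 1-bit SGD entry above quantizes gradients to one bit per value and carries the quantization error into the next minibatch. The sketch below shows that error-feedback loop. It assumes one pair of reconstruction values per tensor (the mean of the non-negative entries and the mean of the negative entries); the original work's exact choice of reconstruction values may differ. The class name is hypothetical.

```python
import numpy as np

class OneBitQuantizerWithErrorFeedback:
    """Sketch of 1-bit gradient quantization with error feedback.

    Assumption: one pair of reconstruction values per tensor; finer-grained
    statistics are also possible.
    """

    def __init__(self, shape):
        self.residual = np.zeros(shape)

    def step(self, grad: np.ndarray):
        g = grad + self.residual           # add the error carried over from the last step
        bits = g >= 0                      # 1 bit per entry is what would be transmitted
        hi = g[bits].mean() if bits.any() else 0.0
        lo = g[~bits].mean() if (~bits).any() else 0.0
        decoded = np.where(bits, hi, lo)   # what the receiver reconstructs
        self.residual = g - decoded        # quantization error, fed back next step
        return bits, hi, lo, decoded
```

Because each step's error is added back later, the decoded gradients sent so far plus the current residual always sum to the true gradients sent so far. Nothing is lost; it is only delayed.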
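The vector quantization entry above (and IDKM, which makes the same clustering differentiable) builds on k-means over weight values. Each weight is replaced by the nearest of `k` shared centroids, so only a `log2(k)`-bit index per weight plus a small codebook is stored. The sketch below is plain scalar k-means (Lloyd's algorithm). The linear initialization and the name `kmeans_weight_sharing` are assumptions, and product or residual quantization are not shown.

```python
import numpy as np

def kmeans_weight_sharing(w: np.ndarray, k: int = 16, iters: int = 25):
    """Scalar k-means on weight values: returns (cluster indices, codebook)."""
    flat = w.ravel()
    # Assumption: linear initialization over [min, max]; other inits also work.
    codebook = np.linspace(flat.min(), flat.max(), k)
    for _ in range(iters):
        # Assignment step: nearest centroid for every weight.
        assign = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
        # Update step: move each non-empty centroid to the mean of its members.
        for j in range(k):
            members = flat[assign == j]
            if members.size > 0:
                codebook[j] = members.mean()
    assign = np.abs(flat[:, None] - codebook[None, :]).argmin(axis=1)
    return assign.reshape(w.shape), codebook
```

The centroids crowd into the dense middle of the weight distribution, which is why this counts as non-uniform quantization. The dequantized weights are simply `codebook[indices]`.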
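The straight-through estimator from the last entry (which N2UQ generalizes) addresses the fact that `round()` has zero gradient almost everywhere. The forward pass uses quantized weights, while the backward pass treats rounding as the identity and updates a latent full-precision copy. Below is a minimal sketch on a toy least-squares problem. The 0.25-step grid, the clipped-STE variant, the synthetic data, and all function names are hypothetical.

```python
import numpy as np

SCALE, QMAX = 0.25, 7  # hypothetical quantizer: signed grid with step 0.25

def quantize(w: np.ndarray) -> np.ndarray:
    return np.clip(np.round(w / SCALE), -QMAX, QMAX) * SCALE

def ste_grad(w: np.ndarray, grad_wrt_quantized: np.ndarray) -> np.ndarray:
    # Straight-through: pass the gradient as if round() were the identity,
    # but zero it where the forward pass clipped (a common variant).
    return grad_wrt_quantized * (np.abs(w) <= QMAX * SCALE)

def train_quantized_regression(steps: int = 300, lr: float = 0.1, seed: int = 0):
    rng = np.random.default_rng(seed)
    X = rng.normal(size=(256, 4))
    y = X @ np.array([0.5, -1.0, 0.25, 1.5])  # hypothetical targets on the grid
    w = np.zeros(4)                           # latent full-precision weights
    losses = []
    for _ in range(steps):
        wq = quantize(w)                      # forward pass uses quantized weights
        residual = X @ wq - y
        losses.append(float(np.mean(residual ** 2)))
        grad_wq = 2 * X.T @ residual / len(y) # dL/dwq for the mean squared error
        w -= lr * ste_grad(w, grad_wq)        # update latent weights via the STE
    return quantize(w), losses
```

Without `ste_grad`, the true gradient of `quantize` would be zero almost everywhere, and the latent weights would never move.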