Intel® Neural Compressor (formerly known as Intel® Low Precision Optimization Tool) is an open-source Python library that runs on Intel CPUs and GPUs. It provides unified interfaces across multiple deep learning frameworks for popular network compression techniques such as quantization, pruning, and knowledge distillation. The tool supports automatic accuracy-driven tuning strategies that help users quickly find the best quantized model. It also implements several weight pruning algorithms that generate a pruned model meeting a predefined sparsity goal, and it supports knowledge distillation to transfer knowledge from a teacher model to a student model.
Note: GPU support is under development.
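
To make the idea of accuracy-driven tuning concrete, here is a minimal sketch in plain Python. It is not the library's implementation or API: the `tune` function, the candidate names, and the accuracy numbers are all hypothetical, and the 1% relative-loss tolerance is an assumed criterion. The sketch only illustrates the general loop of trying candidate quantization configurations and stopping at the first one whose accuracy stays within tolerance of the FP32 baseline.

```python
from typing import Callable, Iterable, Optional, Tuple


def tune(
    candidates: Iterable[str],
    evaluate: Callable[[str], float],
    baseline_accuracy: float,
    max_relative_loss: float = 0.01,
) -> Optional[Tuple[str, float]]:
    """Return the first candidate whose accuracy meets the tolerance.

    Candidates are assumed to be ordered from most to least aggressive,
    so the first acceptable one is the most compressed acceptable one.
    """
    threshold = baseline_accuracy * (1.0 - max_relative_loss)
    for config in candidates:
        accuracy = evaluate(config)
        if accuracy >= threshold:
            return config, accuracy
    return None  # no candidate met the accuracy goal


# Hypothetical accuracies standing in for real evaluation runs.
measured = {
    "int8_all_ops": 0.700,
    "int8_skip_first_layer": 0.755,
    "fp32": 0.760,
}

result = tune(
    candidates=["int8_all_ops", "int8_skip_first_layer", "fp32"],
    evaluate=measured.__getitem__,
    baseline_accuracy=measured["fp32"],
)
print(result)
```

In this toy run, the fully quantized candidate falls below the threshold, so the loop moves on and accepts the configuration that keeps the first layer in higher precision.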
[Figures: Architecture and Workflow diagrams]
Supported deep learning frameworks are:
- TensorFlow*, including 1.15.0 UP3, 1.15.0 UP2, 1.15.0 UP1, 2.1.0, 2.2.0, 2.3.0, 2.4.0, 2.5.0, Official TensorFlow 2.6.0
Note: Intel Optimized TensorFlow 2.5.0 requires setting the environment variable TF_ENABLE_MKL_NATIVE_FORMAT=0 before running the quantization process or deploying the quantized model.
Note: From the official TensorFlow 2.6.0, oneDNN support has been upstreamed. Download the official TensorFlow 2.6.0 binary for the CPU device and set the environment variable TF_ENABLE_ONEDNN_OPTS=1 before running the quantization process or deploying the quantized model.
- PyTorch*, including 1.5.0+cpu, 1.6.0+cpu, 1.8.0+cpu
- Apache* MXNet, including 1.6.0, 1.7.0, 1.8.0
- ONNX* Runtime, including 1.6.0, 1.7.0, 1.8.0
- Execution Engine, a reference bare-metal solution (./engine) for domain-specific NLP models.
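
The two TensorFlow notes above can be sketched as a small helper that sets the relevant environment variable from Python. The function name `configure_tensorflow_env` and its `intel_optimized` flag are hypothetical and not part of Intel® Neural Compressor. The sketch also assumes that the variables should be set before TensorFlow is imported, which is the conservative reading of "before running the quantization process".

```python
import os


def configure_tensorflow_env(tf_version: str, intel_optimized: bool = False) -> dict:
    """Set the environment variable named in the notes for this TensorFlow build.

    Returns the variables that were applied (empty if none are needed).
    """
    major_minor = tuple(int(part) for part in tf_version.split(".")[:2])
    applied = {}
    if intel_optimized and major_minor == (2, 5):
        # Intel Optimized TensorFlow 2.5.0
        applied["TF_ENABLE_MKL_NATIVE_FORMAT"] = "0"
    elif not intel_optimized and major_minor == (2, 6):
        # Official TensorFlow 2.6.0 with upstreamed oneDNN support
        applied["TF_ENABLE_ONEDNN_OPTS"] = "1"
    os.environ.update(applied)
    return applied


# Call this first, then import TensorFlow (assumed ordering).
configure_tensorflow_env("2.6.0")
# import tensorflow as tf
```

Setting the same variables in the shell before launching Python has the same effect and avoids any dependence on import order.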
Get started with installation, tutorials, examples, and more!
View the Intel® Neural Compressor repo at: https://github.com/intel/neural-compressor.