This tutorial demonstrates, step by step, how to quantize a model with the OpenVINO Post-Training Optimization Tool (POT), compare the accuracy of the FP32 model against the quantized INT8 model, and run an inference demo based on sample code from Ultralytics YOLOv5 with the OpenVINO backend.
The notebook uses Ultralytics YOLOv5 to obtain the YOLOv5-m model in OpenVINO Intermediate Representation (IR) format. The POT API is then used to quantize the model, relying on the Non-Max Suppression (NMS) post-processing provided by Ultralytics. To confirm that the accuracy loss is minimal, the accuracy of the FP32 model is compared with that of the INT8 model quantized by POT using the "DefaultQuantization" algorithm. Finally, the Ultralytics sample script detect.py is used to run inference with the INT8 model and check its performance using OpenVINO with the sync API.
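To give a rough intuition for what moving from FP32 to INT8 means, the sketch below applies simple symmetric, per-tensor INT8 quantization to a handful of made-up weight values and measures the round-trip error. It is an illustration only, not POT's DefaultQuantization, which is considerably more involved (it collects statistics on a calibration dataset and applies bias correction). The function names `quantize_symmetric_int8` and `dequantize` and the sample values are hypothetical.

```python
def quantize_symmetric_int8(values):
    """Hypothetical helper: map floats to int8 codes with one shared scale (zero point = 0)."""
    max_abs = max(abs(v) for v in values)
    # Choose the scale so the largest magnitude maps to 127.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest integer code and clamp to the signed 8-bit range.
    codes = [max(-128, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize(codes, scale):
    """Hypothetical helper: map int8 codes back to approximate float values."""
    return [c * scale for c in codes]


# Made-up FP32 "weights" for illustration.
weights = [0.42, -1.27, 0.003, 0.9, -0.55]
codes, scale = quantize_symmetric_int8(weights)
restored = dequantize(codes, scale)

# The per-value error of round-to-nearest is bounded by half a quantization step.
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print("codes:", codes)
print("scale:", scale)
print("max round-trip error:", max_err)
```

Small values such as 0.003 collapse to zero, which is one reason accuracy can drop after quantization and why the notebook compares FP32 and INT8 accuracy explicitly.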
If you have not done so already, please follow the Installation Guide to install all required dependencies.