
Why is the model quantization acceleration not obvious? #48

Open
su26225 opened this issue Jun 25, 2021 · 2 comments

Comments


su26225 commented Jun 25, 2021

Before deploying with OpenVINO, inference took about 26 s for 300 images.

After deployment, the time dropped significantly to 12 s, although the test code was not identical.

But after quantizing to INT8, the time only went down to 9.2 s, so the speedup is not very obvious. What might be the cause? Is there any way to improve it? Thanks.
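
Since the before/after numbers came from different test scripts, a consistent timing harness helps isolate pure inference time from image decoding and pre/post-processing. A minimal sketch, assuming the OpenVINO 2021.x Python API (openvino.inference_engine); the model path and input data are placeholders:

```python
# Minimal latency harness, assuming the OpenVINO 2021.x Python API
# (openvino.inference_engine). Model path and input are placeholders.
import time

import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_blob = next(iter(net.input_info))
shape = net.input_info[input_blob].input_data.shape  # e.g. [1, 3, H, W]
image = np.random.rand(*shape).astype(np.float32)    # stand-in for a real image

for _ in range(10):                                  # warm-up runs
    exec_net.infer({input_blob: image})

n = 300
t0 = time.perf_counter()
for _ in range(n):
    exec_net.infer({input_blob: image})
print(f"mean inference latency: {(time.perf_counter() - t0) / n * 1000:.1f} ms")
```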


jayer95 commented Jun 26, 2021

@su26225

Does your model use FP16 or FP32?
According to the OpenVINO documentation, FP16 is automatically converted to FP32 when running on the CPU, so FP16 and FP32 run at the same speed there.
There are two kinds of quantized INT8 models, FP32-INT8 and FP16-INT8; both are faster than FP32 and FP16 on the CPU.
If you run on the GPU, FP32-INT8 and FP16-INT8 will not increase the speed.

There are many ways to quantize a model, and they require experimentation. Some apply full-model quantization (the fastest result, but the largest accuracy loss), while others decide how far to quantize based on the mAP loss before and after quantization. Please refer to: https://docs.openvinotoolkit.org/latest/pot_docs_BestPractices.html
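
These two strategies correspond to the Post-Training Optimization Tool's DefaultQuantization and AccuracyAwareQuantization algorithms. A minimal sketch of their configuration dicts, with parameter names taken from the POT documentation; the values are illustrative assumptions, and the Python entry points that consume these dicts have changed across OpenVINO releases:

```python
# Sketch of the two POT strategies mentioned above; parameter names follow
# the POT documentation, but the values here are illustrative assumptions.

# Full-model quantization: fastest result, largest potential accuracy loss.
default_quantization = {
    "name": "DefaultQuantization",
    "params": {
        "target_device": "CPU",
        "preset": "performance",   # symmetric quantization, favors speed
        "stat_subset_size": 300,   # calibration images for activation stats
    },
}

# Accuracy-aware quantization: reverts the layers that hurt the metric
# (e.g. mAP) to floating point until the drop fits the allowed budget.
accuracy_aware_quantization = {
    "name": "AccuracyAwareQuantization",
    "params": {
        "target_device": "CPU",
        "preset": "performance",
        "stat_subset_size": 300,
        "maximal_drop": 0.01,      # tolerate at most 0.01 absolute metric drop
    },
}
```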


su26225 commented Jun 26, 2021


I use a CPU. Quantizing from FP32 to INT8 shortened the per-image detection time from 47.9 ms to 39.9 ms, and the time per video frame from 108 ms to 77.6 ms, roughly a 1/4 reduction.
At first I thought this was because the OpenVINO deployment already applies targeted optimizations to floating-point computation, so I turned off model optimization:
https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques.html
However, even with the unoptimized deployment, quantization only shortened the image detection time from 56.6 ms to 43.1 ms, which helps, but the effect is still not obvious.
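
For reference, "turning off model optimization" on the linked page means disabling Model Optimizer's fusing passes at conversion time. A hedged example invocation, assuming the 2021-era mo.py flags documented on that page; the input model path is a placeholder:

```sh
# Assumed flags from the Model Optimization Techniques page linked above:
# --disable_fusing turns off linear-operation fusing, --disable_gfusing
# turns off grouped-convolution fusing. The model path is a placeholder.
python mo.py --input_model model.onnx --disable_fusing --disable_gfusing
```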
