
Why is the model quantization acceleration not obvious? #48

Open
su26225 opened this issue Jun 25, 2021 · 2 comments

Comments


su26225 commented Jun 25, 2021

Before deploying with OpenVINO, inference took about 26 s for 300 images.

After deployment, the time dropped significantly to 12 s, although the test code was not identical.

But after quantizing to INT8, the time only went down to 9.2 s, so the speedup is not very obvious. What might be the cause? Is there any way to improve it? Thanks.
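
Since the before/after numbers came from different test scripts, a consistent timing harness helps isolate pure inference time from image decoding and pre/post-processing. A minimal sketch, assuming the OpenVINO 2021.x Python API (openvino.inference_engine); the model path and input data are placeholders:

```python
# Minimal latency harness, assuming the OpenVINO 2021.x Python API
# (openvino.inference_engine). Model path and input are placeholders.
import time

import numpy as np
from openvino.inference_engine import IECore

ie = IECore()
net = ie.read_network(model="model.xml", weights="model.bin")
exec_net = ie.load_network(network=net, device_name="CPU")

input_blob = next(iter(net.input_info))
shape = net.input_info[input_blob].input_data.shape  # e.g. [1, 3, H, W]
image = np.random.rand(*shape).astype(np.float32)    # stand-in for a real image

for _ in range(10):                                  # warm-up runs
    exec_net.infer({input_blob: image})

n = 300
t0 = time.perf_counter()
for _ in range(n):
    exec_net.infer({input_blob: image})
print(f"mean inference latency: {(time.perf_counter() - t0) / n * 1000:.1f} ms")
```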


jayer95 commented Jun 26, 2021

@su26225

Does your model use FP16 or FP32?
According to the OpenVINO documentation, FP16 is automatically converted to FP32 when running on the CPU, so FP16 and FP32 run at the same speed there.
There are two kinds of quantized INT8 models, FP32-INT8 and FP16-INT8; both are faster than FP32 and FP16 on the CPU.
If you run on the GPU, FP32-INT8 and FP16-INT8 will not increase the speed.

There are many ways to quantize a model, and they require experimentation. Some apply full-model quantization (the fastest result, but the largest accuracy loss), while others decide how far to quantize based on the mAP loss before and after quantization. Please refer to: https://docs.openvinotoolkit.org/latest/pot_docs_BestPractices.html
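
These two strategies correspond to the Post-Training Optimization Tool's DefaultQuantization and AccuracyAwareQuantization algorithms. A minimal sketch of their configuration dicts, with parameter names taken from the POT documentation; the values are illustrative assumptions, and the Python entry points that consume these dicts have changed across OpenVINO releases:

```python
# Sketch of the two POT strategies mentioned above; parameter names follow
# the POT documentation, but the values here are illustrative assumptions.

# Full-model quantization: fastest result, largest potential accuracy loss.
default_quantization = {
    "name": "DefaultQuantization",
    "params": {
        "target_device": "CPU",
        "preset": "performance",   # symmetric quantization, favors speed
        "stat_subset_size": 300,   # calibration images for activation stats
    },
}

# Accuracy-aware quantization: reverts the layers that hurt the metric
# (e.g. mAP) to floating point until the drop fits the allowed budget.
accuracy_aware_quantization = {
    "name": "AccuracyAwareQuantization",
    "params": {
        "target_device": "CPU",
        "preset": "performance",
        "stat_subset_size": 300,
        "maximal_drop": 0.01,      # tolerate at most 0.01 absolute metric drop
    },
}
```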


su26225 commented Jun 26, 2021


I use a CPU. Quantizing from FP32 to INT8 shortened the per-image detection time from 47.9 ms to 39.9 ms, and the time per video frame from 108 ms to 77.6 ms, roughly a 1/4 reduction.
At first I thought this was because the OpenVINO deployment already applies targeted optimizations to floating-point computation, so I turned off model optimization:
https://docs.openvinotoolkit.org/latest/openvino_docs_MO_DG_prepare_model_Model_Optimization_Techniques.html
However, even with the unoptimized deployment, quantization only shortened the image detection time from 56.6 ms to 43.1 ms, which helps, but the effect is still not obvious.
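
For reference, "turning off model optimization" on the linked page means disabling Model Optimizer's fusing passes at conversion time. A hedged example invocation, assuming the 2021-era mo.py flags documented on that page; the input model path is a placeholder:

```sh
# Assumed flags from the Model Optimization Techniques page linked above:
# --disable_fusing turns off linear-operation fusing, --disable_gfusing
# turns off grouped-convolution fusing. The model path is a placeholder.
python mo.py --input_model model.onnx --disable_fusing --disable_gfusing
```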
