Add support for quantization and custom audio context size to OpenVino #2184

dscripka · 2024-05-25T15:00:42Z

Compiling with OpenVino on supported platforms can significantly improve performance, but the OpenVino path currently lacks two features that can improve performance further:

Using quantized models (10%+ performance increase)
Supporting custom audio context sizes (~3x performance increase for short audio files)

This PR adds both of these as optional arguments to the OpenVino model conversion script, so that the performance gains from quantization, a reduced audio context size, and OpenVino can stack.

There are a few important caveats, however:

The OpenVino encoder (to my knowledge) only supports a fixed audio context size, so the converted model is somewhat more restricted
This does require monkey patching a method from the openai-whisper library in convert-whisper-to-openvino.py, which isn't ideal
Quantization is done with nncf 2.7.0, which currently only supports 4 and 8 bit quantization
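
To illustrate the monkey-patching caveat, here is a minimal sketch of the general pattern: replacing an encoder's forward method so that it only uses the first `audio_ctx` rows of the positional embedding. `ToyEncoder` and `make_forward_with_audio_ctx` are hypothetical stand-ins written for this example; this is not the openai-whisper class or the actual patch in convert-whisper-to-openvino.py.

```python
# Toy stand-in for an encoder; this is NOT the openai-whisper class.
class ToyEncoder:
    def __init__(self, n_ctx, n_state):
        # One embedding row per audio position (the full context has n_ctx rows).
        self.positional_embedding = [[float(i)] * n_state for i in range(n_ctx)]

    def forward(self, x):
        # Original behaviour: reject anything but the full context length.
        assert len(x) == len(self.positional_embedding), "incorrect audio shape"
        return [
            [a + b for a, b in zip(row, pos)]
            for row, pos in zip(x, self.positional_embedding)
        ]


def make_forward_with_audio_ctx(audio_ctx):
    """Build a replacement forward() that uses only the first audio_ctx positions."""

    def forward(self, x):
        # The patched model accepts exactly one (fixed) context length.
        assert len(x) == audio_ctx, "input must match the fixed audio context"
        pos = self.positional_embedding[:audio_ctx]
        return [[a + b for a, b in zip(row, p)] for row, p in zip(x, pos)]

    return forward


# Monkey patch: swap the method on the class before the model would be exported.
ToyEncoder.forward = make_forward_with_audio_ctx(550)

encoder = ToyEncoder(n_ctx=1500, n_state=4)
out = encoder.forward([[0.0] * 4 for _ in range(550)])
print(len(out))
```

Because the context length is baked into the replacement method, a model exported after such a patch would only accept that one input length, which is the restriction described in the first caveat.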

Despite these caveats, the performance improvement can be so substantial for certain use cases that it may be worth it. For example, on the ~10 second jfk.wav file on an Intel(R) Xeon(R) W-2123 CPU:

Threads	Quant	Model	Encoder Time (s)	Arguments	Build
1	8 bit	small-en	8	-bs 1 -ac 1500	BLAS = 1
1	8 bit	small-en	0.8	-bs 1 -ac 550	OPENVINO = 1
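
As a rough aid for reading the table, the sketch below shows how an audio context size relates to clip length. It relies on Whisper's encoder covering its full 30 second window with 1500 positions (50 per second). The helper names `min_audio_ctx` and `speedup` are made up for this example and are not part of the PR.

```python
import math

FULL_AUDIO_CTX = 1500  # encoder positions for the full 30 s window
FULL_WINDOW_S = 30.0


def min_audio_ctx(duration_s):
    """Smallest audio context covering duration_s seconds, capped at the full window."""
    positions_per_second = FULL_AUDIO_CTX / FULL_WINDOW_S  # 50 positions per second
    return min(FULL_AUDIO_CTX, math.ceil(duration_s * positions_per_second))


def speedup(baseline_s, optimized_s):
    """Ratio of encoder times; values above 1 mean the optimized run is faster."""
    return baseline_s / optimized_s


print(min_audio_ctx(11.0))  # an 11 s clip needs 550 positions
print(speedup(8.0, 0.8))    # encoder times taken from the table above
```

An 11 second clip maps to 550 positions, which is in line with the `-ac 550` used for the roughly 10 second jfk.wav. Note that the two table rows differ in build and audio context size at once, so the ratio reflects the combined effect rather than any single change.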

Command to produce the OpenVino model for this PR: python convert-whisper-to-openvino.py --model small.en -ac 550 -qb 8
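
For readers who want to see how such optional arguments might be wired up, here is a minimal argparse sketch. Only `--model`, `-ac`, and `-qb` come from the command above; the long option names, the default of 1500, and the help strings are assumptions, and the real script may be organized differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Sketch of a converter CLI")
    parser.add_argument("--model", required=True, help="model name, e.g. small.en")
    # Long names and defaults below are assumptions; the PR text only shows -ac and -qb.
    parser.add_argument(
        "-ac", "--audio-context", type=int, default=1500,
        help="fixed audio context size baked into the converted encoder",
    )
    parser.add_argument(
        "-qb", "--quantize-bits", type=int, choices=[4, 8], default=None,
        help="quantize to 4 or 8 bits; omit to skip quantization",
    )
    return parser


# Parse the same arguments as the command in the PR description.
args = build_parser().parse_args(["--model", "small.en", "-ac", "550", "-qb", "8"])
print(args.model, args.audio_context, args.quantize_bits)
```

Restricting `-qb` to 4 and 8 mirrors the caveat that nncf 2.7.0 only supports those bit widths, so an unsupported value is rejected before any conversion work starts.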


jason-ni · 2024-08-22T03:22:44Z

Is there a way to improve the OpenVino encoder to support a dynamic audio_ctx? That would be much more useful.

dscripka · 2024-08-24T21:14:10Z

I agree that would be ideal, but it would require more substantial modifications to how the encoder is converted and the actual OpenVino runtime implementation. Possible perhaps, but difficult.

jason-ni · 2024-08-27T03:16:38Z

But the CPU and other GPU backends all support a dynamic audio_ctx given a fixed model encoding size. If we allow converting OpenVino models with different encoding sizes, it would confuse model users.

dscripka added 2 commits May 24, 2024 08:19

updated openvino conversion script to support different audio context sizes as well as 4/8 bit quantization with nncf

5bcdbf2

updated nncf version to match openvino install

0dcfaed

dscripka changed the title ~~Openvino audio ctx quantize~~ Add support for quantization and custom audio context size to OpenVino May 25, 2024
