
How to batching process properly #426

Closed · Simon-chai opened this issue Mar 4, 2024 · 6 comments

@Simon-chai commented Mar 4, 2024

> # New V4 VAD Released
>
> Changes:
>
> - Improved quality
> - Improved performance
> - Both 8k and 16k sampling rates are now supported by the ONNX model
> - Batching is now supported by the ONNX model
> - Added audio_forward method for one-line processing of single or multiple audios without postprocessing

Originally posted by @adamnsandle in #2 (comment)
I see it says here that batching is now supported by the ONNX model, but I can't find any doc or example of how to batch. Can anyone point it out for me? Also, the function get_speech_timestamps only accepts one-dimensional audio; does that mean there is no way to support batching with this function?
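For reference, the audio_forward method mentioned in the release notes can be called roughly like this (a minimal sketch: the loading lines follow the repo's README, while the idea that a batched 2-D input exercises the batched path is an assumption):

```python
import torch

# Load the model and helper functions as shown in the silero-vad README.
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad')
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = utils

wav = read_audio('test.wav', sampling_rate=16000)  # 1-D float tensor

# One-line probability extraction without postprocessing; presumably a
# (batch, num_samples) tensor would exercise the batched path.
speech_probs = model.audio_forward(wav, 16000)
```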

@snakers4 (Owner) commented Mar 5, 2024

Hi,

Batching is complicated and error-prone, and we discourage users from using it.

snakers4 closed this as completed Mar 5, 2024
@Simon-chai (Author) commented Mar 5, 2024

> Hi,
>
> Batching is complicated and error-prone, and we discourage users from using it.

Thank you for answering!
One last question: which part is error-prone when doing batching VAD, the results the model returns or the custom code that deals with the results? If it's the latter, I think the errors are avoidable, right?
Because today I figured out how to pass batched input, and I think that if the batching results are solid, it is worth a try.
Looking forward to the answer.

@snakers4 (Owner) commented Mar 5, 2024

> The results the model returns or the custom code that deals with the results?

The key problem is that the VAD is not stateless, i.e. it holds a state at all times.
When you use a batch, it keeps a separate sequential internal state (or memory) for each batch index.

If a service processes random audios at random times, it may become complicated to keep track of this.
If you look at the ONNX wrapper, you will see how the state can be cached externally.

The optimal architecture may differ for each particular case. For example, if the state is handled externally, you can return it each time and pass it back on the next invocation, processing "batches".
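A minimal sketch of that pattern with onnxruntime, assuming the v4 graph's tensor names (input, sr, h, c going in; output, hn, cn coming out) as used by the ONNX wrapper in this repo; the caller owns the state and hands it back on every call:

```python
import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession('silero_vad.onnx')

def fresh_state(batch_size):
    # Two recurrent tensors of shape (2, batch, 64) in the v4 model.
    return (np.zeros((2, batch_size, 64), dtype=np.float32),
            np.zeros((2, batch_size, 64), dtype=np.float32))

def vad_step(chunks, state, sr=16000):
    # chunks: float32 array of shape (batch, window_size).
    # Returns per-item speech probabilities plus the new state; the
    # caller caches the state and passes it back on the next call.
    h, c = state
    probs, h, c = sess.run(['output', 'hn', 'cn'],
                           {'input': chunks,
                            'sr': np.array(sr, dtype=np.int64),
                            'h': h, 'c': c})
    return probs[:, 0], (h, c)

# Each stream (or batch slot) keeps its own externally cached state:
state = fresh_state(batch_size=4)
# probs, state = vad_step(next_chunks, state)
```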

The problem arises because most publisher-consumer messaging systems do not support batches (apart from Celery, maybe). The architecture can be handled in a bunch of different ways in Python, e.g. using built-in abstractions like ProcessPoolExecutor or ThreadPoolExecutor, or via a messaging system with the Remote Procedure Call pattern.

If you will be using it in C++, Java or something else, I would suggest using the ONNX runtime and taking a look at how our ONNX wrapper handles the state.

@Simon-chai (Author)

> If a service processes random audios at random times, it may become complicated to keep track of this.
> If you look at the ONNX wrapper, you will see how the state can be cached externally.

Thank you for the helpful answer!
My scenario is processing multi-channel audio files, one file at a time, using one Python process with one model. You can consider it serial processing, and the number of channels is fixed each time. In my case, I think I don't have to worry about the state problem as long as I remember to reset the state before processing the next file, am I right?
Based on my simple tests, the batching results are exactly the same as running single processing multiple times, so I believe we can say the batching results are solid. I also believe that with appropriate adaptation the function get_speech_timestamps will handle the batching results correctly. By the way, the model output is not perfect for every chunk, but get_speech_timestamps makes the final result almost perfect, which is very impressive. Although I don't fully understand the function, that won't prevent me from applying it to the batching results. I will try it tomorrow and observe how much performance improvement can be achieved through batch processing, because performance improvement is what this is all about.
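Roughly what I plan to do (a sketch; iter_chunks is a hypothetical helper, and calling the model with a (num_channels, window) batch is my assumption):

```python
import torch
import torchaudio

def iter_chunks(path, window=512, sr=16000):
    # Hypothetical helper: load a multi-channel file and yield
    # fixed-size windows of shape (num_channels, window).
    wav, file_sr = torchaudio.load(path)            # (channels, samples)
    if file_sr != sr:
        wav = torchaudio.functional.resample(wav, file_sr, sr)
    for start in range(0, wav.shape[1], window):
        chunk = wav[:, start:start + window]
        if chunk.shape[1] < window:                 # zero-pad the tail
            chunk = torch.nn.functional.pad(chunk, (0, window - chunk.shape[1]))
        yield chunk

def process_files(model, paths, sr=16000):
    for path in paths:
        model.reset_states()       # forget the previous file entirely
        probs = [model(chunk, sr) for chunk in iter_chunks(path)]
        yield path, probs          # one probability tensor per chunk
```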
If I understand any part of this wrong, please point it out. Until then, I am going to implement my plan.
Thank you so much!

@snakers4 (Owner) commented Mar 6, 2024

> Although I don't fully understand the function, that won't prevent me from applying it to the batching results

My advice is to process each channel separately, extract the model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.

You see, the heuristics in the post-processing function are very non-batch-friendly.
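A sketch of that flow, assuming the audio is a (num_channels, num_samples) tensor; the majority-vote rule at the end is just one illustrative choice:

```python
import numpy as np

def per_channel_segments(wav, model, sr=16000):
    # wav: tensor of shape (num_channels, num_samples). One ordinary
    # 1-D pass through get_speech_timestamps per channel, resetting
    # the model state in between.
    all_segments = []
    for channel in wav:
        model.reset_states()
        all_segments.append(
            get_speech_timestamps(channel, model, sampling_rate=sr))
    return all_segments

def vote(all_segments, num_samples, min_votes):
    # Simple voting: a sample counts as speech if at least min_votes
    # channels placed it inside one of their segments.
    counts = np.zeros(num_samples, dtype=np.int32)
    for segments in all_segments:
        for seg in segments:
            counts[seg['start']:seg['end']] += 1
    return counts >= min_votes     # boolean per-sample speech mask
```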

@Simon-chai (Author)

> > Although I don't fully understand the function, that won't prevent me from applying it to the batching results
>
> My advice is to process each channel separately, extract the model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.
>
> You see, the heuristics in the post-processing function are very non-batch-friendly.

What about processing all channels in one batch and then applying the voting mechanism to each channel's results separately?
I tried a 20-second audio file with 2 channels: it takes about 0.28 s to process the channels separately, but only 0.16 s when batching (on an Intel Xeon W-2245 CPU). That improvement is big enough. The only thing I worry about is accuracy, but if the batching results are solid, there must be a way to handle them correctly. In other words, the post-processing function doesn't have to be batch-friendly; do you agree?
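What I mean, roughly (a sketch: postprocess stands in for whatever per-channel adaptation of the post-processing works out, and the 2-D audio_forward call is my assumption):

```python
# wav: float32 tensor of shape (2, num_samples), one row per channel.
model.reset_states()

# One batched forward pass over both channels at once.
probs = model.audio_forward(wav, 16000)   # assumed shape: (2, num_chunks)

# Post-processing stays single-channel: run it once per row.
segments = [postprocess(channel_probs) for channel_probs in probs]
```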

Simon-chai changed the title from "# New V4 VAD Released" to "How to batching process properly" Mar 7, 2024