
How to batching process properly #426

Closed · Simon-chai opened this issue Mar 4, 2024 · 6 comments

@Simon-chai commented Mar 4, 2024

> # New V4 VAD Released
>
> Changes:
>
> - Improved quality
> - Improved performance
> - Both 8k and 16k sampling rates are now supported by the ONNX model
> - Batching is now supported by the ONNX model
> - Added audio_forward method for one-line processing of single or multiple audios without postprocessing

Originally posted by @adamnsandle in #2 (comment)
I see it says here that batching is now supported by the ONNX model, but I can't find any doc or example of how to batch. Can anyone point it out for me? Also, the function get_speech_timestamps only accepts one-dimensional audio; does that mean there is no way to support batching with this function?
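For reference, the audio_forward method mentioned in the release notes can be called roughly like this (a minimal sketch: the loading lines follow the repo's README, while the idea that a batched 2-D input exercises the batched path is an assumption):

```python
import torch

# Load the model and helper functions as shown in the silero-vad README.
model, utils = torch.hub.load(repo_or_dir='snakers4/silero-vad',
                              model='silero_vad')
(get_speech_timestamps, save_audio, read_audio,
 VADIterator, collect_chunks) = utils

wav = read_audio('test.wav', sampling_rate=16000)  # 1-D float tensor

# One-line probability extraction without postprocessing; presumably a
# (batch, num_samples) tensor would exercise the batched path.
speech_probs = model.audio_forward(wav, 16000)
```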

@snakers4 (Owner) commented Mar 5, 2024

Hi,

Batching is complicated and error-prone, and we discourage users from using it.

snakers4 closed this as completed Mar 5, 2024
@Simon-chai (Author) commented Mar 5, 2024

> Hi,
>
> Batching is complicated and error-prone, and we discourage users from using it.

Thank you for answering!
One last question: which part is error-prone when doing batching VAD, the results the model returns or the custom code that deals with the results? If it's the latter, I think the errors are avoidable, right?
Because today I figured out how to pass batched input, and I think that if the batching results are solid, it is worth a try.
Looking forward to the answer.

@snakers4 (Owner) commented Mar 5, 2024

> The results the model returns or the custom code that deals with the results?

The key problem is that the VAD is not stateless, i.e. it holds a state at all times.
When you use a batch, it keeps a separate sequential internal state (or memory) for each batch index.

If a service processes random audios at random times, it may become complicated to keep track of this.
If you look at the ONNX wrapper, you will see how the state can be cached externally.

The optimal architecture may differ for each particular case. For example, if the state is handled externally, you can return it each time and pass it back on the next invocation, processing "batches".
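A minimal sketch of that pattern with onnxruntime, assuming the v4 graph's tensor names (input, sr, h, c going in; output, hn, cn coming out) as used by the ONNX wrapper in this repo; the caller owns the state and hands it back on every call:

```python
import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession('silero_vad.onnx')

def fresh_state(batch_size):
    # Two recurrent tensors of shape (2, batch, 64) in the v4 model.
    return (np.zeros((2, batch_size, 64), dtype=np.float32),
            np.zeros((2, batch_size, 64), dtype=np.float32))

def vad_step(chunks, state, sr=16000):
    # chunks: float32 array of shape (batch, window_size).
    # Returns per-item speech probabilities plus the new state; the
    # caller caches the state and passes it back on the next call.
    h, c = state
    probs, h, c = sess.run(['output', 'hn', 'cn'],
                           {'input': chunks,
                            'sr': np.array(sr, dtype=np.int64),
                            'h': h, 'c': c})
    return probs[:, 0], (h, c)

# Each stream (or batch slot) keeps its own externally cached state:
state = fresh_state(batch_size=4)
# probs, state = vad_step(next_chunks, state)
```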

The problem arises because most publisher-consumer messaging systems do not support batches (apart from Celery, maybe). The architecture can be handled in a bunch of different ways in Python, e.g. using built-in abstractions like ProcessPoolExecutor or ThreadPoolExecutor, or via a messaging system with the Remote Procedure Call pattern.

If you will be using it in C++, Java or something else, I would suggest using the ONNX runtime and taking a look at how our ONNX wrapper handles the state.

@Simon-chai (Author)

> If a service processes random audios at random times, it may become complicated to keep track of this.
> If you look at the ONNX wrapper, you will see how the state can be cached externally.

Thank you for the helpful answer!
My scenario is processing multi-channel audio files, one file at a time, using one Python process with one model. You can consider it serial processing, and the number of channels is fixed each time. In my case, I think I don't have to worry about the state problem as long as I remember to reset the state before processing the next file, am I right?
Based on my simple tests, the batching results are exactly the same as running single processing multiple times, so I believe we can say the batching results are solid. I also believe that with appropriate adaptation the function get_speech_timestamps will handle the batching results correctly. By the way, the model output is not perfect for every chunk, but get_speech_timestamps makes the final result almost perfect, which is very impressive. Although I don't fully understand the function, that won't prevent me from applying it to the batching results. I will try it tomorrow and observe how much performance improvement can be achieved through batch processing, because performance improvement is what this is all about.
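Roughly what I plan to do (a sketch; iter_chunks is a hypothetical helper, and calling the model with a (num_channels, window) batch is my assumption):

```python
import torch
import torchaudio

def iter_chunks(path, window=512, sr=16000):
    # Hypothetical helper: load a multi-channel file and yield
    # fixed-size windows of shape (num_channels, window).
    wav, file_sr = torchaudio.load(path)            # (channels, samples)
    if file_sr != sr:
        wav = torchaudio.functional.resample(wav, file_sr, sr)
    for start in range(0, wav.shape[1], window):
        chunk = wav[:, start:start + window]
        if chunk.shape[1] < window:                 # zero-pad the tail
            chunk = torch.nn.functional.pad(chunk, (0, window - chunk.shape[1]))
        yield chunk

def process_files(model, paths, sr=16000):
    for path in paths:
        model.reset_states()       # forget the previous file entirely
        probs = [model(chunk, sr) for chunk in iter_chunks(path)]
        yield path, probs          # one probability tensor per chunk
```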
If I understand any part of this wrong, please point it out. Until then, I am going to implement my plan.
Thank you so much!

@snakers4 (Owner) commented Mar 6, 2024

> Although I don't fully understand the function, that won't prevent me from applying it to the batching results

My advice is to process each channel separately, extract the model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.

You see, the heuristics in the post-processing function are very non-batch-friendly.
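A sketch of that flow, assuming the audio is a (num_channels, num_samples) tensor; the majority-vote rule at the end is just one illustrative choice:

```python
import numpy as np

def per_channel_segments(wav, model, sr=16000):
    # wav: tensor of shape (num_channels, num_samples). One ordinary
    # 1-D pass through get_speech_timestamps per channel, resetting
    # the model state in between.
    all_segments = []
    for channel in wav:
        model.reset_states()
        all_segments.append(
            get_speech_timestamps(channel, model, sampling_rate=sr))
    return all_segments

def vote(all_segments, num_samples, min_votes):
    # Simple voting: a sample counts as speech if at least min_votes
    # channels placed it inside one of their segments.
    counts = np.zeros(num_samples, dtype=np.int32)
    for segments in all_segments:
        for seg in segments:
            counts[seg['start']:seg['end']] += 1
    return counts >= min_votes     # boolean per-sample speech mask
```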

@Simon-chai (Author)

> > Although I don't fully understand the function, that won't prevent me from applying it to the batching results
>
> My advice is to process each channel separately, extract the model outputs, then run the function separately for each channel, and then apply some simple voting mechanism.
>
> You see, the heuristics in the post-processing function are very non-batch-friendly.

What about processing all channels in one batch and then applying the voting mechanism to each channel's results separately?
I tried a 20-second audio file with 2 channels: it takes about 0.28 s to process the channels separately, but only 0.16 s when batching (on an Intel Xeon W-2245 CPU). That improvement is big enough. The only thing I worry about is accuracy, but if the batching results are solid, there must be a way to handle them correctly. In other words, the post-processing function doesn't have to be batch-friendly; do you agree?
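What I mean, roughly (a sketch: postprocess stands in for whatever per-channel adaptation of the post-processing works out, and the 2-D audio_forward call is my assumption):

```python
# wav: float32 tensor of shape (2, num_samples), one row per channel.
model.reset_states()

# One batched forward pass over both channels at once.
probs = model.audio_forward(wav, 16000)   # assumed shape: (2, num_chunks)

# Post-processing stays single-channel: run it once per row.
segments = [postprocess(channel_probs) for channel_probs in probs]
```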

Simon-chai changed the title from "# New V4 VAD Released" to "How to batching process properly" Mar 7, 2024