How to batch process properly #426
Hi, batching is complicated and error-prone, and we discourage users from using it.
Thank you for answering!
The key problem is that the VAD is not stateless, i.e. it holds a state at all times. If a service processes random audios at random times, it may become complicated to keep track of this. The optimal architecture may differ for each particular case. For example, if the state is handled externally, you can return it each time and pass it back on a new invocation, and process "batches" that way. The problem arises because most publisher-consumer messaging systems do not support batches (apart from Celery, maybe). The architecture can be handled in a number of different ways in Python, e.g. using built-in abstractions like ProcessPoolExecutor or ThreadPoolExecutor, or via a messaging system with the Remote Procedure Call pattern. If you will be using it in C++, Java or something else, I would suggest using ONNX Runtime and taking a look at how our ONNX wrapper handles the state.
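The external-state pattern described above can be sketched roughly as follows. `DummyVAD` and `process_chunk` are hypothetical stand-ins, not part of the library; the "state" here is just a sample counter, purely to illustrate the caller-holds-the-state idea:

```python
# Hypothetical sketch: the service stays stateless by returning the model
# state to the caller, who passes it back with the next chunk of the same
# audio stream. DummyVAD stands in for the real stateful model.

class DummyVAD:
    def init_state(self):
        return {"samples_seen": 0}

    def forward(self, chunk, state):
        # A real model would return speech probabilities and an updated
        # recurrent state; here we only update a running sample counter.
        new_state = {"samples_seen": state["samples_seen"] + len(chunk)}
        prob = min(1.0, sum(abs(x) for x in chunk) / (len(chunk) or 1))
        return prob, new_state

def process_chunk(model, chunk, state=None):
    """Stateless entry point: the caller stores the returned state and
    passes it back on the next invocation for the same stream."""
    if state is None:
        state = model.init_state()
    return model.forward(chunk, state)

model = DummyVAD()
prob, state = process_chunk(model, [0.1] * 512)          # first chunk
prob, state = process_chunk(model, [0.2] * 512, state)   # continuation
```

In a real deployment the state dictionary would travel with the message (e.g. as a payload field in the RPC request/response), so any worker can pick up any stream.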
Thank you for the helpful answer!
My advice is to process each channel separately, extract the model outputs, run the post-processing function separately for each channel, and then apply some simple voting mechanism. You see, the heuristics in the post-processing function are very batch-unfriendly.
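The per-channel-then-vote idea can be sketched like this; `majority_vote` is a hypothetical helper, and `channel_flags` stands in for per-channel post-processing output already converted to per-frame booleans:

```python
# Hypothetical sketch: merge per-channel speech decisions with a simple
# per-frame majority vote. Each inner list is one channel's frame-level
# speech flags (e.g. derived from per-channel timestamps).

def majority_vote(channel_flags):
    """channel_flags: equal-length boolean lists, one per channel.
    Returns one boolean list: True where a strict majority of channels
    detected speech in that frame."""
    n_channels = len(channel_flags)
    return [
        sum(frame) * 2 > n_channels   # strict majority
        for frame in zip(*channel_flags)
    ]

flags = [
    [True, True,  False, False],  # channel 0
    [True, False, False, True],   # channel 1
    [True, True,  True,  False],  # channel 2
]
merged = majority_vote(flags)  # [True, True, False, False]
```

Any other merge rule (e.g. "any channel" instead of majority) drops in the same place; the key point is that the non-batch-friendly heuristics only ever see one channel at a time.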
What about processing all channels in a batch, then applying the voting mechanism to the results separately?
Changes:
audio_forward method for one-line processing of a single audio or multiple audios without postprocessing.
Originally posted by @adamnsandle in #2 (comment)
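To feed multiple audios to a batched forward pass, they generally need the same length. The exact audio_forward signature is not shown in this thread, so the call below is left as a comment and only the batch preparation, an assumption on my part, is sketched:

```python
# Hypothetical sketch: zero-pad audios of different lengths to the
# longest one and stack them row-wise into a [batch, time] layout,
# as a batched forward pass would typically expect.

def pad_and_stack(audios):
    """audios: list of 1-D float lists. Returns a 2-D list [batch, time],
    shorter rows padded with trailing zeros (silence)."""
    max_len = max(len(a) for a in audios)
    return [a + [0.0] * (max_len - len(a)) for a in audios]

batch = pad_and_stack([[0.1, 0.2], [0.3], [0.4, 0.5, 0.6]])
# With the real model one would then do something along the lines of:
#   probs = model.audio_forward(torch.tensor(batch), sr=16000)
# (signature assumed; check the repo's utils for the actual interface)
```

Note that padding with silence can slightly skew probabilities near the end of shorter audios, so trimming outputs back to each audio's true length afterwards is advisable.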
I see it said here that batching is now supported by the ONNX model, but I can't find any doc or example about how to batch. Can anyone point it out for me? I also see that the function get_speech_timestamps only accepts one-dimensional audio; does that mean there is no way to support batching in this function?