How to early stop an encoding call? #1768
Comments
Otherwise, would it be possible to do something like a streaming encode, where I pass in more audio data as it arrives, in roughly 10 ms chunks, to reduce latency? Or is this not possible due to the transformer architecture? Would it be possible to do speculative decoding, with distil-whisper-v3 as the large model and some other tiny one as the small (draft) model, to reduce latency?
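For reference, the "streaming" workaround usually discussed is to buffer the incoming chunks yourself and re-run encode on the accumulated, padded window: the transformer encoder still consumes a fixed-length input each time, so this reduces perceived latency without changing the model. A minimal sketch of such a buffer (the class and parameter names are illustrative, not a faster-whisper API):

```python
class ChunkBuffer:
    """Accumulate small audio chunks; expose a fixed-length window for encoding.

    window_samples: the fixed input length the encoder expects
    (e.g. 30 s * 16 kHz = 480000 for Whisper; small here for illustration).
    """

    def __init__(self, window_samples):
        self.window_samples = window_samples
        self.samples = []

    def push(self, chunk):
        # Append a new 10 ms (or any size) chunk of samples.
        self.samples.extend(chunk)

    def window(self):
        # Keep only the most recent window and zero-pad to full length,
        # since the encoder consumes a fixed-size input regardless.
        recent = self.samples[-self.window_samples:]
        return recent + [0.0] * (self.window_samples - len(recent))
```

Each call to `window()` would then be fed to the usual feature extraction and `encode` path; the cost of a full encode per trigger is unchanged, which is why this only helps latency, not throughput.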
Hello, I don't have any suggestion for stopping the current call early; there is no way to access the forward pass of the underlying encoder. CTranslate2 does not yet support continuous batching. There is a discussion here: #1333
Ok, thank you!
I am calling encode from whisperX/faster-whisper.
Since encode can take 200 ms in my use case, and I am calling it very often for many users, I would like the ability to stop the model's forward pass early during encode, through something like a callback.
Is this possible, and if so, could someone point me to how to do it, or what it would take to add this feature? I am happy to contribute a PR if this is something useful to others.
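Absent such a hook in the library, the closest Python-level workaround is to stop *waiting* for the result rather than stopping the computation itself. A minimal sketch, assuming a hypothetical `encode_fn` that stands in for whatever actually invokes the encoder (not a real faster-whisper API):

```python
import concurrent.futures


def encode_with_deadline(encode_fn, features, timeout_s):
    """Run encode_fn(features) in a worker thread; give up after timeout_s.

    Caveat: the forward pass itself is NOT interrupted (there is no hook
    into the encoder); the worker thread keeps running to completion and
    its result is simply discarded.
    """
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(encode_fn, features)
    try:
        return future.result(timeout=timeout_s)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # wait=False so we return immediately even if the call is
        # still in flight; the abandoned work still consumes CPU/GPU.
        pool.shutdown(wait=False)
```

Because the abandoned encode still occupies compute, this only bounds caller latency; truly cancelling mid-forward would require support inside CTranslate2 itself, which is what the feature request above is asking for.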