How to early stop an encoding call? #1768

Closed
mariano54 opened this issue Sep 1, 2024 · 3 comments

@mariano54

mariano54 commented Sep 1, 2024

I am calling encode from whisperX/faster-whisper.

Since encode can take 200 ms in my use case, and I call it very often for many users, I would like to be able to stop the encoder's forward pass early, for example through a callback.

Is this possible, and if so, could someone point me to how to do it, or tell me what it would take to add this feature? I am happy to contribute a PR if this is something useful to others.
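For illustration only, here is a minimal sketch of the kind of cooperative cancellation hook being asked about, written in plain Python over a toy stack of layers. `encode_with_stop`, `should_stop`, and the toy layers are hypothetical names, not part of CTranslate2 or faster-whisper; CTranslate2 runs the encoder inside its C++ runtime, and according to the maintainer's reply there is currently no way to hook into that forward pass.

```python
import threading
from typing import Callable, List, Optional

Vector = List[float]
Layer = Callable[[Vector], Vector]

def encode_with_stop(x: Vector, layers: List[Layer],
                     should_stop: Callable[[], bool]) -> Optional[Vector]:
    """Run the (hypothetical) encoder layers in order, polling should_stop()
    before each one. Returns None if the caller asked to stop."""
    for layer in layers:
        if should_stop():  # checked between layers only; a running layer is not interrupted
            return None
        x = layer(x)
    return x

# Usage: a cancel flag that another thread (e.g. a request handler) could set.
cancel = threading.Event()
calls = []

def make_layer(idx: int) -> Layer:
    def layer(v: Vector) -> Vector:
        calls.append(idx)
        if idx == 1:
            cancel.set()  # simulate the user's request being cancelled mid-forward
        return [a * 2 for a in v]
    return layer

layers = [make_layer(i) for i in range(4)]
result = encode_with_stop([1.0, 2.0], layers, cancel.is_set)
print(result, calls)  # None [0, 1]
```

The flag is only checked between layers, so a layer that has already started still runs to completion; that granularity is the main trade-off a real hook would have to settle.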

@mariano54
Author

mariano54 commented Sep 2, 2024

Otherwise, would it be possible to do a streaming encode, where I pass in more audio as it arrives, in chunks of around 10 ms, to reduce latency? Or is this not possible because of the transformer architecture?
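As background for the streaming question, a minimal sketch assuming the standard Whisper front end (16 kHz audio, a 10 ms hop of 160 samples, and a fixed 30 s input window): one 10 ms chunk is a single mel frame out of 3000 per window. Whisper's encoder uses unmasked (bidirectional) self-attention, so appending a frame changes the outputs at every earlier position, whereas with causal attention earlier outputs could be reused. The toy `attention` function is hypothetical (scalar frames, no learned weights) and is not Whisper's actual layer.

```python
import math
from typing import List

# Assumed Whisper front-end constants.
SAMPLE_RATE = 16_000   # samples per second
HOP_LENGTH = 160       # samples per mel frame (10 ms)
WINDOW_MS = 30_000     # fixed 30 s encoder window

def mel_frames(duration_ms: int) -> int:
    """Number of 10 ms mel frames in duration_ms of audio (integer math)."""
    return duration_ms * SAMPLE_RATE // 1000 // HOP_LENGTH

def attention(xs: List[float], causal: bool) -> List[float]:
    """Toy single-head self-attention over scalar frames: each frame is its
    own query, key and value. causal=True lets a frame see only the past."""
    out = []
    for i, q in enumerate(xs):
        visible = xs[: i + 1] if causal else xs
        scores = [q * k for k in visible]
        m = max(scores)  # subtract max for a numerically stable softmax
        weights = [math.exp(s - m) for s in scores]
        out.append(sum(w * v for w, v in zip(weights, visible)) / sum(weights))
    return out

frames = [0.5, -1.0, 2.0]
extended = frames + [3.0]  # one more 10 ms frame arrives

bidir_before, bidir_after = attention(frames, False), attention(extended, False)
causal_before, causal_after = attention(frames, True), attention(extended, True)

print(mel_frames(10), mel_frames(WINDOW_MS))  # 1 3000
print(bidir_before[0] == bidir_after[0])      # False: earlier outputs change
print(causal_before == causal_after[:3])      # True: earlier outputs can be reused
```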

Would it also be possible to use speculative decoding, with distil-whisper-v3 as the large model and some other tiny model as the small (draft) model, to reduce latency?
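A minimal sketch of greedy speculative decoding with toy models, assuming nothing about faster-whisper or CTranslate2 (the thread does not say whether either supports it). A cheap draft model proposes up to `k` tokens; the target model keeps the agreeing prefix plus its own token at the first disagreement, so the output is identical to plain greedy decoding with the target. `greedy_decode`, `speculative_decode`, `target`, and `draft` are hypothetical.

```python
from typing import Callable, List

# A toy "model" maps the token sequence so far to its greedy next token.
Model = Callable[[List[int]], int]

def greedy_decode(model: Model, prompt: List[int], max_new: int, eos: int) -> List[int]:
    seq = list(prompt)
    for _ in range(max_new):
        tok = model(seq)
        seq.append(tok)
        if tok == eos:
            break
    return seq

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, eos: int, k: int = 4) -> List[int]:
    seq, produced = list(prompt), 0
    while produced < max_new:
        # 1. The cheap draft model proposes up to k tokens greedily.
        proposal, ctx = [], list(seq)
        for _ in range(min(k, max_new - produced)):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
            if tok == eos:
                break
        # 2. The target checks each proposed position. Shown one at a time for
        #    clarity; the speedup comes from scoring them in one batched pass.
        for tok in proposal:
            expected = target(seq)
            seq.append(expected)  # always keep the target's token
            produced += 1
            if expected == eos or produced >= max_new:
                return seq
            if expected != tok:
                break  # first disagreement: discard the rest of the proposal
    return seq

def target(s: List[int]) -> int:
    return (sum(s) * 7 + 3) % 10

def draft(s: List[int]) -> int:
    # Agrees with the target except at every third position.
    return target(s) if len(s) % 3 else (target(s) + 1) % 10

print(speculative_decode(target, draft, [1], 12, eos=-1)
      == greedy_decode(target, [1], 12, eos=-1))  # True
```

Latency only improves if the draft is much cheaper than the target and agrees with it often, since each disagreement throws away the rest of the proposal.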

@minhthuc2502
Collaborator

Hello, I don't have any suggestion for stopping the current call early. We don't have any way to access the encoder's forward pass during the call.

CTranslate2 does not support continuous batching yet. There is a discussion about it here: #1333
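To illustrate what continuous (sometimes called in-flight) batching means: finished sequences leave the batch and queued requests take their slots between decoding steps, instead of every request in a batch waiting for the longest one. The toy simulation counts one step per generated token and ignores differences in per-step cost; `static_batching_steps` and `continuous_batching_steps` are hypothetical helpers, not CTranslate2 APIs.

```python
from collections import deque
from typing import List

def static_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Fixed batches in arrival order; each batch runs until its longest request ends."""
    return sum(max(lengths[i:i + batch_size]) for i in range(0, len(lengths), batch_size))

def continuous_batching_steps(lengths: List[int], batch_size: int) -> int:
    """Free slots are refilled from the queue before every decoding step.
    Each length is the number of tokens a request still needs (>= 1)."""
    queue = deque(lengths)
    active: List[int] = []  # remaining tokens for each occupied slot
    steps = 0
    while queue or active:
        while queue and len(active) < batch_size:
            active.append(queue.popleft())
        active = [r - 1 for r in active if r > 1]  # one step; drop finished requests
        steps += 1
    return steps

print(static_batching_steps([10, 2, 2, 2], 2),
      continuous_batching_steps([10, 2, 2, 2], 2))  # 12 10
```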

@mariano54
Author

Ok, thank you!
