[Bug] Streaming doesn't work | inference_stream is as fast as normal inference #97
Comments
You didn't share any timings, so it's hard to say what's going on. Note that streaming inference may take longer in total than normal inference; only the first chunks should arrive sooner. Could you share (once models are loaded/warmed up):
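For example, a small script along these lines could collect those numbers (a minimal sketch, assuming a preloaded, warmed-up Coqui XTTS `model` and precomputed `gpt_cond_latent`/`speaker_embedding` conditioning latents):

```python
import time

# Assumes `model`, `gpt_cond_latent`, and `speaker_embedding` are an
# already loaded/warmed-up Coqui XTTS model and its conditioning latents.
text = "This is a latency test sentence for streaming inference."

# Total wall time for regular (non-streaming) inference.
t0 = time.perf_counter()
out = model.inference(text, "en", gpt_cond_latent, speaker_embedding)
t_full = time.perf_counter() - t0

# Time to first chunk vs. total wall time for streaming inference.
t0 = time.perf_counter()
t_first = None
for chunk in model.inference_stream(text, "en", gpt_cond_latent, speaker_embedding):
    if t_first is None:
        t_first = time.perf_counter() - t0  # latency until the first audio chunk
t_stream = time.perf_counter() - t0

print(f"full inference total: {t_full:.2f}s")
print(f"stream, first chunk:  {t_first:.2f}s")
print(f"stream, total:        {t_stream:.2f}s")
```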
Sorry, I had forgotten. The times after the model is loaded/warmed up are:
I have made an implementation that might be useful for you (with additional threading to reduce latency): https://github.com/yalsaffar/S3TVR/blob/main/models/TTS_utils.py#L232
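The general pattern there is to run `inference_stream` in a background thread and hand chunks to the consumer through a queue, so generation and delivery overlap. A minimal sketch of that pattern (illustrative names, not the exact code from the link):

```python
import queue
import threading

def stream_tts_threaded(model, text, language, gpt_cond_latent, speaker_embedding):
    """Run `inference_stream` in a background thread so chunks can be
    consumed (e.g. written to a socket) while generation continues."""
    q = queue.Queue()
    done = object()  # sentinel marking the end of the stream

    def producer():
        for chunk in model.inference_stream(
            text, language, gpt_cond_latent, speaker_embedding
        ):
            q.put(chunk)
        q.put(done)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        chunk = q.get()
        if chunk is done:
            return
        yield chunk
```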
Describe the bug
When running tests against the TTS endpoint, I've observed that streaming the audio response takes nearly as long as receiving the fully generated audio file. This is counterintuitive, since streaming should start delivering the response sooner, beginning with the first available data chunk. The code for the streaming endpoint is in the files listed below (a sketch of the overall shape follows the list).
To Reproduce
model_manager.py
tts_streaming.py
main.py
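A minimal sketch of a streaming endpoint of this shape (assuming FastAPI and a preloaded Coqui XTTS model with precomputed conditioning latents; all names are illustrative, not the exact code from the files above):

```python
import numpy as np
from fastapi import FastAPI
from fastapi.responses import StreamingResponse

app = FastAPI()
# `model`, `gpt_cond_latent`, and `speaker_embedding` are assumed to be
# loaded elsewhere at startup (e.g. in a model manager module).

def pcm_chunks(text: str):
    # Yield each chunk as raw 16-bit PCM as soon as it is generated,
    # instead of accumulating the full waveform first.
    for chunk in model.inference_stream(text, "en", gpt_cond_latent, speaker_embedding):
        audio = chunk.squeeze().float().cpu().numpy()
        yield (audio * 32767).astype(np.int16).tobytes()

@app.get("/tts/stream")
def tts_stream(text: str):
    return StreamingResponse(pcm_chunks(text), media_type="application/octet-stream")
```

Note that the client must also consume the response incrementally (e.g. `requests.get(..., stream=True)` with `iter_content`); a client that buffers the whole body before returning will measure roughly the full generation time regardless of how the server streams.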
Expected behavior
I expect the TTS stream to start arriving much sooner than a request for the finished file completes.
Logs
No response
Environment
Additional context
Thanks in advance for the help.