I am currently running the speech-to-speech pipeline on an AWS EC2 instance (Ubuntu 20.04) with an NVIDIA A10G GPU. The pipeline works well, but end-to-end latency is around 1 second, and I am particularly interested in reducing the latency of the whole speech-to-speech pipeline, especially the Text-to-Speech (TTS) part.
Current Setup:
EC2 Instance: NVIDIA A10G GPU, 24 GB GPU RAM
OS: Ubuntu 20.04
GPU Driver: 470.141.03 (reported by nvidia-smi)
CUDA Version: 12.2
Pipeline: Using the standard setup from your repo
STT Model: Whisper large-v2
TTS Model: Parler-TTS (default)
Problem:
I’m currently facing around 1 second of latency for the entire pipeline from speech input to speech output. While the STT part works fairly well, the TTS step seems to contribute most to the latency. I would greatly appreciate any suggestions or guidance on reducing the overall latency, particularly for TTS.
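In case it helps, here is a minimal sketch of how the per-stage split can be measured; the stt/lm/tts callables are hypothetical placeholders for the pipeline's handlers, not the repo's actual API:

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(label, results):
    # Record the wall-clock time of one pipeline stage under `label`.
    start = time.perf_counter()
    yield
    results[label] = time.perf_counter() - start

def profile_pipeline(audio_chunk, stt, lm, tts):
    # stt/lm/tts stand in for the real handler callables.
    results = {}
    with timed("stt", results):
        text = stt(audio_chunk)
    with timed("lm", results):
        reply = lm(text)
    with timed("tts", results):
        audio_out = tts(reply)
    for stage, seconds in results.items():
        print(f"{stage}: {seconds * 1000:.1f} ms")
    return audio_out
```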
Thanks!
Take a look at this.
The proposed method speeds up the TTS part.
It is also mentioned there that 500 ms of silence is appended after the last chunk, which means there is a fixed 500 ms delay before the LLM --> TTS steps can begin.
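If that trailing padding is exposed as a setting, shrinking it converts directly into saved latency, at the cost of a more aggressive end-of-utterance cutoff. A minimal sketch of the idea; the function and the pad_ms parameter are hypothetical, not necessarily the repo's actual knobs:

```python
import numpy as np

SAMPLE_RATE = 16_000  # Whisper expects 16 kHz input

def pad_last_chunk(chunk: np.ndarray, pad_ms: int = 500) -> np.ndarray:
    # Append trailing silence to the final VAD chunk. Every millisecond
    # added here is a millisecond the LLM --> TTS stages wait before they
    # can start, so lowering pad_ms (e.g. to 100-200 ms) cuts that fixed
    # delay, at the risk of clipping the tail of an utterance.
    silence = np.zeros(int(SAMPLE_RATE * pad_ms / 1000), dtype=chunk.dtype)
    return np.concatenate([chunk, silence])
```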
Thank you for sharing the link and your discussions with the author. I understand the role of Whisper Streamer in accelerating the text-to-speech process. I also recognize the 500 ms latency floor in Parler-TTS, but I believe I am not achieving it. Is there any way I can optimize the Parler-TTS setup to reach the 500 ms target?
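In case it is useful, here is the kind of setup that usually lowers Parler-TTS latency: a small checkpoint and half precision. This is a sketch under assumptions, not a verified config; the checkpoint name and the fp16 choice should be checked against the Parler-TTS README for your installed version:

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0"
# Checkpoint name assumed from the Parler-TTS model hub; the mini
# variant trades some quality for noticeably faster generation.
name = "parler-tts/parler-tts-mini-v1"

model = ParlerTTSForConditionalGeneration.from_pretrained(
    name, torch_dtype=torch.float16  # fp16 is an assumption; check quality
).to(device)
tokenizer = AutoTokenizer.from_pretrained(name)

description = "A clear female voice, close microphone, no background noise."
prompt = "Hello, how can I help you today?"

# Parler-TTS conditions on the description via input_ids and on the
# transcript via prompt_input_ids.
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

with torch.no_grad():
    generation = model.generate(
        input_ids=input_ids, prompt_input_ids=prompt_input_ids
    )
audio = generation.to(torch.float32).cpu().numpy().squeeze()
```

torch.compile on the decoder can shave more per-step time after warmup, but whether it pays off depends on your torch/CUDA versions, so treat that as an experiment rather than a given.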
I am trying to make this pipeline really fast. I tried integrating StyleTTS; it seems streaming is not compatible with StyleTTS as of now. How would you approach the latency optimization?
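If StyleTTS cannot stream at the token level, one common fallback is sentence-level chunking: synthesize each sentence as soon as the LLM emits it and play the clips back to back. Total compute is unchanged, but time-to-first-audio drops to roughly one sentence's synthesis time. A sketch, where synthesize is a hypothetical wrapper around a StyleTTS inference call:

```python
import re
from typing import Callable, Iterator

import numpy as np

def stream_by_sentence(
    text: str,
    synthesize: Callable[[str], np.ndarray],  # hypothetical StyleTTS wrapper
) -> Iterator[np.ndarray]:
    # Split the reply on sentence boundaries and yield audio per sentence,
    # so playback can begin before the whole reply is synthesized.
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if sentence:
            yield synthesize(sentence)

# Usage: push each chunk to the playback queue as it arrives, e.g.
# for chunk in stream_by_sentence(llm_reply, styletts_synthesize):
#     playback_queue.put(chunk)
```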