
Latency Optimization for Speech-to-Speech Pipeline #107

yatharthk2 opened this issue Sep 18, 2024 · 3 comments

@yatharthk2

Hi,

I am currently running the speech-to-speech pipeline on an AWS EC2 instance (Ubuntu 20.04) with an NVIDIA A10G GPU. The pipeline works well, but end-to-end latency is around 1 second, and I would like to reduce the latency of the whole speech-to-speech pipeline, especially the Text-to-Speech (TTS) part.

Current Setup:
EC2 Instance: NVIDIA A10G GPU, 24 GB GPU RAM
OS: Ubuntu 20.04
GPU Driver: 470.141.03 (reported by nvidia-smi)
CUDA Version: 12.2
Pipeline: Using the standard setup from your repo
STT Model: Whisper large-v2
TTS Model: Parler-TTS (default)

Problem:
I’m currently facing around 1 second of latency for the entire pipeline from speech input to speech output. While the STT part works fairly well, the TTS step seems to contribute most to the latency. I would greatly appreciate any suggestions or guidance on reducing the overall latency, particularly for TTS.
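
To confirm where the time actually goes before optimizing, it can help to time each stage of a single turn. A minimal sketch, assuming hypothetical `stt`, `llm`, and `tts` callables standing in for the real pipeline stages (for GPU stages you may also want to call `torch.cuda.synchronize()` before reading the timer):

```python
import time
from contextlib import contextmanager

@contextmanager
def timed(name, results):
    # Record wall-clock time for one pipeline stage.
    start = time.perf_counter()
    yield
    results[name] = time.perf_counter() - start

def profile_turn(audio_chunk, stt, llm, tts):
    # stt, llm, tts are placeholders for the actual pipeline stages.
    results = {}
    with timed("stt", results):
        text = stt(audio_chunk)
    with timed("llm", results):
        reply = llm(text)
    with timed("tts", results):
        audio_out = tts(reply)
    total = sum(results.values())
    for stage, seconds in results.items():
        print(f"{stage}: {seconds * 1000:.0f} ms ({seconds / total:.0%})")
    return audio_out
```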

Thanks!

@sandorkonya

Take a look at this.
The proposed method speeds up the TTS part.
It also mentions that 500 ms is appended after the last chunk, which means there is a 500 ms delay before the LLM --> TTS steps begin.
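
If that trailing padding comes from end-of-speech detection, part of the delay is simply the VAD waiting for enough silence before it closes the utterance. As a rough illustration (not necessarily how this repo wires it up), Silero VAD exposes a `min_silence_duration_ms` knob that trades response delay against the risk of cutting the speaker off mid-sentence:

```python
import torch

# Load Silero VAD from torch.hub (downloads the model on first use).
model, utils = torch.hub.load("snakers4/silero-vad", "silero_vad")
get_speech_timestamps, _, read_audio, VADIterator, _ = utils

# Requiring 500 ms of trailing silence means the pipeline waits at least
# 500 ms after the user stops speaking before the LLM --> TTS steps start.
vad_iterator = VADIterator(
    model,
    threshold=0.5,
    sampling_rate=16000,
    min_silence_duration_ms=500,  # lowering this (e.g. to 300) responds sooner,
                                  # at the cost of more premature cut-offs
)
```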

@yatharthk2
Author

Thank you for sharing the link and your discussion with the author. I understand the role of the Whisper streamer in accelerating the text-to-speech process. I am also aware of the 500 ms latency figure for Parler-TTS, but I don't believe I am achieving it. Is there any way I can optimize the Parler-TTS setup to reach the 500 ms target?
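
For what it's worth, the usual levers for Parler-TTS generation speed are half precision, a warm-up pass, and (optionally) compiling the model. A minimal sketch under those assumptions; the checkpoint name, dtype, and compile mode below are assumptions and may need adjusting to whatever the pipeline actually loads:

```python
import torch
from parler_tts import ParlerTTSForConditionalGeneration
from transformers import AutoTokenizer

device = "cuda:0"
dtype = torch.float16  # half precision roughly halves compute and memory traffic

# Assumed checkpoint; substitute whatever the pipeline actually loads.
checkpoint = "parler-tts/parler_tts_mini_v0.1"
model = ParlerTTSForConditionalGeneration.from_pretrained(
    checkpoint, torch_dtype=dtype
).to(device)
tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# Optional: torch.compile trades a slow first call for faster steady-state
# generation; whether it helps depends on the model and PyTorch version.
model.forward = torch.compile(model.forward, mode="reduce-overhead")

description = "A female speaker delivers her words expressively."
prompt = "Hey, how are you doing today?"
input_ids = tokenizer(description, return_tensors="pt").input_ids.to(device)
prompt_input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)

with torch.inference_mode():
    # Warm-up call so compilation / CUDA graph capture cost is paid up front.
    _ = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)
    generation = model.generate(input_ids=input_ids, prompt_input_ids=prompt_input_ids)

audio_arr = generation.cpu().to(torch.float32).numpy().squeeze()
```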

@yatharthk2
Author

I am trying to make this pipeline really fast. I tried integrating StyleTTS, but it seems streaming is not compatible with StyleTTS as of now. How would you approach the latency optimization?
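
One engine-agnostic way to attack the latency is to stop waiting for the complete LLM reply: split the streamed text at sentence boundaries and hand each sentence to TTS while the next one is still being generated, so playback can start after the first sentence. A rough sketch of that pattern; `llm_token_stream`, `tts_synthesize`, and `play_audio` are placeholders, not this repo's API:

```python
import re
import threading
import queue

SENTENCE_END = re.compile(r"[.!?]\s")

def sentence_chunks(token_stream):
    # Accumulate streamed tokens and yield complete sentences as soon as they close.
    buffer = ""
    for token in token_stream:
        buffer += token
        match = SENTENCE_END.search(buffer)
        while match:
            yield buffer[: match.end()].strip()
            buffer = buffer[match.end():]
            match = SENTENCE_END.search(buffer)
    if buffer.strip():
        yield buffer.strip()

def speak_streaming(llm_token_stream, tts_synthesize, play_audio):
    # Overlap TTS with LLM generation: synthesize sentence i while sentence i+1
    # is still being produced, and play audio from a separate thread.
    audio_queue = queue.Queue()

    def playback_worker():
        while True:
            audio = audio_queue.get()
            if audio is None:  # sentinel marks the end of the reply
                break
            play_audio(audio)

    player = threading.Thread(target=playback_worker, daemon=True)
    player.start()
    for sentence in sentence_chunks(llm_token_stream):
        audio_queue.put(tts_synthesize(sentence))
    audio_queue.put(None)
    player.join()
```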
