Calculating Final Transcript Latency with Twilio + Deepgram STT #1006
Replies: 3 comments
-
Hey there! It looks like you haven't connected your GitHub account to your Deepgram account. You can do this at https://community.deepgram.com - being verified through this process will allow our team to help you in a much more streamlined fashion.
-
It looks like we're missing some important information to help debug your issue. Would you mind providing us with the following details in a reply?
-
A measured latency of 200-350ms doesn't necessarily surprise me. The recipe for measuring latency shouldn't be any different with a Twilio integration than with a stand-alone app; the only issue is that we have no way of knowing, controlling, or measuring any latency Twilio itself might introduce. As for the repo, there was a breaking change in our early access voice agent API this week, and the examples are being updated. Here's one that was just updated: https://github.com/nikolawhallon/sts-twilio
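For reference, here is a minimal sketch of the cursor-based recipe adapted to a Twilio stream. It assumes the audio is forwarded to Deepgram exactly as Twilio Media Streams delivers it (8 kHz, 8-bit mu-law, mono, so 8000 bytes per second) and that each streaming result carries `start` and `duration` in seconds. The class and method names are made up for illustration, and skipping final results is an assumption about how the measurement should be taken, not a copy of the script in the docs.

```python
from typing import Optional

# Twilio Media Streams audio: 8 kHz, 8-bit mu-law, mono = 8000 bytes per second.
BYTES_PER_SECOND = 8000


class CursorLatency:
    """Tracks audio_cursor - transcript_cursor, both in seconds of audio."""

    def __init__(self) -> None:
        self.audio_cursor = 0.0       # seconds of audio sent to Deepgram so far
        self.transcript_cursor = 0.0  # seconds of audio covered by transcripts
        self.min_latency = float("inf")

    def on_audio_sent(self, chunk: bytes) -> None:
        # Call with the base64-decoded payload of every Twilio media message,
        # including chunks that contain only silence.
        self.audio_cursor += len(chunk) / BYTES_PER_SECOND

    def on_result(self, message: dict) -> Optional[float]:
        # Assumption: only interim results are used; final results are skipped.
        if message.get("is_final"):
            return None
        self.transcript_cursor = message["start"] + message["duration"]
        latency = self.audio_cursor - self.transcript_cursor
        self.min_latency = min(self.min_latency, latency)
        return latency
```

Because the audio cursor counts every byte sent, silence from Twilio should not skew the number as long as every forwarded chunk goes through `on_audio_sent`. Note that this measures how far the transcript lags behind the audio already sent, which is not the same thing as the time from end of speech to a final transcript.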
-
Hi Deepgram Team,
We are currently using Twilio and Deepgram's STT to process audio from calls. We need to determine the final transcript latency, as our subsequent actions depend on it.
I referred to your documentation on measuring streaming latency and was able to calculate a latency of approximately 450ms using the provided approach.
However, when applying the same logic in the context of Twilio + Deepgram STT, the latency calculation becomes inconsistent. Since Twilio sends audio chunks continuously (even when the user is not speaking), the results do not reflect the actual latency.
I also attempted to calculate latency using cur_min_latency (audio_cursor - transcript_cursor) from the latency_zetbdo.py file. However, the observed values range between 200ms and 350ms, which seems implausible given that Deepgram claims 300ms, and I couldn't fully understand the method.
Could you guide us on how to reliably calculate the final transcript latency? Specifically, we want to measure the time from when the user stops speaking to when we receive the STT response for the complete utterance.
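To make the target measurement concrete, here is a rough sketch of what we have in mind. It assumes that the `end` timestamp of the last word in a `speech_final` result marks the moment the user stopped speaking, that audio is 8000 bytes per second (Twilio's 8 kHz mu-law), and that we log the wall-clock time at which each chunk is forwarded. The class name and the `now` parameters are hypothetical, and the result is only as precise as the chunk size.

```python
import bisect
from typing import List, Optional


class EndOfSpeechLatency:
    """Estimates wall-clock time from end of speech to the final transcript."""

    def __init__(self, bytes_per_second: int = 8000) -> None:
        self.bytes_per_second = bytes_per_second
        self._audio_cursor = 0.0
        self._audio_ends: List[float] = []  # audio seconds covered after each chunk
        self._sent_at: List[float] = []     # wall-clock time each chunk was sent

    def on_audio_sent(self, chunk: bytes, now: float) -> None:
        self._audio_cursor += len(chunk) / self.bytes_per_second
        self._audio_ends.append(self._audio_cursor)
        self._sent_at.append(now)

    def on_result(self, message: dict, now: float) -> Optional[float]:
        # Only results that close an utterance are measured.
        if not message.get("speech_final"):
            return None
        words = message["channel"]["alternatives"][0].get("words", [])
        if not words:
            return None
        last_word_end = words[-1]["end"]  # audio time, in seconds
        # First chunk whose audio range reaches the end of the last word.
        i = bisect.bisect_left(self._audio_ends, last_word_end)
        if i == len(self._sent_at):
            return None  # timestamp lies beyond the audio sent so far
        return now - self._sent_at[i]
```

In a real handler, `now` would come from something like `time.monotonic()` at the moment a chunk is forwarded or a result arrives. Any delay added by Twilio before the audio reaches our server is not visible to this calculation.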
Any advice or clarification would be greatly appreciated.
If possible, please update the logic in https://github.com/deepgram/deepgram-twilio-streaming-voice-agent so that we can use it for each conversation.
Thank you bro