Timestamps returned with streaming transcription are wrong. #777

venusatuluri · 2024-05-22T17:52:54Z

Which Deepgram product feature are you using?

Deepgram API - STT Streaming

Details

The start and end timestamps on a subset of transcription results are incorrect. We have noticed this across multiple inputs; I am attaching a simple example here. The code and the input file that reproduce the issue are in the attached zip, and the request ID where the issue happened is included below.

This is the specific result with the incorrect timestamp. The speech for "Alright" does not start at 21.485; it is almost finished by that point.

{
    "type": "Results",
    "channel_index": [
        0,
        1
    ],
    "duration": 0.9699993,
    "start": 21.0, # This timestamp is usually too early (but in this case it is correct), so we cannot use this. 
    "is_final": true,
    "speech_final": true,
    "channel": {
        "alternatives": [
            {
                "transcript": "Alright.",
                "confidence": 0.9297402,
                "words": [
                    {
                        "word": "alright",
                        "start": 21.485, # This timestamp is usually correct, but in this case it is not. 
                        "end": 21.97,
                        "confidence": 0.9297402,
                        "punctuated_word": "Alright."
                    }
                ]
            }
        ]
    },
    "metadata": {
        "request_id": "ecd69ba3-dd6e-4dc0-9abf-91d649530d5c",
        "model_info": {
            "name": "2-phonecall-nova",
            "version": "2024-02-05.31606",
            "arch": "nova-2"
        },
        "model_uuid": "9c7ae805-e600-4e0f-a6a2-725be88b7ede"
    }
}
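One reading of the payload above: `start + duration` (21.0 + 0.9699993, about 21.97) equals the `end` of the last word, which suggests the outer `start`/`duration` pair marks the span of audio the result covers rather than the moment speech begins. That is an inference from these numbers, not documented behavior. The sketch below parses a trimmed copy of the result (the `#` annotations are removed because JSON does not allow comments) and puts the window and the word timings side by side; `result_timing` is a hypothetical helper, not part of the Deepgram SDK.

```python
import json

# Trimmed copy of the "Alright." result above; the inline "#" notes are
# dropped because JSON does not allow comments.
PAYLOAD = """
{
    "type": "Results",
    "duration": 0.9699993,
    "start": 21.0,
    "is_final": true,
    "channel": {
        "alternatives": [
            {
                "transcript": "Alright.",
                "words": [
                    {"word": "alright", "start": 21.485, "end": 21.97}
                ]
            }
        ]
    }
}
"""


def result_timing(result: dict) -> dict:
    """Hypothetical helper: compare the result-level window with word timings."""
    window_start = result["start"]
    window_end = result["start"] + result["duration"]
    words = result["channel"]["alternatives"][0]["words"]
    return {
        "window_start": window_start,
        "window_end": round(window_end, 3),
        "first_word_start": words[0]["start"] if words else None,
        "last_word_end": words[-1]["end"] if words else None,
    }


timing = result_timing(json.loads(PAYLOAD))
print(timing)
```

Here the window closes exactly at the end of "alright", while the window start (21.0) and the first word start (21.485) disagree, which is the gap this report is about.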

Note that we have found the outer "start" timestamp (21.0 in this case) to be frequently very early, so we believe it is better to treat the start timestamp of the first word in the "words" array as the true start timestamp. The previous result in the same call, shown below, illustrates this: the outer start timestamp is 17.23, about 2.8 seconds before the first word's start timestamp of 20.03. The latter is correct, and we have generally found this to be the case. However, as pointed out above, the start timestamp of the first word is also occasionally wrong, which leaves us unable to determine the true start time of the speech.

{
    "type": "Results",
    "channel_index": [
        0,
        1
    ],
    "duration": 3.7700005,
    "start": 17.23, # This timestamp is too early; the word-level timestamp below is the right one.
    "is_final": true,
    "speech_final": true,
    "channel": {
        "alternatives": [
            {
                "transcript": "I got it.",
                "confidence": 0.9349395,
                "words": [
                    {
                        "word": "i",
                        "start": 20.029999, # This timestamp is correct.
                        "end": 20.189999,
                        "confidence": 0.61845225,
                        "punctuated_word": "I"
                    },
                    {
                        "word": "got",
                        "start": 20.189999,
                        "end": 20.59,
                        "confidence": 0.9349395,
                        "punctuated_word": "got"
                    },
                    {
                        "word": "it",
                        "start": 20.59,
                        "end": 21.0,
                        "confidence": 0.99053323,
                        "punctuated_word": "it."
                    }
                ]
            }
        ]
    },
    "metadata": {
        "request_id": "ecd69ba3-dd6e-4dc0-9abf-91d649530d5c",
        "model_info": {
            "name": "2-phonecall-nova",
            "version": "2024-02-05.31606",
            "arch": "nova-2"
        },
        "model_uuid": "9c7ae805-e600-4e0f-a6a2-725be88b7ede"
    }
}
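The two results also fit together: 17.23 + 3.77 = 21.0, so the earlier result's window ends exactly where the "Alright." window begins, and the 17.23 to 20.03 stretch of that earlier window carries no words. If neither timestamp can be trusted on its own, one workaround is to report a range instead of a point. The sketch below brackets the onset between the later of the window start and the previous final result's last word end, and the reported first-word start. The `onset_bounds` helper and the assumptions it encodes (consecutive final results do not overlap, and a wrong first-word start is late rather than early) are hypothetical, not Deepgram behavior; they only match the failure described in this report.

```python
from typing import Optional, Tuple


def _words(result: dict) -> list:
    # Word list of the top alternative, as in the payloads above.
    return result["channel"]["alternatives"][0]["words"]


def onset_bounds(result: dict, prev: Optional[dict] = None) -> Optional[Tuple[float, float]]:
    """Hypothetical heuristic: bracket the true speech onset of `result`.

    Assumptions (not Deepgram guarantees): speech cannot start before the
    result window opens or before the previous final result's last word ends,
    and a wrong first-word start is late, never early.
    """
    words = _words(result)
    if not words:
        return None
    lower = result["start"]
    if prev is not None and _words(prev):
        lower = max(lower, _words(prev)[-1]["end"])
    upper = max(lower, words[0]["start"])
    return lower, upper


# Trimmed copies of the two final results in this report.
i_got_it = {"start": 17.23, "duration": 3.7700005, "channel": {"alternatives": [{"words": [
    {"word": "i", "start": 20.029999, "end": 20.189999},
    {"word": "got", "start": 20.189999, "end": 20.59},
    {"word": "it", "start": 20.59, "end": 21.0},
]}]}}
alright = {"start": 21.0, "duration": 0.9699993, "channel": {"alternatives": [{"words": [
    {"word": "alright", "start": 21.485, "end": 21.97},
]}]}}

# The earlier window ends where the later one begins: 17.23 + 3.77 = 21.0.
print(round(i_got_it["start"] + i_got_it["duration"], 3))  # 21.0
print(onset_bounds(i_got_it))             # (17.23, 20.029999)
print(onset_bounds(alright, i_got_it))    # (21.0, 21.485)
```

For "Alright." this narrows the onset to somewhere between 21.0 and 21.485 s, which is consistent with the report but still does not recover the exact time.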

If you are making a request to the Deepgram API, what is the full Deepgram URL you are making a request to?

No response

If you are making a request to the Deepgram API and have a request ID, please paste it below:

ecd69ba3-dd6e-4dc0-9abf-91d649530d5c

If possible, please attach your code or paste it into the text box.

import sys
import time
from deepgram import DeepgramClient
from deepgram.client import (
    LiveTranscriptionEvents,
    LiveOptions,
    LiveResultResponse,
)
import os
import numpy as np
import soundfile as sf


def callback(_, result: LiveResultResponse, **kwargs):
    # Print every Transcript event (interim and final) as it arrives.
    print(result)


if __name__ == "__main__":
    deepgram = DeepgramClient(api_key=os.environ["DEEPGRAM_API_KEY"])
    deepgram_conn = deepgram.listen.live.v("1")
    deepgram_conn.on(LiveTranscriptionEvents.Transcript, callback)
    options = LiveOptions(
        model="nova-2-phonecall",
        version="latest",
        punctuate=True,
        language="en-US",
        encoding="linear16",
        smart_format=True,
        channels=1,
        sample_rate=8000,
        interim_results=True,
        utterance_end_ms="1000",
    )
    deepgram_conn.start(options)

    f = sys.argv[1]
    data, samplerate = sf.read(f)
    # The options above declare 8 kHz audio, so the file must match.
    assert samplerate == 8000
    # Send chunks of 32 ms each (about 256 samples at 8 kHz).
    chunk_size = int(0.032 * samplerate)
    for i in range(0, len(data), chunk_size):
        arr = np.array(data[i : i + chunk_size])
        # sf.read returns floats in [-1.0, 1.0]; scale to 16-bit PCM for linear16.
        int16_array = np.int16(arr * 32767)
        audio_bytes = int16_array.tobytes()
        deepgram_conn.send(audio_bytes)
        # Sleep for 32 ms to approximate real-time streaming.
        time.sleep(0.032)

    deepgram_conn.finish()
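When checking where speech really starts in the file, it can help to log the stream offset of each chunk as it is sent. The sketch below is a stdlib-only variant of the chunking in the send loop above: it slices already-encoded linear16 bytes into 32 ms pieces (256 samples, 512 bytes at 8 kHz mono) and pairs each piece with its offset in seconds, computed from byte counts rather than from `time.sleep`. It assumes the result timestamps are measured from the start of the audio stream, so these offsets can be lined up against the `start` values above; `pcm16_chunks` is a hypothetical helper, not part of the Deepgram SDK.

```python
from typing import Iterator, Tuple


def pcm16_chunks(pcm: bytes, sample_rate: int = 8000, chunk_ms: int = 32,
                 channels: int = 1) -> Iterator[Tuple[float, bytes]]:
    """Yield (offset_seconds, chunk) pairs for interleaved 16-bit PCM bytes.

    Offsets come from byte counts, i.e. position in the audio stream,
    not from wall-clock time.
    """
    bytes_per_frame = 2 * channels  # 16-bit samples, one per channel
    frames_per_chunk = sample_rate * chunk_ms // 1000  # 256 at 8 kHz, 32 ms
    chunk_bytes = frames_per_chunk * bytes_per_frame
    bytes_per_second = sample_rate * bytes_per_frame
    for pos in range(0, len(pcm), chunk_bytes):
        yield pos / bytes_per_second, pcm[pos:pos + chunk_bytes]


# One second of 8 kHz mono silence is 16000 bytes of linear16.
chunks = list(pcm16_chunks(bytes(16000)))
print(len(chunks), len(chunks[0][1]), chunks[1][0])  # 32 512 0.032
```

In the repro script, one possible use is to convert the whole file once (for example `np.int16(data * 32767).tobytes()`, as the loop already does per chunk) and then call `deepgram_conn.send(chunk)` for each pair, logging the offset alongside.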

If possible, please attach an example audio file to reproduce the issue.

code_and_input.zip

Answered by SandraRodgers

May 23, 2024

team-deepgram · 2024-05-22T17:53:06Z · Maintainer

Thanks for asking your question about Deepgram! If you didn't already include it in your post, please be sure to add as much detail as possible so we can assist you efficiently, such as:

- The request_id if you have a question about your requests or transcription responses.
- The features you used or the full api.deepgram.com URL you sent your request to, including parameters.
- Any code snippets you can share.

SandraRodgers · 2024-05-23T14:09:38Z · Maintainer

Thank you for your report. I'm currently sharing this information with our engineers and will respond later after I find out more information for you.

SandraRodgers · 2024-05-23T14:16:28Z · Maintainer

The engineers are currently working on a fix for this issue. It is still in testing so not going to be released quite yet but I will update when it has been released. I hope this issue isn't too much of a problem for you and you can be patient while we get this right. Thank you!

1 reply

diousk Sep 25, 2024

@SandraRodgers @team-deepgram
Is this issue fixed?

I have the same problem and my issue was created 8 months ago.
https://github.com/orgs/deepgram/discussions/545

Please provide an update. Thanks!

venusatuluri · 2024-06-04T23:30:11Z · Author

Is the fix in production?

On Tue, Jun 4, 2024 at 4:22 PM John Vajda (JV) wrote: Closed #777 as resolved.

classcard0 · 2024-06-21T05:42:35Z

I also have the same problem.
The graph above was created with the nova engine, and the graph below was created with the nova-2 engine.
The start timestamp of item 5 ("Thanks") is incorrect.

jpvajda · 2024-09-25T16:05:21Z · Maintainer

Related issue: https://github.com/orgs/deepgram/discussions/545

For those following this thread, this is an issue Deepgram is still looking into, thank you for your patience.
