Whisper onnxruntime exception on Android #633

GaryLaurenceauAva · 2024-03-04T13:42:46Z

Description

After invoking Whisper recognizer on Android multiple times consecutively, the following error is encountered:

[E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Expand node. Name:'/Expand' Status Message: invalid expand shape
Caught exception:
Non-zero status code returned while running Expand node. Name:'/Expand' Status Message: invalid expand shape
Return an empty result. Number of input frames: 300, Current tail paddings: 1000. If you see a lot of such exceptions, please consider using a larger --whisper-tail-paddings

Reproduction Steps:

val sampleRateInHz = 16000
val samples = FloatArray(sampleRateInHz * 3)
// feed samples with 3 seconds of audio then send it for decoding.
...
sherpaOnnxOffline.decode(samples, sampleRateInHz)

Environment Details:

Android target API 34
Device arch: arm64-v8a, armeabi-v7a
Whisper-base.en-int8 is used
SherpaOnnx v1.9.11

csukuangfj · 2024-03-05T03:20:31Z

Could you describe how you created the APK?

Does it work with the APK from us?

GaryLaurenceauAva · 2024-03-05T09:30:47Z

No, it crashes the same way. You can reproduce this error with the SherpaOnnx2Pass app.
Use sherpa-onnx-whisper-base.en model.

csukuangfj · 2024-03-05T09:34:09Z

> Device arch: arm64-v8a, armeabi-v7a

Just want to re-check that you are using sherpa-onnx 1.9.11, right?
That is, are you using the commit from #617?

GaryLaurenceauAva · 2024-03-05T09:40:24Z

Yes, I use version 1.9.11, but I've also tried the latest commit on master, #5dc2eaf.

GaryLaurenceauAva · 2024-03-05T09:41:59Z

Btw, this warning is sometimes printed before the exception happens:
Overflow! n: 3200, size: 960000, n+size: 963200, capacity: 960000. Increase capacity to: 1920000

csukuangfj · 2024-03-05T09:43:07Z

Could you tell us whether you are using our APK or wrote your own?

Also, are you using the code from #617?

Please give us more information.

GaryLaurenceauAva · 2024-03-05T09:53:08Z

Yes, I use your example project SherpaOnnx2Pass with v1.9.11, so the code from #617 is used.
Nothing is changed from your codebase. Just build your APK and start recording audio for long enough, and this error happens.

csukuangfj · 2024-03-05T10:04:22Z

> start recording an audio long enough

How long is this audio?

Are there pauses? If yes, how long is the pause?

GaryLaurenceauAva · 2024-03-05T10:10:25Z

The audio used is a 5-minute conversation, with pauses.
But the error can happen after a couple of seconds or minutes, depending on the phone.

csukuangfj · 2024-03-05T10:11:54Z

Does it cause a crash, or can the app continue to work?

GaryLaurenceauAva · 2024-03-05T10:14:07Z

No, the app doesn't crash. It's an exception printed in the logs.

csukuangfj · 2024-03-05T10:19:05Z

How many exceptions did you see? One or many?

GaryLaurenceauAva · 2024-03-05T10:27:32Z

There can be several per session

csukuangfj · 2024-03-05T10:34:42Z

Does it affect the final recognition result?

These logs are only available in logcat.

GaryLaurenceauAva · 2024-03-05T10:38:36Z

Yes, because when this error occurs the call to decode may freeze for a long time (~10 to 30 seconds) and no result is returned.

GaryLaurenceauAva · 2024-03-08T14:44:15Z

It seems this happens when the condition at offline-whisper-greedy-search-decoder.cc#L148 is never met:

if (max_token_id == model_->EOT()) {
  break;
}

Then model_->ForwardDecoder() will be called too many times and will end up throwing an exception
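To make the failure mode concrete, here is a minimal sketch of a greedy loop with an explicit step budget. It is an illustration only, not the sherpa-onnx implementation: `GreedySearch`, `NextTokenFn` and `GreedyResult` are hypothetical names, and `next_token` stands in for a decoder step such as the `ForwardDecoder()` call followed by an argmax.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical stand-in for one decoder step: given the tokens decoded so
// far, return the id of the most likely next token.
using NextTokenFn = std::function<int32_t(const std::vector<int32_t> &)>;

struct GreedyResult {
  std::vector<int32_t> tokens;  // decoded tokens, EOT excluded
  bool reached_eot = false;     // false means the step budget ran out
};

// Greedy search that stops at EOT or after max_steps decoder calls,
// whichever comes first, and reports which of the two happened.
GreedyResult GreedySearch(const NextTokenFn &next_token, int32_t eot,
                          int32_t max_steps) {
  assert(max_steps > 0);
  GreedyResult result;
  for (int32_t i = 0; i < max_steps; ++i) {
    int32_t token = next_token(result.tokens);
    if (token == eot) {
      result.reached_eot = true;
      break;
    }
    result.tokens.push_back(token);
  }
  return result;
}
```

With a decoder that never emits EOT, the loop always runs for the full budget, so a caller can check `reached_eot` and treat the output as suspect instead of continuing to call the model.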

csukuangfj · 2024-03-08T14:48:05Z

> Non-zero status code returned while running Expand node. Name:'/Expand' Status Message: invalid expand shape
> Return an empty result. Number of input frames: 300, Current tail paddings: 1000. If you see a lot of such exceptions, please consider using a larger --whisper-tail-paddings

Could you use a larger tail padding? You can always use 30000 to restore the 30 seconds constraint of the original whisper model.

GaryLaurenceauAva · 2024-03-08T15:21:46Z

With a padding of 30000 it works better, but then decoding takes too long. I would like to keep a small padding.

szaszakgy · 2024-05-07T12:18:04Z

Hello,

We have been experiencing the same errors described above with sherpa-onnx 1.9.23 on Ubuntu. The errors occur at a variable rate, 1-10 cases per 1000 utterances, and with various models (base/small/medium/large, int8/float), but mostly with English monolingual models. Some of the problematic audio files also seem to cause a timeout and are not transcribed on Hugging Face at https://huggingface.co/spaces/k2-fsa/automatic-speech-recognition either.

Monitoring the decoding (logging tokens in the for loop around line 100 that iterates up to text_ctx), we observe that repetitive patterns occur and EoT is never reached before the loop runs out. Sometimes the speaker does repeat themselves (for example, repeating a word 5 or more times); sometimes there is no evident cause for getting stuck in a repetitive loop of tokens without reaching EoT. Increasing the tail paddings helped only very little for us.

By monitoring the predicted_tokens buffer, we tried to detect when repetitions start to occur and to force an EoT after a critical number of repetitions has occurred. This effectively prevented decoding from getting stuck and ending up with an empty transcript for all of our problem utterances. We also tried resetting instead of stopping, but the majority of utterances got stuck again and only some finished successfully. The drawback of forcing EoT is that the tail of the audio is not transcribed, but it at least avoids spending a lot of time in the for loop. Is a more advanced fix available for this problem?
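A repetition check of this kind could be sketched as follows. This is only an illustration under assumptions, not the change described above: `TailRepeats` and `LooksStuck` are hypothetical helpers, and the thresholds `max_period` and `min_repeats` would need tuning on real data.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true if the last period * min_repeats tokens consist of one block
// of `period` tokens repeated `min_repeats` times.
bool TailRepeats(const std::vector<int32_t> &tokens, int32_t period,
                 int32_t min_repeats) {
  assert(period > 0 && min_repeats > 1);
  std::size_t p = static_cast<std::size_t>(period);
  std::size_t span = p * static_cast<std::size_t>(min_repeats);
  if (tokens.size() < span) return false;
  std::size_t start = tokens.size() - span;
  // Every token in the span must equal the token one period before it.
  for (std::size_t i = start + p; i < tokens.size(); ++i) {
    if (tokens[i] != tokens[i - p]) return false;
  }
  return true;
}

// Tries every period from 1 to max_period. A hit suggests the decoder is
// cycling, so the caller can stop and treat the step as a forced EoT.
bool LooksStuck(const std::vector<int32_t> &tokens, int32_t max_period,
                int32_t min_repeats) {
  for (int32_t period = 1; period <= max_period; ++period) {
    if (TailRepeats(tokens, period, min_repeats)) return true;
  }
  return false;
}
```

Calling `LooksStuck` on the token buffer after each decoding step covers both a single repeated token (period 1) and short alternating patterns (period 2 or more). The check is cheap because it only inspects the tail of the buffer.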

If there is interest in reviewing this small code extension, let me know. I am happy to create a PR from a fork or just copy-paste the changes, since they are small.

Thank you!

csukuangfj · 2024-05-07T12:24:39Z

The problem is introduced by our modification to whisper.

The original whisper needs 30 seconds of input. If the input is shorter than 30 seconds, it is padded to 30 seconds, which means it has enough padding to detect the EoT.

With the current change, we removed the 30-second constraint, and if the padding is small, the model may not be able to detect the EoT. As described in #633 (comment), restoring the original behavior by padding the input to 30 seconds can fix the issue, at the cost of extra computation for the extra padding.
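As a rough illustration of the cost involved, the sketch below computes how many feature frames are needed to fill a 30-second window. It assumes a 10 ms hop (160 samples per frame at 16 kHz), which is consistent with the "Number of input frames: 300" log line for 3 seconds of audio. Whether --whisper-tail-paddings is counted in these same frames is an assumption here, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Assumed values, not taken from the sherpa-onnx source: 16 kHz audio and
// a 10 ms hop, i.e. 160 samples per feature frame.
constexpr int32_t kSampleRate = 16000;
constexpr int32_t kSamplesPerFrame = 160;
constexpr int32_t kFramesIn30Seconds = 30 * kSampleRate / kSamplesPerFrame;

// Approximate number of feature frames for a buffer of audio samples.
int32_t NumFrames(int32_t num_samples) {
  assert(num_samples >= 0);
  return num_samples / kSamplesPerFrame;
}

// Frames of padding needed to reach a full 30-second window, or 0 if the
// input is already at least 30 seconds long.
int32_t PaddingFramesTo30Seconds(int32_t num_samples) {
  int32_t frames = NumFrames(num_samples);
  return frames >= kFramesIn30Seconds ? 0 : kFramesIn30Seconds - frames;
}
```

Under these assumptions, a 3-second utterance occupies only a tenth of the window, so padding it to 30 seconds means the encoder processes about ten times as many frames as the speech itself.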

szaszakgy · 2024-05-07T12:46:56Z

Actually, our problem is that despite using a padding of 30000, we still get the invalid expand shape error and an empty transcript.
Here is the call:
sherpa-onnx-offline --whisper-tail-paddings=30000 --whisper-encoder=base.en-encoder.int8.onnx --whisper-decoder=base.en-decoder.int8.onnx --tokens=base.en-tokens.txt --num-threads=1 --provider=cpu samples/problem_1.wav

I can see this utterance gets stuck in repeating tokens 11 and 607 alternately until n_text_ctx is reached with no EoT at the end at all.

csukuangfj · 2024-05-07T12:57:29Z

Can the official whisper implementation decode your problem_1.wav correctly?

szaszakgy · 2024-05-07T13:18:58Z

Yes, it can. (I have also double-checked that the sherpa-onnx versions match; that is, I am decoding with the same version that was used during export.)

csukuangfj · 2024-05-07T13:38:11Z

Would you mind sharing the wav file with us?

szaszakgy · 2024-05-07T13:49:18Z

Unfortunately I can't; I am bound by strict data privacy constraints. It may be relevant, however, that the majority of the utterances I deal with contain non-native English speech (the Whisper model is not fine-tuned). So far I have a couple of problem utterances; as I wrote earlier, some do contain repetitions where decoding gets stuck at the repeated word. Others seem to be quite standard utterances.

szaszakgy · 2024-05-14T12:13:28Z

Hi @csukuangfj, I have a new sample producing the same error. I am able to share it with you, but not publicly with third parties. Please let me know if you can accept the file and, if so, how I could send it to you. Thank you.

csukuangfj · 2024-05-15T07:44:31Z

Thanks!

Could you send the test wave to csukuangfj at gmail dot com?

csukuangfj linked a pull request Jun 20, 2024 that will close this issue

Fix whisper #1037
