Issue when converting Whisper using --collect_cross_qk on CPU #18216

Closed
axelman03 opened this issue Nov 1, 2023 · 5 comments
Labels
core runtime (issues related to core runtime), platform:windows (issues related to the Windows platform)

Comments

@axelman03

Describe the issue

I am currently using the nightly build of ONNX Runtime to convert Whisper to ONNX. I am specifically interested in obtaining the cross QK outputs of the model, to be used eventually for timestamps. I am converting the model to run on CPU, and when I do, a runtime exception occurs:

An error occurred while trying to verify parity between PyTorch and ONNX Runtime: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running WhisperBeamSearch node. Name:'BeamSearch_zcode' Status Message: C:\a\_work\1\s\onnxruntime\contrib_ops/cpu/transformers/beam_search_impl_whisper.h:300 onnxruntime::contrib::transformers::BeamSearchWhisper<float>::Execute decoder_subgraph_.has_decoder_masked_attention_ was false. decoder subgraph: output_cross_qk could only work with has_decoder_masked_attention

Traceback (most recent call last):
  File "C:\git\onnx-rt\1.17.0-pre\onnxruntime\onnxruntime\python\tools\transformers\models\whisper\convert_to_onnx.py", line 481, in main
    max_diff = WhisperHelper.verify_onnx(args.model_name_or_path, ort_session, device)
  File "C:\git\onnx-rt\1.17.0-pre\onnxruntime\onnxruntime\python\tools\transformers\models\whisper\whisper_helper.py", line 338, in verify_onnx
    ort_outputs = ort_session.run(None, inputs)[0][0]
  File "C:\Users\alexander.bolejack\Anaconda3\envs\S2T\lib\site-packages\onnxruntime\capi\onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running WhisperBeamSearch node. Name:'BeamSearch_zcode' Status Message: 
C:\a\_work\1\s\onnxruntime\contrib_ops/cpu/transformers/beam_search_impl_whisper.h:300 onnxruntime::contrib::transformers::BeamSearchWhisper<float>::Execute decoder_subgraph_.has_decoder_masked_attention_ was false. decoder subgraph: output_cross_qk could only work with has_decoder_masked_attention

Looking into it, it seems that DecoderMaskedMultiHeadAttention is only used when the --use_gpu flag is enabled, and the same appears to be true for cross QK.

Is there any way I can build and run this model on the CPU?

To reproduce

Here is the exact command I ran, using the latest version of ONNX Runtime:

python -m models.whisper.convert_to_onnx -m openai/whisper-base --output whisperbase-timestamps --use_external_data_format --precision int8 --quantize_embedding_layer --extra_decoding_ids --output_sequence_scores --overwrite --output_no_speech_probs --output_cross_qk --collect_cross_qk --use_whisper_beamsearch --provider cpu

Urgency

No response

Platform

Windows

OS Version

Windows 10

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

#18206

ONNX Runtime API

Other / Unknown

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

github-actions bot added the platform:windows label Nov 1, 2023
@yuslepukhin added the core runtime label Nov 1, 2023

github-actions bot commented Dec 2, 2023

This issue has been automatically marked as stale due to inactivity and will be closed in 7 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label Dec 2, 2023
@thiagocrepaldi (Contributor)

@shubhambhokare1 @kunal-vaishnavi maybe this is something you are familiar with

github-actions bot removed the stale label Dec 5, 2023
@axelman03 (Author)

I tried to look a little further into it. It seems that Cross QK support requires the model to be compiled with DecoderMaskedMultiHeadAttention, which is implemented only for CUDA.
I am not familiar enough with it to say whether DecoderMaskedMultiHeadAttention is actually required for cross QK, or whether it could be brought to the CPU or other execution providers, though.
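
As a rough sanity check (assuming a checkout of the onnxruntime source tree; the exact registration files may differ), one can grep the contrib ops directory for where the op is registered:

grep -rn "DecoderMaskedMultiHeadAttention" onnxruntime/contrib_ops/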

@kunal-vaishnavi (Contributor)

> It seems that Cross QK support requires the model to be compiled with DecoderMaskedMultiHeadAttention, which is implemented only for CUDA.

Cross QK support was added in this PR as part of the WhisperBeamSearch and DecoderMaskedMultiHeadAttention ops. The feature is currently supported on CUDA and will be supported on CPU in the future.

> I am not familiar enough with it to say whether DecoderMaskedMultiHeadAttention is actually required for cross QK, or whether it could be brought to the CPU or other execution providers, though.

DecoderMaskedMultiHeadAttention is used specifically for CUDA. Cross QK support can be added to other attention ops that run on CPU.
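
In the meantime, a possible workaround (an untested sketch, assuming a CUDA-capable machine and a CUDA build of ONNX Runtime; flag combinations may need adjusting, e.g. fp16 is more typical than int8 for CUDA) is to export for the CUDA execution provider instead:

python -m models.whisper.convert_to_onnx -m openai/whisper-base --output whisperbase-timestamps --use_external_data_format --precision fp16 --extra_decoding_ids --output_sequence_scores --overwrite --output_no_speech_probs --output_cross_qk --collect_cross_qk --use_whisper_beamsearch --use_gpu --provider cuda

Alternatively, dropping --output_cross_qk and --collect_cross_qk should allow the CPU export to succeed, at the cost of losing the timestamp-related outputs.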


github-actions bot commented Jan 5, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label Jan 5, 2024
@natke removed the stale label Jan 10, 2024
@axelman03 closed this as not planned (won't fix, can't repro, duplicate, stale) Jan 17, 2024