
[QNN EP] Enable option to set QNN context priority #18315

Merged 5 commits into main on Nov 9, 2023
Conversation

HectorSVC
Contributor

@HectorSVC commented Nov 7, 2023


Description

Enable the qnn_context_priority provider option to set the QNN context priority. Supported values: "low", "normal", "normal_high", "high".

This feature ensures that inference submitted at a higher priority is serviced ahead of lower-priority work on the NPU. Tested with the onnxruntime_perf_test tool using the same model; a usage sketch follows the results below.

  1. Run the model on the NPU as a single instance: latency is 300ms.
  2. Run the same model on the NPU with two instances at the same time:
     - Case 1: both at the same priority (high): latency is 600ms for each.
     - Case 2: one at low priority (latency 30,000ms) and one at high priority (latency 300ms).
     - Case 3: one at normal priority (latency 15,000ms) and one at high priority (latency 300ms).
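For reference, a minimal sketch of setting this option from the ONNX Runtime Python API. The `qnn_context_priority` key is the option added by this PR; the model path and the `backend_path` value are placeholders assuming the HTP (NPU) backend on Windows:

```python
# Minimal sketch: create a session on the QNN EP with a high context priority.
# "qnn_context_priority" is the option added by this PR; "model.onnx" and the
# backend_path value are placeholder assumptions for illustration.
import onnxruntime as ort

session = ort.InferenceSession(
    "model.onnx",  # placeholder model path
    providers=["QNNExecutionProvider"],
    provider_options=[{
        "backend_path": "QnnHtp.dll",    # QNN HTP (NPU) backend library
        "qnn_context_priority": "high",  # "low", "normal", "normal_high", or "high"
    }],
)
```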

@HectorSVC merged commit 55c19d6 into main Nov 9, 2023
86 of 90 checks passed
@HectorSVC deleted the qnn_ctx_priority branch November 9, 2023 04:56
kleiti pushed a commit to kleiti/onnxruntime that referenced this pull request Mar 22, 2024