Enable user to set QNN HTP performance mode for every session run #19521
… for each thread as default so user don't need to set it for every session run
The non-QNN changes look fine.
@zhangsibo1129, @FFFrog, could you help take a look at the changes in StreamExecutionContext? I hope they don't impact the CANN EP.
Thank you for mentioning it. I am on vacation and will give you feedback after reading this tomorrow.
The Python lint/format check is failing (not sure why it says Python when the output is on basic_test.cc). #Resolved
@HectorSVC, everything is OK for the CANN EP. Thank you again.
Oh, just a note that DCVS can be enabled for power-efficient modes, as the QNN docs suggest.
Description
Currently, the QNN HTP performance mode is set during session creation, and there is no way to change it afterwards. Some applications need to switch to a high performance mode for high-priority requests and then drop back to a low performance mode to save power, for example when no requests are incoming.
This change keeps the session-level performance mode in the QNN EP options, where it serves as the default. The ORT QNN EP applies it once if the user sets it.
In addition, two run options (qnn.htp_perf_mode and qnn.htp_perf_mode_post_run) change the performance mode before and after a session run. The recommended scenario is to set a high performance mode before the inference run so that the result comes back as soon as possible, and a low performance mode after the run to save power.
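
A minimal sketch of how a caller might assemble these two run options. The key names come from the description above. The helper names (`perf_mode_entries`, `apply_entries`) are hypothetical, and the mode strings `"high_performance"` and `"low_power_saver"` are assumed to match the values accepted by the QNN EP performance mode option. The sketch only requires an object that exposes `add_run_config_entry(key, value)`, which is the method the ONNX Runtime Python `RunOptions` class provides.

```python
# Run-option keys named in the PR description.
PERF_MODE_KEY = "qnn.htp_perf_mode"
PERF_MODE_POST_RUN_KEY = "qnn.htp_perf_mode_post_run"


def perf_mode_entries(before_run, after_run=None):
    """Build the run-config entries for one inference call (hypothetical helper).

    before_run: mode to apply before the session run.
    after_run:  optional mode to apply once the run has finished.
    """
    entries = {PERF_MODE_KEY: before_run}
    if after_run is not None:
        entries[PERF_MODE_POST_RUN_KEY] = after_run
    return entries


def apply_entries(run_options, entries):
    """Copy entries onto any object exposing add_run_config_entry(key, value)."""
    for key, value in entries.items():
        run_options.add_run_config_entry(key, value)
    return run_options


# Assumed mode strings: raise the mode for the request, lower it afterwards.
entries = perf_mode_entries("high_performance", "low_power_saver")
```

With onnxruntime installed, the entries could be applied as `apply_entries(onnxruntime.RunOptions(), entries)` and the result passed to `session.run(None, inputs, run_options=...)`. Omitting `after_run` leaves the post-run option unset.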