[QNN EP] Disable early termination in GetCapability (microsoft#18140)
[QNN EP] Disable early termination in GetCapability if there are
multiple partitions and the context binary cache is enabled

### Description
The QNN EP context binary cache feature only supports a single partition
for now, which was enforced by an early termination (ORT_ENFORCE) in
GetCapability. After PR microsoft#17764, Level 1 optimizations are no
longer applied before the first GetCapability call, so the
EnsureUniqueDQForNodeUnit graph transformer has not run yet. As a result,
when an initializer -> DQ output is shared by multiple node units, those
nodes are not grouped into a node unit, and QNN EP reports many
unsupported nodes in the first GetCapability call. The second
GetCapability call still works normally.
This change disables the early termination in GetCapability and delays
the single-partition check to Compile.
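
For context, the behavioral difference is between aborting with an exception and returning a recoverable error status. The sketch below is illustrative only: it uses a simplified stand-in `Status` type and hypothetical `EnforceSinglePartition` / `CheckSinglePartition` helpers rather than ONNX Runtime's actual ORT_ENFORCE / ORT_RETURN_IF macros, to show why moving the check into a Status-returning Compile path lets the caller handle the multi-partition case instead of terminating.

```cpp
// Minimal sketch (not ONNX Runtime code): contrasts an enforce-style check,
// which throws, with a status-returning check, which lets the caller recover.
#include <cstddef>
#include <iostream>
#include <stdexcept>
#include <string>

// Simplified stand-in for a Status type.
struct Status {
  bool ok = true;
  std::string message;
  static Status OK() { return {}; }
  static Status Error(std::string msg) { return {false, std::move(msg)}; }
};

// Enforce-style: analogous to ORT_ENFORCE, aborts the whole call by throwing.
void EnforceSinglePartition(size_t num_partitions) {
  if (num_partitions != 1) {
    throw std::runtime_error("Only support single partition for context cache feature.");
  }
}

// Status-style: analogous to ORT_RETURN_IF, reports the problem as a Status.
Status CheckSinglePartition(size_t num_partitions) {
  if (num_partitions != 1) {
    return Status::Error("Only support single partition for context cache feature.");
  }
  return Status::OK();
}

int main() {
  // With the status-returning variant the caller decides what to do,
  // e.g. log the error and skip context-cache generation.
  Status s = CheckSinglePartition(3);
  if (!s.ok) {
    std::cout << "Compile would return an error: " << s.message << "\n";
  }

  // The enforce-style variant would instead terminate the call with an exception.
  try {
    EnforceSinglePartition(3);
  } catch (const std::exception& e) {
    std::cout << "GetCapability-time enforce would throw: " << e.what() << "\n";
  }
  return 0;
}
```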
HectorSVC authored and kleiti committed Mar 22, 2024
1 parent 0a12a1f commit 730f86d
Showing 1 changed file with 2 additions and 6 deletions.
8 changes: 2 additions & 6 deletions onnxruntime/core/providers/qnn/qnn_execution_provider.cc
@@ -404,10 +404,6 @@ QNNExecutionProvider::GetCapability(const onnxruntime::GraphViewer& graph_viewer
     }
   }
 
-  if (num_of_partitions > 1) {
-    ORT_ENFORCE(!context_cache_enabled_, "Only support single partition for context cache feature.");
-  }
-
   const auto summary_msg = MakeString("Number of partitions supported by QNN EP: ", num_of_partitions,
                                       ", number of nodes in the graph: ", num_nodes_in_graph,
                                       ", number of nodes supported by QNN: ", num_of_supported_nodes);
@@ -485,7 +481,7 @@ Status QNNExecutionProvider::Compile(const std::vector<FusedNodeAndGraph>& fused
 
   bool is_ctx_file_exist = qnn_cache_model_handler_->GetIsContextCacheFileExists();
   if (is_qnn_ctx_model || (context_cache_enabled_ && is_ctx_file_exist)) {
-    ORT_ENFORCE(fused_nodes_and_graphs.size() == 1, "Only support single partition for context cache feature.");
+    ORT_RETURN_IF(fused_nodes_and_graphs.size() != 1, "Only support single partition for context cache feature.");
     std::unique_ptr<qnn::QnnModel> qnn_model = std::make_unique<qnn::QnnModel>(logger, qnn_backend_manager_.get());
     // Load and execute from cached context if exist
     ORT_RETURN_IF_ERROR(qnn_cache_model_handler_->LoadQnnCtxFromOnnxModel(graph_viewer,
Expand All @@ -509,7 +505,7 @@ Status QNNExecutionProvider::Compile(const std::vector<FusedNodeAndGraph>& fused

ORT_RETURN_IF_ERROR(CompileFromOrtGraph(fused_nodes_and_graphs, node_compute_funcs, logger));
if (context_cache_enabled_ && !is_qnn_ctx_model) {
ORT_ENFORCE(fused_nodes_and_graphs.size() == 1, "Only support single partition for context cache feature.");
ORT_RETURN_IF(fused_nodes_and_graphs.size() != 1, "Only support single partition for context cache feature.");
uint64_t buffer_size(0);
auto context_buffer = qnn_backend_manager_->GetContextBinaryBuffer(buffer_size);
ORT_RETURN_IF_ERROR(qnn_cache_model_handler_->GenerateCtxCacheOnnxModel(context_buffer.get(),
Expand Down
