[WIP] Add QNN EP HTP shared memory allocator #23136
Draft: edgchen1 wants to merge 35 commits into main from edgchen1/qnn_ep_rpcmem
Changes from 32 of 35 commits:
110a3bc save work (edgchen1)
0ba3a2f save work (edgchen1)
8436b14 add logging for setting QNN tensor memory, update comment (edgchen1)
c9826f4 add option to enable HTP shared memory allocator to onnxruntime_perf_… (edgchen1)
c07c35e hack - try to cache mem handles in QnnModel (edgchen1)
60dc837 Remove duplicate include. (edgchen1)
24e072f hack, continued - move cache out to SharedContext (edgchen1)
e66cbef Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem (edgchen1)
8c515da move mem handle registration to allocator (edgchen1)
18e2780 hook up some test code (edgchen1)
09ddce5 Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem (edgchen1)
a65bb71 rename to RpcMemAllocator to HtpSharedMemoryAllocator (edgchen1)
bfb135e Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem (edgchen1)
f179a0d remove onnx protobuf dependency from allocator.h, add shared provider… (edgchen1)
7645ef4 remove unused CPUAllocator::TensorAlloc declaration (edgchen1)
1043732 Check for nullptr when trying to free (baijumeswani)
022f4bc move mem handle management to QNN backend manager (edgchen1)
c527dee remove IAllocator::TensorAlloc() (edgchen1)
e4f72b3 document IAllocator::Free (edgchen1)
39ff901 remove IAllocator__TensorAlloc (edgchen1)
1bed5a4 Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem (edgchen1)
d70db84 fix android build warning (edgchen1)
45ef883 remove shared mem handles from shared context (edgchen1)
d2e7b3c remove allocation clean up callback removal, use weak_ptrs in allocat… (edgchen1)
c892c18 some clean up (edgchen1)
b295eef more clean up (edgchen1)
13f5e30 add helper to get qnn error message (edgchen1)
d5eace1 use make_shared for QnnBackendManager (edgchen1)
bacbcdc add test to qnn_basic_test.cc, document allocator parameter. (edgchen1)
30cd9ed Merge remote-tracking branch 'origin/main' into edgchen1/qnn_ep_rpcmem (edgchen1)
b29ab61 rename variables (edgchen1)
67a54b8 revert changes to onnxruntime/test/providers/qnn/max_min_op_test.cc (edgchen1)
c0569e2 fix formatting (edgchen1)
dd45c84 skip test if not android and not windows (edgchen1)
959d8df update comment (edgchen1)
126 changes: 126 additions & 0 deletions
onnxruntime/core/providers/qnn/builder/qnn_context_mem_handle_manager.cc (new file)
@@ -0,0 +1,126 @@
// Copyright (c) Microsoft Corporation. All rights reserved.
// Licensed under the MIT License.

#include "core/providers/qnn/builder/qnn_context_mem_handle_manager.h"

#include "HTP/QnnHtpMem.h"

#include "core/common/common.h"
#include "core/providers/qnn/builder/qnn_def.h"
#include "core/providers/qnn/builder/qnn_utils.h"
#include "core/providers/qnn/qnn_allocator.h"

namespace onnxruntime::qnn {

QnnContextMemHandleManager::QnnContextMemHandleManager(const QNN_INTERFACE_VER_TYPE& qnn_interface,
                                                       Qnn_ContextHandle_t context,
                                                       const logging::Logger& logger)
    : qnn_interface_{qnn_interface},
      context_{context},
      logger_{logger} {
}

QnnContextMemHandleManager::~QnnContextMemHandleManager() {
  Clear();
}

Status QnnContextMemHandleManager::GetOrRegister(void* shared_memory_address, const Qnn_Tensor_t& qnn_tensor,
                                                 Qnn_MemHandle_t& qnn_mem_handle, bool& did_register) {
  const auto qnn_tensor_rank = GetQnnTensorRank(qnn_tensor);
  auto* const qnn_tensor_dims = GetQnnTensorDims(qnn_tensor);
  const auto qnn_tensor_data_type = GetQnnTensorDataType(qnn_tensor);

  const size_t qnn_tensor_data_size =
      utils::GetQnnTensorDataSize(gsl::span{qnn_tensor_dims, size_t{qnn_tensor_rank}}, qnn_tensor_data_type);

  {
    std::scoped_lock g{mem_handles_mutex_};

    // find existing mem handle
    if (const auto mem_handles_it = mem_handles_.find(shared_memory_address);
        mem_handles_it != mem_handles_.end()) {
      const auto& mem_handle_record = mem_handles_it->second;

      // check that actual tensor size is less than or equal to registered tensor size
      ORT_RETURN_IF_NOT(qnn_tensor_data_size <= mem_handle_record.registered_tensor_data_size,
                        "Actual tensor data size (", qnn_tensor_data_size,
                        ") is larger than registered tensor data size (", mem_handle_record.registered_tensor_data_size,
                        ").");

      qnn_mem_handle = mem_handle_record.mem_handle.get();
      did_register = false;
      return Status::OK();
    }

    // register a new mem handle
    HtpSharedMemoryAllocator::SharedMemoryInfo shared_memory_info{};
    ORT_RETURN_IF_ERROR(HtpSharedMemoryAllocator::GetAllocationSharedMemoryInfo(shared_memory_address,
                                                                                shared_memory_info));

    Qnn_MemDescriptor_t mem_descriptor{};
    mem_descriptor.memShape.dimSize = qnn_tensor_dims;
    mem_descriptor.memShape.numDim = qnn_tensor_rank;
    mem_descriptor.memShape.shapeConfig = nullptr;
    mem_descriptor.dataType = qnn_tensor_data_type;
    mem_descriptor.memType = QNN_MEM_TYPE_CUSTOM;

    QnnMemHtp_Descriptor_t htp_mem_descriptor{};
    htp_mem_descriptor.type = QNN_HTP_MEM_SHARED_BUFFER;
    htp_mem_descriptor.size = shared_memory_info.total_size;
    htp_mem_descriptor.sharedBufferConfig.fd = shared_memory_info.fd;
    htp_mem_descriptor.sharedBufferConfig.offset = shared_memory_info.offset;

    mem_descriptor.customInfo = &htp_mem_descriptor;

    LOGS(logger_, VERBOSE) << "Registering QNN mem handle for context: " << context_
                           << ", shared memory (address: " << shared_memory_address
                           << ", offset: " << shared_memory_info.offset
                           << ", fd: " << shared_memory_info.fd
                           << ")";

    Qnn_MemHandle_t raw_mem_handle{};
    const auto register_result = qnn_interface_.memRegister(context_, &mem_descriptor, 1, &raw_mem_handle);
    ORT_RETURN_IF_NOT(register_result == QNN_SUCCESS,
                      "qnn_interface.memRegister() failed: ",
                      utils::GetVerboseQnnErrorMessage(qnn_interface_, register_result));

    LOGS(logger_, VERBOSE) << "Registered QNN mem handle. mem_handle: " << raw_mem_handle;

    const auto unregister_mem_handle = [this](Qnn_MemHandle_t raw_mem_handle) {
      LOGS(logger_, VERBOSE) << "Unregistering QNN mem handle. mem_handle: " << raw_mem_handle;

      const auto unregister_result = qnn_interface_.memDeRegister(&raw_mem_handle, 1);
      if (unregister_result != QNN_SUCCESS) {
        LOGS(logger_, ERROR) << "qnn_interface.memDeRegister() failed: "
                             << utils::GetVerboseQnnErrorMessage(qnn_interface_, unregister_result);
      }
    };

    UniqueQnnMemHandle mem_handle(raw_mem_handle, unregister_mem_handle);
    MemHandleRecord mem_handle_record{qnn_tensor_data_size, std::move(mem_handle)};
    mem_handles_.emplace(shared_memory_address, std::move(mem_handle_record));

    qnn_mem_handle = raw_mem_handle;
    did_register = true;
    return Status::OK();
  }
}

Status QnnContextMemHandleManager::Unregister(void* shared_memory_address) {
  std::scoped_lock g{mem_handles_mutex_};

  auto mem_handles_it = mem_handles_.find(shared_memory_address);
  ORT_RETURN_IF_NOT(mem_handles_it != mem_handles_.end(),
                    "No mem handle found for address (", shared_memory_address, ").");

  mem_handles_.erase(mem_handles_it);

  return Status::OK();
}

void QnnContextMemHandleManager::Clear() {
  std::scoped_lock g{mem_handles_mutex_};
  mem_handles_.clear();
}

}  // namespace onnxruntime::qnn
Review comment: delete old declaration