
[WIP] Add QNN EP HTP shared memory allocator #23136

Draft · wants to merge 35 commits into base: main

Conversation


@edgchen1 edgchen1 commented Dec 18, 2024

Description

Adds QNN EP HTP shared memory allocator.

The HTP shared memory allocator (HtpSharedMemoryAllocator) calls into the rpcmem shared library (libcdsprpc.so on Android/Linux, libcdsprpc.dll on Windows) to allocate and free memory that can be shared between the HTP and the CPU.
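A minimal sketch of what binding an allocator to the rpcmem API might look like. This is not the PR's actual code: the class name is hypothetical, the `rpcmem_alloc`/`rpcmem_free` signatures and the heap-id/flags constants follow the Hexagon SDK's rpcmem.h, and the stub functions stand in for the real libcdsprpc library (which a real build would resolve via dlopen/dlsym or LoadLibrary/GetProcAddress) so the sketch is self-contained.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Function-pointer types matching the rpcmem entry points documented in the
// Hexagon SDK's rpcmem.h.
using RpcMemAllocFn = void* (*)(int heap_id, uint32_t flags, int size);
using RpcMemFreeFn = void (*)(void* ptr);

// Stand-ins for the real libcdsprpc exports, so this sketch runs anywhere.
static void* StubRpcMemAlloc(int /*heap_id*/, uint32_t /*flags*/, int size) {
  return std::malloc(static_cast<size_t>(size));
}
static void StubRpcMemFree(void* ptr) { std::free(ptr); }

// Hypothetical allocator wrapper illustrating the shape of the PR's
// HtpSharedMemoryAllocator: it delegates Alloc/Free to the rpcmem library.
class HtpSharedMemoryAllocatorSketch {
 public:
  HtpSharedMemoryAllocatorSketch(RpcMemAllocFn alloc_fn, RpcMemFreeFn free_fn)
      : alloc_fn_(alloc_fn), free_fn_(free_fn) {}

  void* Alloc(size_t size) {
    constexpr int kRpcMemHeapIdSystem = 25;      // RPCMEM_HEAP_ID_SYSTEM
    constexpr uint32_t kRpcMemDefaultFlags = 1;  // RPCMEM_DEFAULT_FLAGS
    return alloc_fn_(kRpcMemHeapIdSystem, kRpcMemDefaultFlags,
                     static_cast<int>(size));
  }

  void Free(void* ptr) { free_fn_(ptr); }

 private:
  RpcMemAllocFn alloc_fn_;
  RpcMemFreeFn free_fn_;
};
```

On device, the pointers returned by rpcmem are backed by ION/DMA-BUF memory that both the CPU and the HTP can map, which is what removes the copy.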

The allocator can be enabled by setting QNN EP option enable_htp_shared_memory_allocator to 1. QNNExecutionProvider::CreatePreferredAllocators() will then return an instance of HtpSharedMemoryAllocator.
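A hedged usage sketch of enabling the option through the ONNX Runtime C++ API's provider-options mechanism. The option name comes from the PR description; `backend_path` is an existing QNN EP option, and the model path is a placeholder.

```cpp
// Configuration sketch, assuming the ONNX Runtime C++ API (Ort::SessionOptions
// and AppendExecutionProvider with provider name "QNN").
#include <onnxruntime_cxx_api.h>

#include <string>
#include <unordered_map>

int main() {
  Ort::Env env;
  Ort::SessionOptions session_options;
  std::unordered_map<std::string, std::string> qnn_options{
      {"backend_path", "libQnnHtp.so"},             // existing QNN EP option
      {"enable_htp_shared_memory_allocator", "1"},  // option added by this PR
  };
  session_options.AppendExecutionProvider("QNN", qnn_options);
  Ort::Session session(env, "model.onnx", session_options);  // placeholder path
  return 0;
}
```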

For each QNN context, we also need to register and unregister memory handles in order to use the HTP shared memory. This memory handle management is added to QnnBackendManager, which also manages the QNN context handles.
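The bookkeeping described above can be sketched as a map from context handle to its registered memory handles, so that all handles for a context can be released together. This is a hypothetical reduction, not the PR's QnnBackendManager code: the handle types stand in for the opaque `Qnn_ContextHandle_t`/`Qnn_MemHandle_t` pointers, and a real implementation would call the QNN memory de-registration API before dropping each handle.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <vector>

// Stand-ins for the QNN API's opaque handle types.
using ContextHandle = void*;
using MemHandle = void*;

// Tracks which mem handles were registered against which QNN context,
// mirroring the roles of AddQnnContext / ReleaseQnnContextMemHandles.
class MemHandleRegistrySketch {
 public:
  void AddContext(ContextHandle ctx) {
    handles_by_context_[ctx];  // create an (empty) entry for this context
  }

  bool RegisterMemHandle(ContextHandle ctx, MemHandle mem) {
    auto it = handles_by_context_.find(ctx);
    if (it == handles_by_context_.end()) return false;  // unknown context
    it->second.push_back(mem);
    return true;
  }

  // Returns the number of handles released. A real implementation would
  // de-register each handle with the QNN API before erasing it.
  size_t ReleaseContextMemHandles(ContextHandle ctx) {
    auto it = handles_by_context_.find(ctx);
    if (it == handles_by_context_.end()) return 0;
    const size_t n = it->second.size();
    handles_by_context_.erase(it);
    return n;
  }

 private:
  std::unordered_map<ContextHandle, std::vector<MemHandle>> handles_by_context_;
};
```

Keeping this map alongside the context handles in one manager ensures handles cannot outlive the context they were registered with.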

For more information about using HTP shared memory with QNN, see: https://docs.qualcomm.com/bundle/publicresource/topics/80-63442-50/htp_shared_buffer_tutorial.html#shared-buffer-tutorial

Limitations:

  • HTP shared memory usage is only supported for graph inputs and outputs. Intermediate values are not supported.
  • Each allocation is backed by its own shared memory buffer. The allocator does not pack multiple allocations into a single shared memory buffer.

Motivation and Context

Improve performance by using HTP shared memory to avoid overhead from copying data between CPU and NPU.

edgchen1 and others added 30 commits November 5, 2024 15:12
… declarations and definitions for IAllocator::TensorAlloc().
@github-actions github-actions bot left a comment: You can commit the suggested changes from lintrunner.

onnxruntime/core/providers/qnn/qnn_allocator.cc (outdated; resolved)
@jywu-msft jywu-msft requested a review from HectorSVC December 18, 2024 23:23
@@ -240,6 +245,9 @@ class QnnBackendManager {
const char* eventIdentifier);
#endif

Status AddQnnContext(Qnn_ContextHandle_t context);
Status ReleaseQnnContextMemHandles();
@edgchen1 (author) commented:

delete old declaration

@@ -63,6 +65,12 @@ size_t GetElementSizeByType(ONNXTensorElementDataType elem_type) {
return pos->second;
}

size_t GetQnnTensorDataSize(gsl::span<const uint32_t> shape, Qnn_DataType_t element_type) {
ORT_ENFORCE(!shape.empty(), "Empty shape not allowed."); // TODO can we just treat empty shape as a scalar?
@edgchen1 (author) commented:

this check is copied from the original implementation here:

ORT_RETURN_IF(dims.empty(), "Tensor dimensions is nullptr");

I'm not sure if it's needed
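The size computation under discussion can be sketched as the product of the dimensions times the element size (this is a hypothetical reduction of `GetQnnTensorDataSize`, not the PR's exact code). It also shows why the empty-shape check may be unnecessary: the empty product is 1, so an empty shape would naturally fall out as a scalar of one element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <numeric>
#include <vector>

// total bytes = (product of dimensions) * element size.
// An empty shape yields the empty product 1, i.e. scalar semantics.
size_t QnnTensorDataSizeSketch(const std::vector<uint32_t>& shape,
                               size_t element_size) {
  const size_t element_count =
      std::accumulate(shape.begin(), shape.end(), size_t{1},
                      [](size_t acc, uint32_t d) { return acc * d; });
  return element_count * element_size;
}
```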
