
[Mobile] QNN-EP graph preparation failed #21800

Closed
edupuis-psee opened this issue Aug 20, 2024 · 3 comments
Assignees
Labels
ep:QNN issues related to QNN execution provider platform:mobile issues related to ONNX Runtime mobile; typically submitted using template

Comments

@edupuis-psee

Describe the issue

I'm struggling to run inference of an ONNX model with the QNN EP on an Android device; graph preparation fails with the following trace:

[2022-06-15 22:54:47.876] [trace] graph_prepare.cc:204:ERROR:could not create op: q::flat_from_vtcm
[2022-06-15 22:54:47.876] [trace] graph_prepare.cc:1377:ERROR:Op 0x102b4800000023 preparation failed with err:-1
[2022-06-15 22:54:47.876] [trace]  <E> "GridSample" generated: could not create op
[2022-06-15 22:54:47.876] [trace]  <E> RouterFastRPC graph prepare failed 12
[2022-06-15 22:54:47.876] [trace]  <V> Async property not supported. Skipping register Async context
[2022-06-15 22:54:47.876] [trace]  <E> Failed to finalize graph (id: 1) with err 1002
[2022-06-15 22:54:47.876] [trace]  <V> Wake up free backend (id: 1)'s thread(s)
[2022-06-15 22:54:47.876] [trace]  <I> QnnGraph_finalize done. status 0x3ea
[2022-06-15 22:54:47.876] [error] Failed to finalize QNN graph.

The model:
(screenshot of the exported ONNX graph)

The QNN EP config:

    std::unordered_map<std::string, std::string> qnn_options;
    qnn_options["backend_path"] = "libQnnHtp.so";
    qnn_options["profiling_level"] = "basic";
    qnn_options["profiling_file_path"] = qnn_profiling.string();
    qnn_options["htp_graph_finalization_optimization_mode"] = "3";
    qnn_options["htp_performance_mode"] = "burst";
    qnn_options["rpc_control_latency"] = "100";
    qnn_options["htp_arch"] = "69";
    qnn_options["soc_model"] = "36";
    qnn_options["vtcm_mb"] = "8";
    qnn_options["qnn_context_priority"] = "high";

Tested with both QNN 2.24 and 2.25, and with ORT 1.18.1 and 1.19.

The grid is large (4K resolution), so it might be a VTCM memory issue with the tiling, but I have no way to confirm that. Does anyone know how I can check?

Interestingly enough, removing the multiplication and subtraction operations works around the failure.

I wonder if there is any way to run this specific op on another EP, but I'm very new to execution providers and haven't found the corresponding documentation yet.

Thank you in advance

To reproduce

To obtain a minimal model that reproduces the issue:

import torch
import torch.nn as nn
import torch.onnx

# Define the model that includes grid_sample operation
class GridSampleModel(nn.Module):
    def forward(self, x, grid):
        return 0.5 * nn.functional.grid_sample(x * 3, grid - 0.5, mode='bilinear', padding_mode='zeros', align_corners=False)

# Create example tensors for the input and grid
x = torch.randn(1, 1, 720, 1280)  # Example input tensor (N, C, H, W)
grid = torch.randn(1, 3072, 4096, 2)  # Example grid tensor (N, H_out, W_out, 2)

# Initialize the model
model = GridSampleModel()

# Set the model to evaluation mode
model.eval()

# Path to save the ONNX model
onnx_path = "grid_sample_model.onnx"

# Export the model
torch.onnx.export(
    model,                        # model being run
    (x, grid),                    # model input (or a tuple for multiple inputs)
    onnx_path,                    # where to save the model (can be a file or file-like object)
    export_params=True,           # store the trained parameter weights inside the model file
    opset_version=16,             # the ONNX version to export the model to
    do_constant_folding=True,     # whether to execute constant folding for optimization
    input_names=['input', 'grid'],   # the model's input names
    output_names=['output'],      # the model's output names
)

print(f"Model exported to {onnx_path}")

Urgency

No response

Platform

Android

OS Version

12

ONNX Runtime Installation

Built from Source

Compiler Version (if 'Built from Source')

ndk26c

Package Name (if 'Released Package')

onnxruntime-android

ONNX Runtime Version or Commit ID

1.19.0

ONNX Runtime API

C++/C

Architecture

X64

Execution Provider

Other / Unknown

Execution Provider Library Version

qnn-v2.25.0.240728104910_97711

@edupuis-psee edupuis-psee added the platform:mobile issues related to ONNX Runtime mobile; typically submitted using template label Aug 20, 2024
@github-actions github-actions bot added the ep:QNN issues related to QNN execution provider label Aug 20, 2024
@HectorSVC
Contributor

Are you trying to run an fp32 model on the HTP backend? Try setting enable_htp_fp16_precision. The HTP backend doesn't really support fp32; fp32 is only there for functionality verification.
Try with the settings below:
qnn_options["backend_path"] = "libQnnHtp.so";
qnn_options["profiling_level"] = "basic";
qnn_options["profiling_file_path"] = qnn_profiling.string();
qnn_options["htp_graph_finalization_optimization_mode"] = "3";
qnn_options["htp_performance_mode"] = "burst";
qnn_options["rpc_control_latency"] = "100";
qnn_options["soc_model"] = "36";
qnn_options["enable_htp_fp16_precision"] = "1";
qnn_options["qnn_context_priority"] = "high";

I don't have a device with soc_model=36, but I tried the offline context binary generation and it worked. You can try that as well: on a Linux x86 or Windows x86 system, run the command below (Windows x86 for example):
onnxruntime_perf_test.exe -n -e qnn -i "backend_path|QnnHtp.dll soc_model|36 htp_graph_finalization_optimization_mode|3 enable_htp_fp16_precision|1" -C "ep.context_enable|1" -m times -r 1 -I .\QNN_issues\grid_sample_model.onnx
It will create an ONNX model with the QNN context binary embedded, grid_sample_model.onnx_ctx.onnx. You can run that model on the device. I attached the one I generated.
grid_sample_model.onnx_ctx.zip
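
On the device, the generated _ctx.onnx model is then loaded like any other model with QNN-EP registered (a minimal sketch assuming the C++ API; paths and option values are placeholders):

    // Minimal sketch: loading the generated context-binary model on device.
    #include <onnxruntime_cxx_api.h>
    #include <string>
    #include <unordered_map>

    Ort::Session LoadContextModel(Ort::Env& env) {
        Ort::SessionOptions session_options;
        std::unordered_map<std::string, std::string> qnn_options;
        qnn_options["backend_path"] = "libQnnHtp.so";
        session_options.AppendExecutionProvider("QNN", qnn_options);
        // The _ctx.onnx model embeds the pre-compiled QNN context, so the
        // expensive on-device graph finalization step is avoided.
        return Ort::Session(env, "grid_sample_model.onnx_ctx.onnx", session_options);
    }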

@HectorSVC
Contributor

@edupuis-psee Any updates?

@edupuis-psee
Author

Thank you for your help, I was indeed able to run the model on the device thanks to fp16 precision.
