DistilBERT model inference failure using ONNX Runtime QNNExecutionProvider on Snapdragon® X Elite NPU #22532
Comments
When you generated the QDQ model, did you use the QNN-specific quantization flow? For more details, refer to https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html#running-a-model-with-qnn-eps-htp-backend-python. You can also try the latest nightly build, which has fp16 precision enabled by default.
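A minimal sketch of the QDQ generation flow described at that link might look like the following; the file paths, quantization types, and the `data_reader` object are assumptions for illustration, not code from this issue:

```python
# Sketch of the QNN EP QDQ quantization flow from the linked docs; paths,
# quantization types, and the data reader are illustrative assumptions.
from onnxruntime.quantization import QuantType, quantize
from onnxruntime.quantization.execution_providers.qnn import (
    get_qnn_qdq_config,
    qnn_preprocess_model,
)

input_model = "distilbert-model/model.onnx"
preproc_model = "model.preproc.onnx"

# Clean up / fuse the float32 model so it quantizes cleanly for the QNN HTP backend.
changed = qnn_preprocess_model(input_model, preproc_model)
model_to_quantize = preproc_model if changed else input_model

# data_reader is assumed to be a CalibrationDataReader that yields representative
# model inputs (see the data_reader.py sketch further down in this thread).
qnn_config = get_qnn_qdq_config(
    model_to_quantize,
    data_reader,
    activation_type=QuantType.QUInt16,
    weight_type=QuantType.QUInt8,
)

quantize(model_to_quantize, "model.qdq.onnx", qnn_config)
```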
Hi @sean830314, have you had a chance to follow the steps suggested by @HectorSVC? Could you please provide an update on your progress? Thank you.
Thanks, @HectorSVC. I referred to the following link to modify the quantization code: https://onnxruntime.ai/docs/execution-providers/QNN-ExecutionProvider.html#running-a-model-with-qnn-eps-htp-backend-python.

Latest update: I followed the previous suggestions and made adjustments, but the warnings still persist during the quantization process. The same errors about tensor type inference continue to appear, and multiple layers are skipped during quantization.

Environment: Python 3.11.1 (amd64)

qnq_quant.py code:
data_reader.py code:
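A minimal illustrative sketch of a calibration data reader for a DistilBERT export; the input names (`input_ids`, `attention_mask`) and the use of the Hugging Face tokenizer are assumptions, not the script from this issue:

```python
# data_reader.py -- illustrative sketch only; input names and tokenizer usage are
# assumptions about a typical distilbert-base-uncased ONNX export.
import numpy as np
from onnxruntime.quantization import CalibrationDataReader
from transformers import AutoTokenizer


class DistilBertDataReader(CalibrationDataReader):
    """Feeds a small set of tokenized sentences to the quantizer for calibration."""

    def __init__(self, sentences, seq_len=128):
        tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
        self._samples = []
        for text in sentences:
            enc = tokenizer(
                text,
                padding="max_length",
                truncation=True,
                max_length=seq_len,
                return_tensors="np",
            )
            self._samples.append(
                {
                    "input_ids": enc["input_ids"].astype(np.int64),
                    "attention_mask": enc["attention_mask"].astype(np.int64),
                }
            )
        self._iter = iter(self._samples)

    def get_next(self):
        # Return the next feed dict, or None when calibration data is exhausted.
        return next(self._iter, None)

    def rewind(self):
        self._iter = iter(self._samples)
```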
I executed the quantization script with the command `python.exe .\qnq_quant.py --model_input .\distilbert-model\model.onnx --model_output model.qnq.onnx`.

Error messages:
Hi @sean830314, Thanks for reporting this issue. It seems the errors are related to node configuration validation failures during inference with the QNNExecutionProvider on the Snapdragon® X Elite NPU. Here are a few suggestions to troubleshoot:
Let us know if it helps. Thank you.
Thank you for your suggestions. I attempted the first step, "Pre-processing Before Quantization," but encountered the following error message:
This error appears to be related to incomplete symbolic shape inference. I'm not sure if there are any additional settings or parameters that could prevent this error. Do you have any suggestions for resolving it? Thank you!
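One approach that sometimes helps with incomplete symbolic shape inference is running ONNX Runtime's symbolic shape inference tool separately with `auto_merge` enabled before the pre-processing step. A minimal sketch, with the file paths as assumptions:

```python
# Sketch: run symbolic shape inference with auto_merge before qnn_preprocess_model.
# auto_merge lets conflicting symbolic dims be merged instead of stopping the
# inference partway through the graph. Paths below are assumptions.
import onnx
from onnxruntime.tools.symbolic_shape_infer import SymbolicShapeInference

model = onnx.load("distilbert-model/model.onnx")
inferred = SymbolicShapeInference.infer_shapes(model, auto_merge=True)
onnx.save(inferred, "distilbert-model/model.shapes.onnx")
```

Since the QNN HTP backend also expects static input shapes, fixing any dynamic batch/sequence dimensions before quantization (for example with `python -m onnxruntime.tools.make_dynamic_shape_fixed`) may also be worth trying.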
Description: When running inference on the distilbert-base-uncased model with ONNX Runtime's QNNExecutionProvider on the Snapdragon® X Elite (X1E78100, Qualcomm®) NPU, inference fails. The same model runs successfully with the CPUExecutionProvider. The errors point to node configuration validation failures in the ONNX model during inference.
Environment:
Device: Snapdragon® X Elite (X1E78100 - Qualcomm®)
ONNX Runtime Version: onnxruntime-qnn 1.19.0
Model: distilbert-base-uncased
Model Format: Optimized and quantized ONNX model (model_optimized_quantized.onnx)
Execution Provider: QNNExecutionProvider
Python Version: Python 3.10.11
OS: Windows 11
Code Snippet:
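A minimal sketch of how such an inference session can be set up on the QNN HTP backend; the model path and input names below are assumptions for illustration:

```python
# Illustrative sketch of inference with QNNExecutionProvider (HTP/NPU backend);
# model path and input names are assumptions, not the exact code from this issue.
import numpy as np
import onnxruntime as ort

session = ort.InferenceSession(
    "model_optimized_quantized.onnx",
    providers=["QNNExecutionProvider"],
    provider_options=[{"backend_path": "QnnHtp.dll"}],  # HTP backend on Windows on ARM
)

# Dummy int64 inputs with a fixed sequence length; real inputs would come from a tokenizer.
seq_len = 128
feeds = {
    "input_ids": np.ones((1, seq_len), dtype=np.int64),
    "attention_mask": np.ones((1, seq_len), dtype=np.int64),
}
outputs = session.run(None, feeds)
print([o.shape for o in outputs])
```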
Error Logs: