I downloaded the Llama2-7b-onnx model from Hugging Face. Using the example code, it runs normally in both GPU and CPU environments (using CPUExecutionProvider and CUDAExecutionProvider).
However, when I replace the providers with CANNExecutionProvider and run it in the NPU environment, the following issues occur:
FP16 does not run; it essentially hangs.
FP32 barely runs, but it only uses 100+ MB of memory, and CPU usage reaches over 6000%.
The providers list I pass contains only CANNExecutionProvider. From these observations, it seems that the NPU is not being used for inference and the CPU is being used instead (inference in the NPU environment takes even longer than on the CPU). NPU usage is as shown in the image.
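One way to check whether the session really registered the CANN provider is to compare the requested providers with what `InferenceSession.get_providers()` reports. The following is a minimal sketch, not the code from the example repository: the helper names `missing_providers` and `open_verbose_session` and the model path in the usage note are hypothetical. Note that even when CANNExecutionProvider is registered, ONNX Runtime can still assign individual nodes it does not support to the CPU provider, so verbose logging is enabled to show node placement.

```python
from typing import Iterable, List


def missing_providers(requested: Iterable[str], active: Iterable[str]) -> List[str]:
    """Return the requested providers that the session did not register."""
    active_set = set(active)
    return [name for name in requested if name not in active_set]


def open_verbose_session(model_path: str, requested: List[str]):
    """Create a session with verbose logging and report missing providers.

    onnxruntime is imported lazily so the helper above can be used
    (and tested) on a machine without onnxruntime-cann installed.
    """
    import onnxruntime as ort

    print("Available in this build:", ort.get_available_providers())

    options = ort.SessionOptions()
    options.log_severity_level = 0  # 0 = verbose; logs include node placement

    session = ort.InferenceSession(model_path, sess_options=options, providers=requested)

    # Providers the session actually registered, in priority order.
    active = session.get_providers()
    print("Registered on session:", active)

    missing = missing_providers(requested, active)
    if missing:
        print("Requested but not registered:", missing)
    return session
```

Usage would look like `open_verbose_session("model.onnx", ["CANNExecutionProvider"])`, where `model.onnx` stands in for the actual model file. If CANNExecutionProvider shows up as missing, the whole model is running on the CPU; if it is registered but CPU usage is still high, the verbose log should show which nodes were placed on CPUExecutionProvider.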
To reproduce
The code is the same as the example in:
https://huggingface.co/alpindale/Llama-2-7b-ONNX/tree/main
The only change is that I replaced the providers with CANNExecutionProvider.
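For reference, the provider replacement described above might look like the sketch below. It is an illustration under assumptions, not the exact change made to the example code: the helper name `cann_providers` is hypothetical, and the option values (`device_id` 0, `enable_cann_graph` on) are placeholder defaults. `device_id` and `enable_cann_graph` are provider options listed in the ONNX Runtime CANN Execution Provider documentation.

```python
from typing import Any, Dict, List, Tuple, Union

ProviderEntry = Union[str, Tuple[str, Dict[str, Any]]]


def cann_providers(
    device_id: int = 0,
    enable_cann_graph: bool = True,
    keep_cpu_fallback: bool = True,
) -> List[ProviderEntry]:
    """Build a providers list for onnxruntime.InferenceSession.

    The values here are assumed defaults for illustration only.
    """
    options = {
        "device_id": device_id,                  # which NPU to run on
        "enable_cann_graph": enable_cann_graph,  # run as a CANN graph when True
    }
    providers: List[ProviderEntry] = [("CANNExecutionProvider", options)]
    if keep_cpu_fallback:
        # Listing the CPU provider explicitly makes the fallback visible
        # in the session's provider list.
        providers.append("CPUExecutionProvider")
    return providers
```

The returned list is passed as the `providers` argument, for example `onnxruntime.InferenceSession(model_path, providers=cann_providers())`, where `model_path` is the path to the downloaded model.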
Urgency
No response
Platform
Linux
OS Version
openEuler
ONNX Runtime Installation
Released Package
ONNX Runtime Version or Commit ID
onnxruntime-cann 1.18.0
ONNX Runtime API
Python
Architecture
ARM64
Execution Provider
Other / Unknown
Execution Provider Library Version
Ascend-cann-toolkit_8.0.RC2