CANN error executing while running Conv node #18807

H3AlO3 · 2023-12-13T12:38:04Z

Describe the issue

I'm trying to run an ONNX model on a Huawei Cloud server with an Ascend 310, but it reports the following error when calling model.run.

2023-12-13 20:11:34.351330063 [E:onnxruntime:Default, cann_call.cc:139 CannCall] CANN failure 500001: ACL_ERROR_FAILURE ; NPU=0 ; hostname=ecs-b520 ; expr=aclopCompileAndExecute(opname.c_str(), prepare.inputDesc_.size(), prepare.inputDesc_.data(), prepare.inputBuffers_.data(), prepare.outputDesc_.size(), prepare.outputDesc_.data(), prepare.outputBuffers_.data(), prepare.opAttr_, ACL_ENGINE_SYS, ACL_COMPILE_SYS, __null, Stream(ctx));
2023-12-13 20:11:34.351379137 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running Conv node. Name:'/features/features.0/features.0.0/Conv' Status Message: CANN error executing aclopCompileAndExecute(opname.c_str(), prepare.inputDesc_.size(), prepare.inputDesc_.data(), prepare.inputBuffers_.data(), prepare.outputDesc_.size(), prepare.outputDesc_.data(), prepare.outputBuffers_.data(), prepare.opAttr_, ACL_ENGINE_SYS, ACL_COMPILE_SYS, NULL, Stream(ctx))
Traceback (most recent call last):
  File "error.py", line 27, in <module>
    print(predict())
  File "error.py", line 23, in predict
    preds = model.run(None, {"input.1": img})[0]
  File "/usr/local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.Fail: [ONNXRuntimeError] : 1 : FAIL : Non-zero status code returned while running Conv node. Name:'/features/features.0/features.0.0/Conv' Status Message: CANN error executing aclopCompileAndExecute(opname.c_str(), prepare.inputDesc_.size(), prepare.inputDesc_.data(), prepare.inputBuffers_.data(), prepare.outputDesc_.size(), prepare.outputDesc_.data(), prepare.outputBuffers_.data(), prepare.opAttr_, ACL_ENGINE_SYS, ACL_COMPILE_SYS, NULL, Stream(ctx))

Specifically, I have the following configuration:

providers = [
    (
        "CANNExecutionProvider",
        {
            "device_id": 0,
            "arena_extend_strategy": "kNextPowerOfTwo",
            "enable_cann_graph": False,
        },
    ),
    "CPUExecutionProvider",
]

If I change the 'enable_cann_graph' setting to True, it reports the following error instead:

2023-12-13 20:10:22.459251611 [E:onnxruntime:, sequential_executor.cc:514 ExecuteKernel] Non-zero status code returned while running torch_jit_9127997404873590405_0 node. Name:'CANNExecutionProvider_torch_jit_9127997404873590405_0_0' Status Message: /root/onnxruntime/onnxruntime/core/providers/cann/cann_call.cc:143 bool onnxruntime::CannCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = int; bool THRW = true] /root/onnxruntime/onnxruntime/core/providers/cann/cann_call.cc:137 bool onnxruntime::CannCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = int; bool THRW = true] CANN failure -1: (look for ACL_ERROR_xxx in acl.h) ; NPU=0 ; hostname=ecs-b520 ; expr=ge::aclgrphBuildInitialize(options);


Traceback (most recent call last):
  File "error.py", line 27, in <module>
    print(predict())
  File "error.py", line 23, in predict
    preds = model.run(None, {"input.1": img})[0]
  File "/usr/local/lib/python3.8/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 217, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running torch_jit_9127997404873590405_0 node. Name:'CANNExecutionProvider_torch_jit_9127997404873590405_0_0' Status Message: /root/onnxruntime/onnxruntime/core/providers/cann/cann_call.cc:143 bool onnxruntime::CannCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = int; bool THRW = true] /root/onnxruntime/onnxruntime/core/providers/cann/cann_call.cc:137 bool onnxruntime::CannCall(ERRTYPE, const char*, const char*, ERRTYPE, const char*) [with ERRTYPE = int; bool THRW = true] CANN failure -1: (look for ACL_ERROR_xxx in acl.h) ; NPU=0 ; hostname=ecs-b520 ; expr=ge::aclgrphBuildInitialize(options);

There is no error while using CPUExecutionProvider.
Please help me, thanks!

To reproduce

my code

import onnxruntime as ort
import numpy as np

providers = [
    (
        "CANNExecutionProvider",
        {
            "device_id": 0,
            "arena_extend_strategy": "kNextPowerOfTwo",
            "enable_cann_graph": False,
        },
    ),
    "CPUExecutionProvider",
]
#providers = ["CPUExecutionProvider"]
model = ort.InferenceSession('model_0.onnx', providers=providers)


def predict():
    # fake image
    img = np.random.random((1, 3, 1024, 1024)).astype(np.float16)
    # inference
    preds = model.run(None, {"input.1": img})[0]
    return preds


print(predict())

and here is my model

Urgency

No response

Platform

Linux

OS Version

Ubuntu 18.04

ONNX Runtime Installation

Released Package

ONNX Runtime Version or Commit ID

1.15.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CANN

Execution Provider Library Version

CANN 7.0.0

H3AlO3 · 2023-12-13T14:01:53Z

I tried another model with a single Linear layer:

import torch.nn as nn  # provides nn.Module and nn.Linear

class X(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Linear(2, 2)

    def forward(self, x):
        return self.linear(x)

and it works, but if you replace the Linear layer with Conv2d

class X(nn.Module):
    def __init__(self):
        super().__init__()
        self.linear = nn.Conv2d(2, 2, 2)

    def forward(self, x):
        return self.linear(x)

the error occurs again. So I think maybe there's an error with convolution?
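For anyone who wants to repeat this check, a minimal sketch of exporting the one-layer Conv2d model and running it through ONNX Runtime might look like the following. The export step is an assumption, since the post does not show how the model was converted. The file name `conv_repro.onnx`, the tensor names `x` and `y`, the helper names, and the 1x2x8x8 input shape are all hypothetical.

```python
# Output shape of Conv2d(2, 2, 2) on a 1x2x8x8 input: 8 - 2 + 1 = 7 per side.
EXPECTED_SHAPE = (1, 2, 7, 7)


def export_conv_repro(path="conv_repro.onnx"):
    """Export a single-Conv2d model to ONNX (sketch; requires PyTorch)."""
    # Imported lazily so the helpers can be defined without PyTorch installed.
    import torch
    from torch import nn

    class X(nn.Module):
        def __init__(self):
            super().__init__()
            self.conv = nn.Conv2d(2, 2, 2)

        def forward(self, x):
            return self.conv(x)

    dummy = torch.randn(1, 2, 8, 8)  # hypothetical input shape (N, C, H, W)
    torch.onnx.export(
        X().eval(), dummy, path, input_names=["x"], output_names=["y"]
    )
    return path


def run_repro(path, providers):
    """Run the exported model once with the given execution providers."""
    import numpy as np
    import onnxruntime as ort

    sess = ort.InferenceSession(path, providers=providers)
    x = np.random.rand(1, 2, 8, 8).astype(np.float32)
    return sess.run(None, {"x": x})[0]
```

Calling `run_repro(export_conv_repro(), ["CPUExecutionProvider"])` gives a CPU baseline. Passing the CANN provider list from the original post instead is the configuration in which the error was reported.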

H3AlO3 · 2023-12-15T10:19:49Z

It's a bug in the cann-opp installer. I have solved it.

kingsley-gl · 2024-03-07T08:17:36Z

I have run into the same problem. How did you solve it?

H3AlO3 · 2024-03-07T08:30:01Z

I manually unzipped the opp installer and copied some of its files into place. That didn't actually solve the problem completely, though. It just turned into another error, so I finally gave up.

kingsley-gl · 2024-03-07T08:35:39Z

OK, thanks a lot.

kingsley-gl · 2024-03-07T09:23:32Z

I solved it. You can turn on the ACL log by setting export ASCEND_SLOG_PRINT_TO_STDOUT=1, and turn on the onnxruntime log by adding sess_opt.log_severity_level = 0 to your code. The logs will then show the real error, such as no module 'tbe' and so on. The code should run fine after you fix those errors one by one.
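A minimal sketch of those two logging settings in Python is shown below. The helper name `make_verbose_session` is hypothetical. Setting the environment variable before onnxruntime is imported is a precaution, on the assumption that the Ascend libraries read it when they load.

```python
import os

# Ask the Ascend runtime to print its logs to stdout. Done before onnxruntime
# is imported as a precaution (assumption: the variable is read at load time).
os.environ["ASCEND_SLOG_PRINT_TO_STDOUT"] = "1"


def make_verbose_session(model_path, providers):
    """Create an InferenceSession with verbose onnxruntime logging."""
    import onnxruntime as ort  # imported after the environment variable is set

    sess_opt = ort.SessionOptions()
    sess_opt.log_severity_level = 0  # 0 = VERBOSE, the most detailed level
    return ort.InferenceSession(
        model_path, sess_options=sess_opt, providers=providers
    )
```

With the original script, this would replace the direct `ort.InferenceSession('model_0.onnx', providers=providers)` call. The underlying cause, for example a missing Python module, should then show up in the output ahead of the generic `ACL_ERROR_FAILURE` message.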

github-actions bot added the ep:ACL (issues related to ACL execution provider) label on Dec 13, 2023

H3AlO3 closed this as completed Dec 15, 2023
