
Issue running a model in ONNXruntime #21571

Open
nachoogriis opened this issue Jul 31, 2024 · 2 comments
Labels
stale: issues that have not been addressed in a while; categorized by a bot

Comments

@nachoogriis

Describe the issue

I have generated an ONNX model from the yolo_nas_pose nano model, which can be found here. To generate the ONNX file, I added the following code cell.

from super_gradients.training import models

# Load the pretrained YOLO-NAS Pose nano model and export it to ONNX
yolo_nas_pose = models.get("yolo_nas_pose_n", pretrained_weights="coco_pose")
yolo_nas_pose.export(
    'yolo_nas_pose_n_8_256_128.onnx',
    preprocessing=True,
    postprocessing=True,
    confidence_threshold=0.2,
    num_pre_nms_predictions=1024,
    max_predictions_per_image=120,
    input_image_shape=[256, 128],
    batch_size=8,
    # quantization_mode left at its default
)
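To sanity-check the export on its own with onnxruntime, something like the following can be used (a minimal sketch; the dummy input's shape and dtype are assumptions and may need adjusting to match the exported graph):

import numpy as np
import onnxruntime as ort

# Run the exported model on a dummy batch; query the input name from the
# session rather than hard-coding it.
session = ort.InferenceSession('yolo_nas_pose_n_8_256_128.onnx',
                               providers=['CPUExecutionProvider'])
input_name = session.get_inputs()[0].name
# The export includes preprocessing, so uint8 NCHW input is assumed here;
# switch to float32 if the exported graph expects it.
dummy = np.zeros((8, 3, 256, 128), dtype=np.uint8)
outputs = session.run(None, {input_name: dummy})
print([o.shape for o in outputs])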

Besides this, I have merged this model with the following (simple) postprocessing model, which just takes the first of the 120 predictions the model makes for each image in the batch. Below is the TensorFlow code used to generate its ONNX file.

import tensorflow as tf

def create_post_processing_patches_tf():

    @tf.function
    def post_processing_patches_f(num_predictions, boxes, scores, joints):
        # Keep only the first of the 120 predictions for each image in the batch
        filtered_bboxes = tf.reshape(tf.gather(boxes, indices=[0], axis=1), shape=(8, 1, 4))
        filtered_scores = tf.reshape(tf.gather(scores, indices=[0], axis=1), shape=(8, 1))
        filtered_joints = tf.reshape(tf.gather(joints, indices=[0], axis=1), shape=(8, 1, 17, 3))

        return {
            'N': num_predictions,
            'filtered_bboxes': filtered_bboxes,
            'filtered_scores': filtered_scores,
            'filtered_joints': filtered_joints
        }

    # Input signatures match the yolo_nas_pose outputs for a batch of 8 images
    input_signature = [
        tf.TensorSpec(shape=(8, 1), dtype=tf.int64),             # num_predictions
        tf.TensorSpec(shape=(8, 120, 4), dtype=tf.float32),      # boxes
        tf.TensorSpec(shape=(8, 120), dtype=tf.float32),         # scores
        tf.TensorSpec(shape=(8, 120, 17, 3), dtype=tf.float32)   # joints
    ]

    return post_processing_patches_f, input_signature

import onnx
import tf2onnx

def create_post_processing_patches_onnx(
    post_processing_patches_tf_f, input_signature,
    output_file=MODELS_FOLDER.joinpath('post_processing_patches.onnx')):

    # Convert the tf.function to an ONNX model at opset 17 and save it
    onnx_model, external_tensor_storage = tf2onnx.convert.from_function(
        post_processing_patches_tf_f,
        input_signature=input_signature,
        opset=17, custom_ops=None,
        custom_op_handlers=None, custom_rewriter=None, inputs_as_nchw=None,
        outputs_as_nchw=None, extra_opset=None, shape_override=None,
        target=None, large_model=False, output_path=None)

    onnx.save(onnx_model, output_file)
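The two helpers are used together like this (a small usage sketch):

# Build the tf.function and its signature, then convert and save the ONNX file
post_processing_patches_f, input_signature = create_post_processing_patches_tf()
create_post_processing_patches_onnx(post_processing_patches_f, input_signature)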

When I run this model with onnxruntime on its own, it works correctly. The problem arises when I merge it with a previously created model. The inputs are of exactly the same shape and data type, and the error occurs in one of yolo_nas_pose's ONNX nodes. The error I get is the following.

RuntimeException: [ONNXRuntimeError] : 6 : RUNTIME_EXCEPTION : Non-zero status code returned while running Concat node. Name:'posegraph2_/Concat' Status Message: /Users/runner/work/1/s/onnxruntime/core/framework/op_kernel.cc:83 virtual OrtValue *onnxruntime::OpKernelContext::OutputMLValue(int, const onnxruntime::TensorShape &) status.IsOK() was false. Shape mismatch attempting to re-use buffer. {4,3} != {6,3}. Validate usage of dim_value (values should be > 0) and dim_param (all values with the same string should equate to the same size) in shapes in the model.

Does anyone know what might be happening? I have tried running shape inference, but the error still occurs.
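For reference, a minimal sketch of the shape-inference pass (full_pose_model.onnx refers to the merged model saved in the snippet below):

import onnx
from onnx import shape_inference

# Load the merged model and run ONNX shape inference over it
model = onnx.load('full_pose_model.onnx')
inferred = shape_inference.infer_shapes(model)

# Print each inferred shape so dim_value/dim_param mismatches can be spotted
for vi in inferred.graph.value_info:
    dims = [d.dim_param or d.dim_value for d in vi.type.tensor_type.shape.dim]
    print(vi.name, dims)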

To reproduce

To reproduce, generate the two models described above (the necessary code is in the “Describe the issue” section) and merge them using this code snippet.

# Merge the pose model with the postprocessing model, mapping the pose
# model's outputs to the postprocessing model's inputs
full_pose_model = onnx.compose.merge_models(pose_model, post_processing_patches, [
    ('graph2_num_predictions', 'num_predictions'),
    ('graph2_post_nms_boxes', 'boxes'),
    ('graph2_post_nms_scores', 'scores'),
    ('graph2_post_nms_joints', 'joints')
    ], prefix2='pose_')

onnx.save(full_pose_model, directory_path.joinpath('full_pose_model.onnx'))

Then, merge this model with a preceding model whose outputs correspond to its inputs, as sketched below.
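A rough sketch of that final merge (previous_model and the io_map names are placeholders for the actual upstream model and its output/input pairing):

# Hypothetical final merge; replace previous_model and the name pair with
# the real upstream model and its matching output/input names.
complete_model = onnx.compose.merge_models(
    previous_model, full_pose_model,
    io_map=[('previous_output', 'pose_input')],
    prefix2='full_')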

Urgency

No response

Platform

Mac

OS Version

Sonoma 14.1

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

1.18.1

ONNX Runtime API

Python

Architecture

X64

Execution Provider

Default CPU

Execution Provider Library Version

No response

@wangyems
Contributor

Hi @nachoogriis, I'm not an expert on onnx.compose.merge_models(), but since you mention that merging the models triggers the ORT exception in the first model, I'd suggest checking whether the merge changes that model's graph, especially the "Concat" part.

A workaround for your case is to merge the models in the original framework and export them as a single ONNX model.
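One way to do that check is to look up the failing node in the merged model and compare its inputs and outputs against the pre-merge model (a rough sketch; the node name is taken from the error message):

import onnx

def find_node(model, name):
    # Return the first node whose name matches, or None if absent
    return next((n for n in model.graph.node if n.name == name), None)

merged = onnx.load('full_pose_model.onnx')
node = find_node(merged, 'posegraph2_/Concat')  # node named in the error
if node is not None:
    print('inputs: ', list(node.input))
    print('outputs:', list(node.output))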

github-actions bot

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label on Aug 31, 2024