
Inference Layer by Layer or feature extraction on Onnx Runtime #19954

Open
IzanCatalan opened this issue Mar 16, 2024 · 2 comments
Labels: stale (issues that have not been addressed in a while; categorized by a bot)

Comments

@IzanCatalan

Describe the issue

Hi everyone, I would like to know whether it is possible to perform layer-by-layer inference in ONNX Runtime with a pre-trained model (in fp32 or int8 data types).

My idea is to take several fp32 and int8-quantized models from the ONNX Model Zoo repo and run inference layer by layer to perform feature extraction. After that, I would modify the outputs of each layer and use them as new inputs for the following layers.

The approximate code would be something like this:

import numpy as np
import onnxruntime as ort

model_path = "model.onnx"
ort_session = ort.InferenceSession(model_path)

input_data = np.random.randn(1, 3, 32, 32).astype(np.float32)

# Run the first layer on the original input...
conv1_output = ort_session.run(None, {'input1': input_data})[0]

# ...then feed its output in as the input of the next layer
conv2_output = ort_session.run(None, {'input2': conv1_output})[0]

# Now I can work with intermediate outputs, modify them, and use them as new inputs

However, I tried to reproduce this code with a ResNet-50 pre-trained model from the ONNX Model Zoo repo, but it seems this model, like the rest of the pre-trained models, has only one input and one output, with no way of accessing intermediate outputs.
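
For the read-only half of this (extracting features without feeding modified values back in), one common workaround is to promote intermediate tensors to graph outputs by editing the model with the onnx Python package. A minimal sketch, assuming onnx is installed; onnxruntime generally accepts extra outputs declared by name only:

import onnx

model = onnx.load("model.onnx")

# Promote every tensor produced by a node to a graph output so a single
# InferenceSession.run(None, ...) call returns all activations.
existing = {out.name for out in model.graph.output}
for node in model.graph.node:
    for name in node.output:
        if name not in existing:
            model.graph.output.append(onnx.ValueInfoProto(name=name))

onnx.save(model, "model_all_outputs.onnx")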

So, is there any way I could do the full loop, modifying the intermediate outputs and feeding them back in?

Thank you!

To reproduce

I am running an onnxruntime build from source with CUDA 11.2, GCC 9.5, CMake 3.27, and Python 3.8 on Ubuntu 20.04.

Urgency

No response

Platform

Linux

OS Version

20.04

ONNX Runtime Installation

Built from Source

ONNX Runtime Version or Commit ID

onnxruntime-gpu 1.12.0

ONNX Runtime API

Python

Architecture

X64

Execution Provider

CUDA

Execution Provider Library Version

Cuda 11.2

github-actions bot added the ep:CUDA label (issues related to the CUDA execution provider) on Mar 16, 2024
@hariharans29 (Member)

No, ORT does not support this scenario. Each "session" conceptually maps to an entire model, not a portion of the model.

To achieve what you want, you would have to break up each model at the layers you are interested in into separate sub-models and chain them together, as in the sample code you pasted.
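
For example, a minimal sketch of that split using onnx.utils.extract_model from the onnx Python package; the names input1, conv1_out, and output here are hypothetical and must be replaced with the actual value names in your graph (inspect the model, e.g. with netron, to find them):

import numpy as np
import onnx.utils
import onnxruntime as ort

# Cut the original model into two sub-models at a chosen intermediate tensor.
onnx.utils.extract_model("model.onnx", "part1.onnx",
                         input_names=["input1"], output_names=["conv1_out"])
onnx.utils.extract_model("model.onnx", "part2.onnx",
                         input_names=["conv1_out"], output_names=["output"])

sess1 = ort.InferenceSession("part1.onnx")
sess2 = ort.InferenceSession("part2.onnx")

input_data = np.random.randn(1, 3, 32, 32).astype(np.float32)
conv1_out = sess1.run(None, {"input1": input_data})[0]
conv1_out = conv1_out * 0.5  # modify the intermediate output here
final_out = sess2.run(None, {"conv1_out": conv1_out})[0]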

Hope this helps.

@sophies927 removed the ep:CUDA label (issues related to the CUDA execution provider) on Mar 21, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

github-actions bot added the stale label (issues that have not been addressed in a while; categorized by a bot) on Apr 21, 2024