
[Build] Handling Multiple ONNX Runtime Sessions Sequentially in Docker #19309

Open
wadie999 opened this issue Jan 29, 2024 · 5 comments
Labels: build (build issues; typically submitted using template), core runtime (issues related to core runtime), stale (issues that have not been addressed in a while; categorized by a bot)

Comments


wadie999 commented Jan 29, 2024

Describe the issue

We have a Flask-based API for running computer vision models (YOLO and classifiers) with ONNX Runtime. The models, originally trained in PyTorch, were converted to ONNX format. Locally, the system works well: different ONNX models are loaded and run sequentially for inference without issue. When deployed in Docker, however, only the first ONNX model loaded appears to work for inference, and attempts to run the other inference sessions fail.

The process flow involves:

  1. Loading the YOLO model in ONNX Runtime for the initial detection pass.
  2. Cropping images based on the YOLO output.
  3. Sending the cropped images to several classifiers (also ONNX models) sequentially.

We suspect this might be a resource-allocation or session-management issue within the Docker environment. The primary question is whether implementing multi-threading within the Docker container could resolve this, and if so, how to approach it.
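For what it's worth, each `ort.InferenceSession` is an independent object, so a common pattern is to create every session once at application startup and reuse them across requests; nothing in ONNX Runtime prevents several sessions from coexisting in one process. A minimal sketch of that pattern (the model names and paths are illustrative assumptions, and the `loader` parameter stands in for `ort.InferenceSession` so the sketch stays self-contained):

```python
def build_registry(paths, loader):
    """Create one inference session per model, keyed by name.

    `loader` is ort.InferenceSession in practice; it is a parameter
    here only so the sketch can be exercised without model files.
    """
    return {name: loader(path) for name, path in paths.items()}

# Hypothetical layout mirroring the services/ tree in this issue:
MODEL_PATHS = {
    "yolo": "services/yolo_service/yolo.onnx",
    "stonetype": "services/stonetype_service/stonetype.onnx",
}
```

At startup one would call `sessions = build_registry(MODEL_PATHS, ort.InferenceSession)` and then use `sessions["stonetype"].run(...)` inside the route, so no session is recreated per request.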

# Flask app initialization and route definition
# ...
@app.route("/predict", methods=["POST"])
def predict():
    # ...
    # Step 1: YOLO model to detect boxes
    yolo_response = yolo_predict(image_np)
    # ...
    for box in boxes:
        # Sequential processing of classifiers
        stonetype_result = stonetype_predict(resized_image)
        cut_result = cut_predict(resized_image)
        color_result = color_predict(resized_image)
        # ...
    return jsonify(results)
# ...

The models are loaded for inference using:

import onnxruntime as ort

def load_model(onnx_file_path):
    """Load an ONNX model into an inference session."""
    session = ort.InferenceSession(onnx_file_path)
    return session

def infer(session, image_tensor):
    """Run model inference."""
    input_name = session.get_inputs()[0].name
    output = session.run(None, {input_name: image_tensor})
    return output
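Since YOLO and the classifiers expect different spatial sizes (640x640 vs. 224x224), the resize step has to match the session it feeds. A hedged sketch that derives the target size from the model's own declared input shape, using a plain NumPy nearest-neighbor resize as a stand-in for whatever image library the project actually uses:

```python
import numpy as np

def target_hw(session):
    """Read (H, W) from the model's first input, e.g. [1, 3, 640, 640].

    Works on any object exposing get_inputs() with .shape, i.e. an
    ort.InferenceSession in practice.
    """
    shape = session.get_inputs()[0].shape
    return shape[2], shape[3]

def resize_hw(image, h, w):
    """Nearest-neighbor resize of an HxWxC image to h x w."""
    ys = np.linspace(0, image.shape[0] - 1, h).round().astype(int)
    xs = np.linspace(0, image.shape[1] - 1, w).round().astype(int)
    return image[ys][:, xs]
```

Calling `resize_hw(image, *target_hw(session))` immediately before `infer(session, ...)` ensures each model receives the size it declares, regardless of which session is in use.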

Expected Behavior:

Each model (YOLO and subsequent classifiers) should be loaded and run independently in their respective ONNX Runtime sessions within the Docker environment, similar to the local setup.

Observed Behavior:

Only the first model (YOLO) loaded in ONNX Runtime is available for inference. Subsequent attempts to load additional models for inference within the same Docker session are unsuccessful.

Build script

# Use an official Python runtime as a parent image
FROM python:3.10-slim

# Set the working directory in the container
WORKDIR /usr/src/app

# Copy the current directory contents into the container at /usr/src/app
COPY . /usr/src/app

# Install any needed packages specified in requirements.txt
RUN pip install -r requirements.txt

# Make port 5000 available to the world outside this container
EXPOSE 5000

# Define environment variable
ENV MODEL_PATH /usr/src/app/services/yolo_service/yolo.onnx

# Run server.py when the container launches
CMD ["python", "server.py"]

Error / output

[2024-01-29 13:03:20,899] ERROR in app: Exception on /predict [POST]
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 1463, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 872, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 870, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.10/site-packages/flask/app.py", line 855, in dispatch_request
    return self.ensure_sync(self.view_functions[rule.endpoint])(**view_args)  # type: ignore[no-any-return]
  File "/usr/src/app/server.py", line 38, in predict
    stonetype_result = stonetype_predict(resized_image)
  File "/usr/src/app/services/stonetype_service/app/server.py", line 35, in predict
    output = sessionStoneType.run(None, {input_name: image_tensor})
  File "/usr/local/lib/python3.10/site-packages/onnxruntime/capi/onnxruntime_inference_collection.py", line 220, in run
    return self._sess.run(output_names, input_feed, run_options)
onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: images for the following indices
 index: 2 Got: 224 Expected: 640
 index: 3 Got: 224 Expected: 640
 Please fix either the inputs or the model.

@wadie999 wadie999 added the build label Jan 29, 2024
@yufenglee
Member

Is this a model-loading failure or an inference failure? From the error message, it fails because the input shape doesn't comply with the model's declared input. From your description, it sounds like some models fail to load.

@yufenglee yufenglee added the core runtime label Jan 29, 2024
@wadie999
Author

wadie999 commented Jan 30, 2024

@yufenglee The failing frame is File "/usr/src/app/services/stonetype_service/app/server.py", line 35, in predict.

The YOLO model expects 640x640 input, while the classifier models such as stonetype (the first one run for inference after YOLO) expect 224x224. Since the stonetype call is the one rejecting 224 and expecting 640, it suggests only the YOLO model was actually loaded.
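One quick way to confirm which .onnx file a session actually wrapped is to log its declared inputs right after loading. A small hedged sketch (duck-typed so it works on any object exposing `get_inputs()`, which an `ort.InferenceSession` does):

```python
def describe_inputs(session):
    """Return (name, shape) for each declared model input."""
    return [(i.name, i.shape) for i in session.get_inputs()]

def expects_hw(session, h, w):
    """True if the first input's spatial dims are (h, w)."""
    shape = session.get_inputs()[0].shape
    return (shape[2], shape[3]) == (h, w)
```

Logging `describe_inputs(sessionStoneType)` at startup would show immediately whether that session really declares 224x224, or whether the YOLO file (640x640) was loaded in its place.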

@wadie999
Author

Fix: to resolve this issue:

1. Environment variables in the Dockerfile: I introduced an environment variable for each model directly in the Dockerfile. This ensures the necessary configuration is available when the API runs inside a Docker container.
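On the Python side, per-model environment variables can be read with local fallbacks, mirroring the existing `ENV MODEL_PATH` line in the Dockerfile. The variable names and default paths below are illustrative assumptions (the Dockerfile would set e.g. `ENV STONETYPE_MODEL_PATH ...` alongside `MODEL_PATH`):

```python
import os

def model_path(var_name, default):
    """Resolve a model path from the environment, falling back for local runs."""
    return os.environ.get(var_name, default)

# Hypothetical variable names, one per model:
YOLO_MODEL_PATH = model_path("YOLO_MODEL_PATH", "services/yolo_service/yolo.onnx")
STONETYPE_MODEL_PATH = model_path("STONETYPE_MODEL_PATH", "services/stonetype_service/stonetype.onnx")
```

Each `load_model` call then takes its path from these variables, so the container and the local environment resolve models the same way.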

@wadie999 wadie999 reopened this Jan 31, 2024
@pranavsharma
Contributor

Do you need any more clarification beyond what the error message says?

"onnxruntime.capi.onnxruntime_pybind11_state.InvalidArgument: [ONNXRuntimeError] : 2 : INVALID_ARGUMENT : Got invalid dimensions for input: images for the following indices"
index: 2 Got: 224 Expected: 640
index: 3 Got: 224 Expected: 640
Please fix either the inputs or the model.

Contributor

github-actions bot commented Mar 4, 2024

This issue has been automatically marked as stale due to inactivity and will be closed in 30 days if no further activity occurs. If further support is needed, please provide an update and/or more details.

@github-actions github-actions bot added the stale label Mar 4, 2024

3 participants