Overview
We want to be able to compile and export trained inference models for usage outside of sleap-nn.
Background
The logic for inference is broken down into:
1. Data loading: I/O (VideoReader, LabelsReader)
2. Data preprocessing: moving to GPU, normalization, batching, etc.
3. Model forward pass
4. Postprocessing: peak finding, PAF grouping, etc.
Right now, some of these ops are a bit mixed across Predictor classes and underlying torch.nn.Modules.
In order to best support workflows where we compile/export the final model for inference-only workloads, we need to include steps 2-4 in the inference model itself (as done in core SLEAP).
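To make that concrete, here is a rough sketch of what bundling steps 2-4 into a single exportable module could look like. This is not the actual sleap-nn API; the backbone interface and the trivial argmax "peak finder" are placeholder assumptions.

```python
# Hypothetical sketch: preprocessing, forward pass, and postprocessing bundled
# into one exportable torch.nn.Module. `backbone` and the argmax-based peak
# finder are stand-ins, not the real sleap-nn implementation.
import torch


class InferenceModel(torch.nn.Module):
    def __init__(self, backbone: torch.nn.Module):
        super().__init__()
        self.backbone = backbone

    def forward(self, frames: torch.Tensor) -> torch.Tensor:
        # Step 2: preprocessing on-device (uint8 -> float32, normalization).
        x = frames.to(torch.float32) / 255.0

        # Step 3: model forward pass (e.g., confidence maps of shape B x C x H x W).
        cms = self.backbone(x)

        # Step 4: postprocessing; a global argmax stands in for real peak
        # finding / PAF grouping.
        batch, channels, height, width = cms.shape
        flat = cms.reshape(batch, channels, -1)
        idx = flat.argmax(dim=-1)
        peaks_y = (idx // width).to(torch.float32)
        peaks_x = (idx % width).to(torch.float32)
        return torch.stack([peaks_x, peaks_y], dim=-1)  # (batch, channels, 2)
```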
This is for two reasons:
Performance: vectorized tensor ops like normalization are much faster on the GPU, and we won't incur the overhead of transferring float32 data from the CPU. Additionally, inference engines like torch.compile and TensorRT can yield dramatic performance improvements when the system supports them (see the sketch below).
Portability: being able to run those ops from an exported artifact without having to ship instructions for pre/post-processing, including implementation-dependent details like those we have in sleap-nn. This will be useful for building web demos, realtime inference, and more.
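For instance, once pre/post-processing lives inside the module, the entire pipeline can be handed to an inference engine in one call. A minimal sketch, assuming a trained backbone module and the hypothetical InferenceModel wrapper from above:

```python
# Sketch: compiling the full inference module so preprocessing and peak
# finding are optimized along with the backbone, not just the backbone alone.
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = InferenceModel(backbone).eval().to(device)
compiled = torch.compile(model)

with torch.inference_mode():
    frames = torch.zeros(4, 1, 512, 512, dtype=torch.uint8, device=device)
    peaks = compiled(frames)  # (batch, channels, 2) peak coordinates
```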
Ultralytics is a gold-standard example of this: they support a huge number of export formats (ref).
Some of these formats can express more complex ops than others, which affects how well they fit our needs.
Our goal will be to implement support for:
Required: TensorRT, ONNX
Nice to have: torch.compile, CoreML, TF SavedModel/GraphDef/Lite/JS, OpenVINO
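As one concrete example for the required ONNX target, an export call might look roughly like this. Again a sketch, reusing the hypothetical InferenceModel wrapper from above; the file name, shapes, and opset are illustrative assumptions, not sleap-nn's actual interface:

```python
# Sketch: exporting the bundled module to ONNX with a dynamic batch dimension
# but static spatial dimensions (friendlier to runtimes like TensorRT).
import torch

model = InferenceModel(backbone).eval()
dummy = torch.zeros(1, 1, 512, 512, dtype=torch.uint8)

torch.onnx.export(
    model,
    dummy,
    "inference_model.onnx",
    input_names=["frames"],
    output_names=["peaks"],
    dynamic_axes={"frames": {0: "batch"}, "peaks": {0: "batch"}},
    opset_version=17,
)
```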
Likely, we'll need to adapt to the nuances of each inference runtime framework (TensorRT is notoriously picky), which will impose a particular modularization of the inference steps above. Examples of potential pitfalls:
Not supporting variable-length shapes (meaning we need to implement padding logic ourselves; see the sketch after this list)
Not supporting cropping in the middle of the pipeline (e.g., for top-down models)
In cases where the framework does support everything, it may be that we need to do it in a particular way for the conversion to work (e.g., sometimes resizing ops support nearest neighbor but not bilinear interpolation mode).
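For the variable-length shape pitfall, for example, one option is to pad frames to a single static export resolution before they enter the graph. A minimal sketch, with an arbitrary fixed size:

```python
# Sketch: pad-to-fixed-size helper so exported graphs only ever see one static
# spatial shape. Uses bottom/right zero-padding; offsets would have to be
# tracked separately to map peaks back to original coordinates.
import torch
import torch.nn.functional as F


def pad_to_fixed(frames: torch.Tensor, height: int = 1024, width: int = 1024) -> torch.Tensor:
    """Pad a (batch, channels, h, w) tensor to (batch, channels, height, width)."""
    h, w = frames.shape[-2:]
    pad_h, pad_w = height - h, width - w
    if pad_h < 0 or pad_w < 0:
        raise ValueError("Frame is larger than the fixed export size.")
    # F.pad takes (left, right, top, bottom) for the last two dims.
    return F.pad(frames, (0, pad_w, 0, pad_h))
```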
Examples
SavedModel export lets you do something like this to use a trained model without installing any special dependencies:
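Something along these lines, assuming only TensorFlow is installed (the exported directory name, input shape, and signature key below are placeholders rather than the actual exported layout):

```python
# Sketch: running an exported TF SavedModel with only TensorFlow installed
# (no sleap or sleap-nn).
import numpy as np
import tensorflow as tf

model = tf.saved_model.load("exported_model/")
infer = model.signatures["serving_default"]

# A dummy batch of frames: (batch, height, width, channels).
frames = np.zeros((1, 512, 512, 1), dtype=np.uint8)
outputs = infer(tf.constant(frames))

for name, tensor in outputs.items():
    print(name, tensor.shape)
```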
PRs
TODO