Problem:
A common use case for an ensemble model is preprocess -> inference -> postprocess. In most cases, the client will request only the last step's output (the postprocessed inference), for example to reduce networking between Triton and the application. But the application might also require the raw inference result (an intermediate output). The current configuration and client interface (at least the Python one) seem to support requesting either a subset of the outputs or all of them (the Python client takes a list of `InferRequestedOutput`). However, requesting a subset of the outputs fails with `InferenceServerException: [StatusCode.INVALID_ARGUMENT] in ensemble, [request id: ] unexpected deadlock, at least one output is not set while no more ensemble steps can be made`.
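For illustration, here is a minimal sketch of a Python client that reproduces the behaviour. The model name (`ensemble_model`), tensor names, and shapes are hypothetical placeholders, not taken from an actual deployment:

```python
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Hypothetical input tensor; the real name, shape, and dtype
# depend on the ensemble's config.pbtxt.
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
inputs = [grpcclient.InferInput("INPUT", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)

# Requesting every ensemble output works as expected.
outputs_all = [
    grpcclient.InferRequestedOutput("POSTPROCESSED"),
    grpcclient.InferRequestedOutput("RAW_INFERENCE"),
]
result = client.infer("ensemble_model", inputs, outputs=outputs_all)

# Requesting only a subset triggers the deadlock error quoted above.
outputs_subset = [grpcclient.InferRequestedOutput("RAW_INFERENCE")]
try:
    client.infer("ensemble_model", inputs, outputs=outputs_subset)
except InferenceServerException as err:
    print(err)  # INVALID_ARGUMENT ... unexpected deadlock ...
```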
Solution:
As of today, the only workaround I have found is to create two ensemble models (or, in general, N - 1 ensembles, where N is the total number of steps):
preprocess -> inference
preprocess -> inference -> postprocess
My client then requests outputs from either the first or the second ensemble, depending on what it needs. This is not ideal, as it introduces unnecessary duplication and complexity.
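A sketch of what the first (shorter) ensemble's `config.pbtxt` might look like; all model names, tensor names, and shapes here are hypothetical:

```
# Ensemble 1: preprocess -> inference, exposing the raw inference result.
name: "preprocess_inference"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_INPUT" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] }
]
output [
  { name: "RAW_INFERENCE" data_type: TYPE_FP32 dims: [ 1000 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      output_map { key: "OUTPUT" value: "preprocessed_tensor" }
    },
    {
      model_name: "inference"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_tensor" }
      output_map { key: "OUTPUT" value: "RAW_INFERENCE" }
    }
  ]
}
```

The second ensemble (`preprocess_inference_postprocess`) has to repeat the same two steps plus the postprocess step, which is exactly the duplication this issue is asking to avoid.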