Problem:
A common use case for an ensemble model is preprocess -> inference -> postprocess. In most cases, the client will request only the last step's output (the postprocessed inference), for example to reduce networking between Triton and the application. But the application might also require the raw inference result (an intermediate output). The current configuration and client interface (at least the Python one) seem to support requesting either a subset of the outputs or all of them (the Python client takes a list of `InferRequestedOutput`). However, requesting a subset of the outputs fails with `InferenceServerException: [StatusCode.INVALID_ARGUMENT] in ensemble, [request id: ] unexpected deadlock, at least one output is not set while no more ensemble steps can be made`.
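For illustration, here is a minimal sketch of a Python client that reproduces the behaviour. The model name (`ensemble_model`), tensor names, and shapes are hypothetical placeholders, not taken from an actual deployment:

```python
import numpy as np
import tritonclient.grpc as grpcclient
from tritonclient.utils import InferenceServerException

client = grpcclient.InferenceServerClient(url="localhost:8001")

# Hypothetical input tensor; the real name, shape, and dtype
# depend on the ensemble's config.pbtxt.
data = np.zeros((1, 3, 224, 224), dtype=np.float32)
inputs = [grpcclient.InferInput("INPUT", list(data.shape), "FP32")]
inputs[0].set_data_from_numpy(data)

# Requesting every ensemble output works as expected.
outputs_all = [
    grpcclient.InferRequestedOutput("POSTPROCESSED"),
    grpcclient.InferRequestedOutput("RAW_INFERENCE"),
]
result = client.infer("ensemble_model", inputs, outputs=outputs_all)

# Requesting only a subset triggers the deadlock error quoted above.
outputs_subset = [grpcclient.InferRequestedOutput("RAW_INFERENCE")]
try:
    client.infer("ensemble_model", inputs, outputs=outputs_subset)
except InferenceServerException as err:
    print(err)  # INVALID_ARGUMENT ... unexpected deadlock ...
```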
Solution:
As of today, the only workaround I have found is to create two ensemble models (or, in general, N - 1 ensembles, where N is the total number of steps):
preprocess -> inference
preprocess -> inference -> postprocess
My client then requests outputs from either the first or the second ensemble, depending on what it needs. This is not ideal, as it introduces unnecessary duplication and complexity.
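A sketch of what the first (shorter) ensemble's `config.pbtxt` might look like; all model names, tensor names, and shapes here are hypothetical:

```
# Ensemble 1: preprocess -> inference, exposing the raw inference result.
name: "preprocess_inference"
platform: "ensemble"
max_batch_size: 8
input [
  { name: "RAW_INPUT" data_type: TYPE_FP32 dims: [ 3, 224, 224 ] }
]
output [
  { name: "RAW_INFERENCE" data_type: TYPE_FP32 dims: [ 1000 ] }
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map { key: "INPUT" value: "RAW_INPUT" }
      output_map { key: "OUTPUT" value: "preprocessed_tensor" }
    },
    {
      model_name: "inference"
      model_version: -1
      input_map { key: "INPUT" value: "preprocessed_tensor" }
      output_map { key: "OUTPUT" value: "RAW_INFERENCE" }
    }
  ]
}
```

The second ensemble (`preprocess_inference_postprocess`) has to repeat the same two steps plus the postprocess step, which is exactly the duplication this issue is asking to avoid.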