Hello,
I am working on a project to serve an encoder-based model in a Triton Inference Server ensemble. The pipeline will be a preprocessing node feeding directly into an encoder (roberta-base, producing an embedding feature), which then fans out to K lightweight classification heads (think a few linear layers each).
How far can I reasonably push K? Can the ensemble scheduler handle inference with K=100 classifiers?
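For context, here is a minimal sketch of the ensemble `config.pbtxt` I have in mind. All model names, tensor names, and shapes below are placeholders for illustration, not my actual config:

```
# Ensemble config sketch (names/shapes are assumptions):
# preprocess -> roberta_encoder -> classifier_0 ... classifier_{K-1}
name: "roberta_ensemble"
platform: "ensemble"
max_batch_size: 32
input [
  { name: "RAW_TEXT", data_type: TYPE_STRING, dims: [ 1 ] }
]
output [
  { name: "SCORES_0", data_type: TYPE_FP32, dims: [ 2 ] }
  # ...one output per classifier head, up to SCORES_{K-1}
]
ensemble_scheduling {
  step [
    {
      model_name: "preprocess"
      model_version: -1
      input_map  { key: "TEXT", value: "RAW_TEXT" }
      output_map { key: "INPUT_IDS", value: "ids" }
      output_map { key: "ATTENTION_MASK", value: "mask" }
    },
    {
      model_name: "roberta_encoder"
      model_version: -1
      input_map  { key: "input_ids", value: "ids" }
      input_map  { key: "attention_mask", value: "mask" }
      output_map { key: "embedding", value: "emb" }
    },
    {
      # Each head reads the same intermediate tensor "emb",
      # so the encoder runs once per request and only the
      # small heads are duplicated K times.
      model_name: "classifier_0"
      model_version: -1
      input_map  { key: "embedding", value: "emb" }
      output_map { key: "logits", value: "SCORES_0" }
    }
    # ...repeat one step per classifier head
  ]
}
```

Since the step list grows linearly with K, I would generate the K classifier steps (and the matching `output` entries) from a template script rather than writing them by hand.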