
Why can I still perform inference while the Triton server is not ready? #5441

Closed Answered by dyastremsky
ptran1203 asked this question in Q&A

In polling mode, model reloads should not result in loss of availability. The model management documentation covers this in more detail, including how all new requests are routed to the new model once loading succeeds.

As for the is_server_ready() flag, that is unexpected: the model shouldn't become "not ready" during reloading. However, loading/unloading in polling mode can be non-atomic, which is why we recommend EXPLICIT mode for production. That gives you greater control over model behavior.
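The readiness checks mentioned above can be scripted so that inference is only attempted once the server and model both report ready. Below is a minimal sketch assuming the tritonclient Python package's HTTP client, whose `is_server_ready()` and `is_model_ready()` methods the question refers to; the `wait_until_ready` helper and its parameters are hypothetical names used for illustration, not part of Triton's API.

```python
import time


def wait_until_ready(client, model_name, timeout_s=60.0, poll_s=0.5):
    """Block until the server and a specific model both report ready.

    `client` is expected to expose is_server_ready() and
    is_model_ready(name), matching the shape of
    tritonclient.http.InferenceServerClient. Returns True if both
    became ready within timeout_s, otherwise False.
    """
    deadline = time.monotonic() + timeout_s
    while time.monotonic() < deadline:
        # Check server-level readiness first, then the individual model.
        if client.is_server_ready() and client.is_model_ready(model_name):
            return True
        time.sleep(poll_s)
    return False
```

With a real deployment this would be used roughly as `client = tritonclient.http.InferenceServerClient("localhost:8000")` followed by gating inference on `wait_until_ready(client, "my_model")`, so requests are never sent during a non-atomic reload window.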

Can you try running the index API to see what models are listed as ready or not ready? You may also be able to look at the verbose logs for context. This will help make sure that what's bein…
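The index API (POST /v2/repository/index over HTTP, or `get_model_repository_index()` in tritonclient) returns one entry per model with fields such as "name", "state", and "reason". The sketch below, with a hypothetical helper name, splits such a payload into ready and not-ready models so the non-ready ones can be checked against the verbose logs.

```python
def partition_by_readiness(index):
    """Split a repository-index payload into ready / not-ready model names.

    `index` is a list of dicts shaped like the POST /v2/repository/index
    response, each with at least "name" and usually "state" and "reason".
    Returns (ready_names, [(name, reason), ...]).
    """
    ready, not_ready = [], []
    for entry in index:
        if entry.get("state") == "READY":
            ready.append(entry["name"])
        else:
            # Keep the reason (e.g. a load error) when the server supplies one.
            not_ready.append((entry["name"], entry.get("reason", "")))
    return ready, not_ready
```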

Replies: 2 comments

Answer selected by ptran1203
This discussion was converted from issue #5438 on March 01, 2023 19:01.