To start off, this is an awesome tool, and the team has done impressive work to get to this point.
I'm currently using MLServer in a high-throughput, low-latency system where we use gRPC to perform inferences. We have added an asynchronous capability to our inference client that sends many requests to the gRPC server at once (typically about 25). We have a timeout set on our client, and we recently started seeing a number of DEADLINE_EXCEEDED responses. When I looked into the model servers to figure out why they had started to exceed deadlines (we hadn't experienced this very often in the past), it turned out that the response processing loop is being restarted because messages are being lost.
We see the following traceback:
2024-03-28 19:56:42,015 [mlserver.parallel] ERROR - Response processing loop crashed. Restarting the loop...
Traceback (most recent call last):
File "/usr/local/lib/python3.10/site-packages/mlserver/parallel/dispatcher.py", line 186, in _process_responses_cb
process_responses.result()
File "/usr/local/lib/python3.10/site-packages/mlserver/parallel/dispatcher.py", line 207, in _process_responses
self._async_responses.resolve(response)
File "/usr/local/lib/python3.10/site-packages/mlserver/parallel/dispatcher.py", line 102, in resolve
future = self._futures[message_id]
KeyError: 'cea95af0-859f-413a-a033-dfbe51e96c05'
This is where the dispatcher tries to resolve the future for a given message, but the message has been lost.
Once this error occurs, all of our remaining parallel inference requests fail with the same exception (with a different message_id, obviously).
I took a look at the source code, and it looks like when process_responses.result() is called, the logic has a blanket exception handler for anything that isn't an asyncio.CancelledError. It assumes the processing loop has crashed and restarts it by scheduling a new task, but it's not immediately clear (to me, at least) that this is really what should be happening. I don't see any signal from the server that the processing loop actually crashed; it just seems to be confused about which message it's supposed to be getting.
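For illustration, here is a simplified paraphrase of how I read that logic based on the traceback above. This is not the actual MLServer source; the names and structure are condensed, and the queue/response types are placeholders.

```python
import asyncio


class AsyncResponses:
    def __init__(self):
        self._futures: dict[str, asyncio.Future] = {}

    def resolve(self, response):
        # KeyError here if the message id was never registered (or was already
        # removed) -- this is the line blowing up in our logs.
        future = self._futures[response.id]
        future.set_result(response)


class Dispatcher:
    def __init__(self, responses: asyncio.Queue, async_responses: AsyncResponses):
        self._responses = responses
        self._async_responses = async_responses

    def start(self):
        loop_task = asyncio.create_task(self._process_responses())
        loop_task.add_done_callback(self._process_responses_cb)

    def _process_responses_cb(self, process_responses: asyncio.Task):
        try:
            process_responses.result()
        except asyncio.CancelledError:
            pass  # clean shutdown
        except Exception:
            # Blanket handler: any other error is treated as "the loop crashed"
            # and the whole loop is restarted, even when only a single message
            # went missing.
            self.start()

    async def _process_responses(self):
        while True:
            response = await self._responses.get()
            self._async_responses.resolve(response)
```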
As a note about our system setup: we have these deployed in Kubernetes (as is our client app) as a Deployment running between 10 and 15 pods at any given time, with the environment variable MLSERVER_PARALLEL_WORKERS=16.
We are also using a grpc.aio.insecure_channel(server) pattern to manage the gRPC interactions on the client side.
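For context, here is a minimal sketch of that client-side pattern. The server address, model name, input tensor, timeout, and concurrency values are placeholders rather than our production configuration, and it assumes the KServe v2 dataplane stubs that ship with MLServer.

```python
import asyncio

import grpc
import mlserver.grpc.dataplane_pb2 as dataplane
import mlserver.grpc.dataplane_pb2_grpc as dataplane_grpc

SERVER_ADDRESS = "mlserver:8081"  # placeholder
MODEL_NAME = "my-model"           # placeholder
TIMEOUT_SECONDS = 2.0             # placeholder client deadline
CONCURRENCY = 25                  # roughly how many requests we send at once


def build_request() -> dataplane.ModelInferRequest:
    # Single FP32 input tensor; shape and contents are illustrative only.
    tensor = dataplane.ModelInferRequest.InferInputTensor(
        name="input-0",
        datatype="FP32",
        shape=[1, 3],
        contents=dataplane.InferTensorContents(fp32_contents=[1.0, 2.0, 3.0]),
    )
    return dataplane.ModelInferRequest(model_name=MODEL_NAME, inputs=[tensor])


async def infer(stub: dataplane_grpc.GRPCInferenceServiceStub) -> str:
    try:
        await stub.ModelInfer(build_request(), timeout=TIMEOUT_SECONDS)
        return "OK"
    except grpc.aio.AioRpcError as err:
        # This is where we observe DEADLINE_EXCEEDED once the server-side
        # response loop starts losing messages.
        return err.code().name


async def main() -> None:
    async with grpc.aio.insecure_channel(SERVER_ADDRESS) as channel:
        stub = dataplane_grpc.GRPCInferenceServiceStub(channel)
        results = await asyncio.gather(*(infer(stub) for _ in range(CONCURRENCY)))
        print(results)


if __name__ == "__main__":
    asyncio.run(main())
```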
Yes. But how should I share it with you? The problem I encountered is this: after I sent a large number of requests, I received responses at first, but then stopped receiving them entirely, and when I checked the logs I found the same problem as above.