[BUG] Node-specific `ConnectTimeout` not caught properly during federated query #72

alyssadai · 2024-02-29T21:44:07Z

Is there an existing issue for this?

I have searched the existing issues

Expected Behavior

When one of the nodes that a federated query is sent to times out, the f-API should still return a successful response that includes results from the nodes where the query succeeded, with any timeouts or errors reported in the "errors" field of the response.
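As a rough illustration of the expected behavior, the sketch below gathers per-node results and collects failures instead of letting one failure abort the whole request. It is not the f-API's actual code: `query_node`, `federate`, the example URLs, and the "responses" key are hypothetical, and the built-in `TimeoutError` stands in for `httpx.ConnectTimeout` so the example runs without a network.

```python
import asyncio


async def query_node(url: str) -> list[dict]:
    """Hypothetical stand-in for a per-node request; the "slow" node times out."""
    if "slow" in url:
        raise TimeoutError(f"timed out connecting to {url}")
    return [{"dataset": f"dataset from {url}"}]


async def federate(urls: list[str]) -> dict:
    # return_exceptions=True returns exceptions as values instead of raising,
    # so each outcome must be checked before it is iterated over.
    outcomes = await asyncio.gather(
        *(query_node(url) for url in urls), return_exceptions=True
    )
    responses, errors = [], []
    for url, outcome in zip(urls, outcomes):
        if isinstance(outcome, Exception):
            errors.append({"node": url, "error": str(outcome)})
        else:
            responses.extend(outcome)
    return {"errors": errors, "responses": responses}


result = asyncio.run(
    federate(["https://node-a.example", "https://slow-node.example"])
)
```

Here the timed-out node ends up as a single entry under "errors" while the result from the healthy node is still returned.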

Current Behavior

When the connection to a node times out with a ConnectTimeout, the f-API errors out with an internal server error 😢

Error message

Error from f-API logs:

INFO:     172.19.10.1:48332 - "GET /query/?is_control=true&image_modal=https%3A%2F%2Fapi-bic.neurobagel.org%2F HTTP/1.0" 500 Internal Server Error
ERROR:    Exception in ASGI application
Traceback (most recent call last):
  File "/usr/local/lib/python3.10/site-packages/uvicorn/protocols/http/h11_impl.py", line 407, in run_asgi
    result = await app(  # type: ignore[func-returns-value]
  File "/usr/local/lib/python3.10/site-packages/uvicorn/middleware/proxy_headers.py", line 78, in __call__
    return await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/fastapi/applications.py", line 276, in __call__
    await super().__call__(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/applications.py", line 122, in __call__
    await self.middleware_stack(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 184, in __call__
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/errors.py", line 162, in __call__
    await self.app(scope, receive, _send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/cors.py", line 83, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 79, in __call__
    raise exc
  File "/usr/local/lib/python3.10/site-packages/starlette/middleware/exceptions.py", line 68, in __call__
    await self.app(scope, receive, sender)
  File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 21, in __call__
    raise e
  File "/usr/local/lib/python3.10/site-packages/fastapi/middleware/asyncexitstack.py", line 18, in __call__
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 718, in __call__
    await route.handle(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 276, in handle
    await self.app(scope, receive, send)
  File "/usr/local/lib/python3.10/site-packages/starlette/routing.py", line 66, in app
    response = await func(request)
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 237, in app
    raw_response = await run_endpoint_function(
  File "/usr/local/lib/python3.10/site-packages/fastapi/routing.py", line 163, in run_endpoint_function
    return await dependant.call(**values)
  File "/usr/src/./app/api/routers/query.py", line 14, in get_query
    response = await crud.get(
  File "/usr/src/./app/api/crud.py", line 94, in get
    for result in response:
TypeError: 'ConnectTimeout' object is not iterable

Environment

OS:
Python/Node version:

How to reproduce

No response

Anything else?

Most likely, this is because the category of request errors we're currently trying to catch (which we hoped would capture timeouts) is too narrow - it includes ConnectError, but not ConnectTimeout: see https://www.python-httpx.org/exceptions/#the-exception-hierarchy

federation-api/app/api/utility.py

Lines 241 to 245 in 3e76fd0

    except httpx.NetworkError as exc:
        raise HTTPException(
            status_code=status.HTTP_503_SERVICE_UNAVAILABLE,
            detail=f"Request failed due to a network error or because the node API cannot be reached: {exc}",
        ) from exc

We should broaden this to RequestError and see if that fixes things.


surchs · 2024-04-11T19:43:40Z

🚀 Issue was released in v0.1.0 🚀

alyssadai added bug:functional labels Feb 29, 2024

surchs added this to Neurobagel Feb 29, 2024

alyssadai added flag:schedule, importance:high labels Feb 29, 2024

surchs moved this to Backlog in Neurobagel Mar 1, 2024

surchs removed the flag:schedule label Mar 1, 2024

alyssadai moved this from Backlog to Specify - Done in Neurobagel Mar 6, 2024

alyssadai moved this from Specify - Done to Implement - Active in Neurobagel Mar 6, 2024

alyssadai self-assigned this Mar 6, 2024

alyssadai moved this from Implement - Active to Implement - Track in Neurobagel Mar 8, 2024

alyssadai mentioned this issue Mar 12, 2024

[FIX] Expanded error catching during API request federation #74

Merged

7 tasks

alyssadai moved this from Implement - Track to Implement - Done in Neurobagel Mar 12, 2024

surchs moved this from Implement - Done to Review - Active in Neurobagel Mar 15, 2024

alyssadai closed this as completed in #74 Mar 18, 2024

github-project-automation bot moved this from Review - Active to Review - Done in Neurobagel Mar 18, 2024

surchs added the released This issue/pull request has been released. label Apr 11, 2024
