Better exception if scheduler disconnects from client #8690

fjetter · 2024-06-12T11:28:21Z

If the connection between scheduler and client is lost (e.g. if the scheduler dies), this triggers a reconnect loop on the client to reestablish the connection. If the scheduler is still alive, users will not notice this failure unless they are working with previously created Futures. Those Futures are cancelled automatically as soon as the client initiates a reconnect (see here).

The next time such a Future is used, it raises a CancelledError(<key>) without further context, and it is frequently unclear to users what exactly this means.

Instead, the user should receive an informative message telling them to check on their scheduler.

@gen_cluster(client=True)
async def test_client_scheduler_lost_sane_exception(c, s, a, b):
    fut = c.submit(inc, 1)
    await wait(fut)

    await s.close()

    with pytest.raises(CancelledError, match='connection to scheduler'):
        await fut

This issue is particularly troublesome if the user is not working with futures directly but the futures are embedded in a persisted collection, because cancelling them renders the entire collection unusable.

fjetter · 2024-06-12T11:49:28Z

A rather straightforward way to improve this is to allow the Future.cancel method, which is invoked in the reconnect method, to accept an exception or message that is then properly forwarded and raised.
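To make the idea concrete, here is a minimal sketch of what forwarding a reason through cancel could look like. It does not use distributed at all: SketchFuture, its reason parameter, and on_reconnect are hypothetical names invented for illustration, and the actual change may look quite different.

```python
from concurrent.futures import CancelledError


class SketchFuture:
    """Toy stand-in for a client-side future (hypothetical, not distributed.Future)."""

    def __init__(self, key):
        self.key = key
        self.status = "pending"
        self._result = None
        self._cancel_reason = None

    def set_result(self, value):
        self.status = "finished"
        self._result = value

    def cancel(self, reason=None):
        # Store the reason so it can be surfaced when the future is used later.
        self.status = "cancelled"
        self._cancel_reason = reason

    def result(self):
        if self.status == "cancelled":
            # Include the key and, if available, the reason for the cancellation.
            msg = f"{self.key} cancelled"
            if self._cancel_reason:
                msg += f": {self._cancel_reason}"
            raise CancelledError(msg)
        if self.status != "finished":
            raise RuntimeError(f"{self.key} is not finished yet")
        return self._result


def on_reconnect(futures):
    """Hypothetical reconnect hook: cancel all known futures with an explanation."""
    for fut in futures:
        fut.cancel(
            reason="lost connection to scheduler; "
            "check that the scheduler is still running"
        )
```

With something along these lines, using a cancelled future would raise a CancelledError whose message mentions the lost connection to the scheduler, which is what the test above matches on.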

github-actions bot added the needs triage label Jun 12, 2024

fjetter added the enhancement, diagnostics, stability, and scheduler labels and removed the needs triage label Jun 12, 2024

hendrikmakait self-assigned this Jun 18, 2024

hendrikmakait mentioned this issue Jun 19, 2024

Improve error on cancelled tasks due to disconnect #8705

Merged

hendrikmakait closed this as completed in #8705 Jun 24, 2024

fjetter mentioned this issue Jul 16, 2024

Unclear user feedback in case Client-Scheduler connection breaks up #5666

Closed