Runner pod for DaskJob fails to spawn #829

creste · 2023-10-05T20:38:52Z

Describe the issue:

The runner pod for a DaskJob fails to spawn when the DaskJob is deleted and then quickly re-created.

Minimal Complete Verifiable Example:

Create a DaskJob using the example YAML from the Dask documentation.

kubectl apply -f daskjob.yaml

Wait for the runner pod to start.

$ kubectl get all
NAME                                                             READY   STATUS      RESTARTS   AGE
pod/test-simple-job-default-worker-8911716d53-7f8dc4897-tlqm2    1/1     Running     0          5s
pod/test-simple-job-default-worker-ae18a247f6-64d8f6d6d7-xlf4m   1/1     Running     0          5s
pod/test-simple-job-runner                                       1/1     Running     0          6s
pod/test-simple-job-scheduler-7bc7cfb9b7-jlbb6                   0/1     Running     0          5s

Delete the DaskJob.

kubectl delete -f daskjob.yaml

Quickly re-create the DaskJob.

kubectl apply -f daskjob.yaml

Anything else we need to know?:

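This doesn't affect the scheduler or worker pods because they have a unique suffix appended to their names. The runner pod does not, so the re-created job's runner pod gets exactly the same name as the old one, which presumably still exists while the deleted job is being torn down. See this code that generates the runner pod's name: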

dask-kubernetes/dask_kubernetes/operator/controller/controller.py

Line 171 in c783909

return f"{job_name}-runner"
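
To illustrate the difference, here is a minimal sketch of the deterministic name next to one possible way of making it unique. The function names `runner_pod_name` and `unique_runner_pod_name` and the `suffix_length` parameter are hypothetical and are not taken from the dask-kubernetes code base; only the `f"{job_name}-runner"` expression comes from the line quoted above.

```python
import uuid


def runner_pod_name(job_name: str) -> str:
    # Mirrors the expression quoted above: the name depends only on the
    # job name, so a re-created job asks for the exact same pod name.
    return f"{job_name}-runner"


def unique_runner_pod_name(job_name: str, suffix_length: int = 10) -> str:
    # Hypothetical alternative (an assumption, not the operator's code):
    # append a random hex suffix so each re-creation gets a fresh name.
    suffix = uuid.uuid4().hex[:suffix_length]
    return f"{job_name}-runner-{suffix}"


if __name__ == "__main__":
    print(runner_pod_name("test-simple-job"))
    print(unique_runner_pod_name("test-simple-job"))
```

Another option might be to leave the suffix to Kubernetes by setting `metadata.generateName` instead of `metadata.name` on the pod, though whether that fits the operator's design is not something this issue establishes.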

Environment:

Dask operator version: 2023.9.0

jacobtomlinson · 2023-10-06T15:46:21Z

I think this will be solved by #695 (comment)
