Currently dask worker pods are spread onto available nodes by the default kubernetes scheduler:

```
[ec2-user@ip-192-168-60-131 ~]$ kubectl get pod -o yaml dask-cgentemann-osm2020tutorial-nqchvhmy-6e9099fc-3k2s6c -n binder-staging | grep schedule
  schedulerName: default-scheduler
```
This can lead to scale-down issues when multiple users launch clusters or when pods encounter errors, because pods by default spread out across available nodes. For example, we recently observed an issue where many dask pods had an Error status, leading to new nodes being launched to meet capacity. We ended up with 17 nodes running with two dask pods per node instead of packing all pods onto 5 nodes.
I guess the two steps here would be to expose the schedulerName via the configuration and then document how users should configure things when running Zero2JupyterHub.
Does that sound right? Or is there anything else we should do here?
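To make the idea concrete, here's a rough sketch of what that configuration might look like once schedulerName is exposed, assuming it flows through the dask-kubernetes `kubernetes.worker-template` config (the worker template is a full pod spec, so `schedulerName` should be a valid field there). The scheduler name below is a placeholder for whatever name the user scheduler is deployed under:

```yaml
# Hypothetical ~/.config/dask/kubernetes.yaml -- a sketch, not current behavior.
kubernetes:
  worker-template:
    kind: Pod
    spec:
      # Placeholder: use whatever name the packing scheduler runs under.
      schedulerName: binder-staging-user-scheduler
      containers:
        - name: dask-worker
          image: daskdev/dask:latest
          args: [dask-worker, --nthreads, "2", --memory-limit, 6GB]
```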
JupyterHub deals with this same scenario by packing user-notebook pods onto nodes with a custom userScheduler: https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/optimization.html#using-available-nodes-efficiently-the-user-scheduler
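For reference, the user scheduler is turned on in the Zero2JupyterHub helm chart values (per the docs linked above):

```yaml
# Zero2JupyterHub helm chart values: deploys a kube-scheduler instance
# configured to pack user pods onto the most utilized nodes.
scheduling:
  userScheduler:
    enabled: true
```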
@yuvipanda suggested that a possible solution is to simply reuse the JupyterHub user scheduler in the dask-kubernetes config. Some additional relevant docs here:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/#specify-schedulers-for-pods
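Per those kubernetes docs, pointing any pod at an alternate scheduler only requires setting `spec.schedulerName`. A minimal example, with the scheduler name as a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dask-worker-example
spec:
  # Pods with this field set are ignored by default-scheduler and
  # picked up by the named scheduler instead.
  schedulerName: my-scheduler
  containers:
    - name: worker
      image: daskdev/dask:latest
```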