Currently dask worker pods are spread onto available nodes by the default kubernetes scheduler:

```
[ec2-user@ip-192-168-60-131 ~]$ kubectl get pod -o yaml dask-cgentemann-osm2020tutorial-nqchvhmy-6e9099fc-3k2s6c -n binder-staging | grep schedule
  schedulerName: default-scheduler
```
This can lead to scale-down issues when multiple users launch clusters or when pods encounter errors, because pods by default spread out across available nodes. For example, we recently observed an issue where many dask pods had an Error status, leading to new nodes being launched to meet capacity. We ended up with 17 nodes running with two dask pods per node instead of packing all pods onto 5 nodes.
I guess the two steps here would be to expose the schedulerName via the configuration and then document how users should configure things when running Zero2JupyterHub.
Does that sound right? Or is there anything else we should do here?
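To make the idea concrete, here's a rough sketch of what that configuration might look like once schedulerName is exposed, assuming it flows through the dask-kubernetes `kubernetes.worker-template` config (the worker template is a full pod spec, so `schedulerName` should be a valid field there). The scheduler name below is a placeholder for whatever name the user scheduler is deployed under:

```yaml
# Hypothetical ~/.config/dask/kubernetes.yaml -- a sketch, not current behavior.
kubernetes:
  worker-template:
    kind: Pod
    spec:
      # Placeholder: use whatever name the packing scheduler runs under.
      schedulerName: binder-staging-user-scheduler
      containers:
        - name: dask-worker
          image: daskdev/dask:latest
          args: [dask-worker, --nthreads, "2", --memory-limit, 6GB]
```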
JupyterHub deals with this same scenario by packing user-notebook pods onto nodes with a custom userScheduler: https://zero-to-jupyterhub.readthedocs.io/en/latest/administrator/optimization.html#using-available-nodes-efficiently-the-user-scheduler
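For reference, the user scheduler is turned on in the Zero2JupyterHub helm chart values (per the docs linked above):

```yaml
# Zero2JupyterHub helm chart values: deploys a kube-scheduler instance
# configured to pack user pods onto the most utilized nodes.
scheduling:
  userScheduler:
    enabled: true
```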
@yuvipanda suggested that a possible solution is to simply reuse the JupyterHub user scheduler in the dask-kubernetes config. Some additional relevant docs here:
https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/#specify-schedulers-for-pods
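Per those kubernetes docs, pointing any pod at an alternate scheduler only requires setting `spec.schedulerName`. A minimal example, with the scheduler name as a placeholder:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: dask-worker-example
spec:
  # Pods with this field set are ignored by default-scheduler and
  # picked up by the named scheduler instead.
  schedulerName: my-scheduler
  containers:
    - name: worker
      image: daskdev/dask:latest
```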