diff --git a/doc/source/index.rst b/doc/source/index.rst
index b82ae439..8b1e5229 100644
--- a/doc/source/index.rst
+++ b/doc/source/index.rst
@@ -30,8 +30,8 @@ Welcome to the documentation for the Dask Kubernetes Operator.
 The package ``dask-kubernetes`` provides a Dask operator for Kubernetes. ``dask-kubernetes`` is one of many options for deploying Dask clusters, see `Deploying Dask `_ in the Dask documentation for an overview of additional options.
 
-KubeCluster
------------
+Quickstart
+----------
 
 :class:`KubeCluster` deploys Dask clusters on Kubernetes clusters using custom
 Kubernetes resources. It is designed to dynamically launch ad-hoc deployments.
@@ -41,6 +41,111 @@ Kubernetes resources. It is designed to dynamically launch ad-hoc deployments.
 
    $ # Install operator CRDs and controller, needs to be done once on your Kubernetes cluster
    $ helm install --repo https://helm.dask.org --create-namespace -n dask-operator --generate-name dask-kubernetes-operator
 
+.. code-block:: console
+
+   $ # Install dask-kubernetes
+   $ pip install dask-kubernetes
+
+What is the operator?
+---------------------
+
+The Dask Operator is a set of custom resources and a controller that runs on your Kubernetes cluster and allows you to create and manage your Dask clusters as Kubernetes resources.
+Clusters can be created either via the :doc:`Kubernetes API with kubectl ` or the :doc:`Python API with KubeCluster `.
+
+To :doc:`install the operator ` you need to apply some custom resource definitions that allow us to describe Dask resources, and deploy the operator itself, which is a small Python application that
+watches the Kubernetes API for events related to our custom resources and creates other resources such as ``Pods`` and ``Services`` accordingly.
+
+What resources does the operator manage?
+----------------------------------------
+
+The operator manages a hierarchy of resources, some custom resources to represent Dask primitives like clusters and worker groups, and native Kubernetes resources such as pods and services to run the cluster processes and facilitate communication.
+
+.. mermaid::
+
+    graph TD
+      DaskJob(DaskJob)
+      DaskCluster(DaskCluster)
+      DaskAutoscaler(DaskAutoscaler)
+      SchedulerService(Scheduler Service)
+      SchedulerPod(Scheduler Pod)
+      DaskWorkerGroup(DaskWorkerGroup)
+      WorkerPodA(Worker Pod A)
+      WorkerPodB(Worker Pod B)
+      WorkerPodC(Worker Pod C)
+      JobPod(Job Runner Pod)
+
+      DaskJob --> DaskCluster
+      DaskJob --> JobPod
+      DaskCluster --> SchedulerService
+      DaskCluster --> DaskAutoscaler
+      SchedulerService --> SchedulerPod
+      DaskCluster --> DaskWorkerGroup
+      DaskWorkerGroup --> WorkerPodA
+      DaskWorkerGroup --> WorkerPodB
+      DaskWorkerGroup --> WorkerPodC
+
+      classDef dask stroke:#FDA061,stroke-width:4px
+      classDef dashed stroke-dasharray: 5 5
+      class DaskJob dask
+      class DaskCluster dask
+      class DaskWorkerGroup dask
+      class DaskAutoscaler dask
+      class DaskAutoscaler dashed
+      class SchedulerService dashed
+      class SchedulerPod dashed
+      class WorkerPodA dashed
+      class WorkerPodB dashed
+      class WorkerPodC dashed
+      class JobPod dashed
+
+
+Worker Groups
+^^^^^^^^^^^^^
+
+A ``DaskWorkerGroup`` represents a homogeneous group of workers that can be scaled. The resource is similar to a native Kubernetes ``Deployment`` in that it manages a group of workers
+with some intelligence around the ``Pod`` lifecycle. A worker group must be attached to a Dask Cluster resource in order to function.
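+
+For example, once a cluster is running you can attach an additional worker group from Python with :class:`KubeCluster`. This is a minimal sketch, assuming the operator is already installed; the cluster name ``example`` and group name ``highmem`` are placeholders, and the full set of per-group options is documented on the :doc:`KubeCluster page <operator_kubecluster>`.
+
+.. code-block:: python
+
+    from dask_kubernetes.operator import KubeCluster
+
+    # Assumes the Dask operator is already running on the Kubernetes cluster.
+    cluster = KubeCluster(name="example")
+
+    # Create an extra DaskWorkerGroup attached to the same DaskCluster resource.
+    cluster.add_worker_group(name="highmem", n_workers=2)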
+
+All `Kubernetes annotations `__ on the
+``DaskWorkerGroup`` resource will be passed on to worker ``Pod`` resources. Annotations created by `kopf` or
+`kubectl` (i.e. starting with "kopf.zalando.org" or "kubectl.kubernetes.io") will not be passed on.
+
+
+Clusters
+^^^^^^^^
+
+The ``DaskCluster`` custom resource creates a Dask cluster by creating a scheduler ``Pod``, scheduler ``Service`` and default ``DaskWorkerGroup``, which in turn creates worker ``Pod`` resources.
+
+Workers connect to the scheduler via the scheduler ``Service``, and that service can also be exposed to the user in order to connect clients and perform work.
+
+The operator also has support for creating additional worker groups. These are extra groups of workers with different
+configuration settings and can be scaled separately. You can then use `resource annotations `_
+to schedule different tasks to different groups.
+
+All `Kubernetes annotations `__ on the
+``DaskCluster`` resource will be passed on to the scheduler ``Pod`` and ``Service`` as well as the ``DaskWorkerGroup``
+resources. Annotations created by `kopf` or `kubectl` (i.e. starting with "kopf.zalando.org" or "kubectl.kubernetes.io")
+will not be passed on.
+
+For example, you may wish to have a smaller pool of workers with more memory for memory-intensive tasks, or with GPUs for compute-intensive tasks.
+
+Jobs
+^^^^
+
+A ``DaskJob`` is a batch-style resource that creates a ``Pod`` to perform some specific task from start to finish alongside a ``DaskCluster`` that can be leveraged to perform the work.
+
+All `Kubernetes annotations `__ on the
+``DaskJob`` resource will be passed on to the job-runner ``Pod`` resource. If one also wants to set Kubernetes
+annotations on the cluster-related resources (scheduler and worker ``Pods``), these can be set as
+``spec.cluster.metadata`` in the ``DaskJob`` resource. Annotations created by `kopf` or `kubectl` (i.e. starting with
+"kopf.zalando.org" or "kubectl.kubernetes.io") will not be passed on.
+
+Once the job ``Pod`` runs to completion, the cluster is removed automatically to save resources. This is great for workflows like training a distributed machine learning model with Dask.
+
+Autoscalers
+^^^^^^^^^^^
+
+A ``DaskAutoscaler`` resource will communicate with the scheduler periodically and autoscale the default ``DaskWorkerGroup`` to the desired number of workers.
+
 .. code-block:: python
 
     from dask_kubernetes.operator import KubeCluster
@@ -50,17 +155,16 @@ Kubernetes resources. It is designed to dynamically launch ad-hoc deployments.
 .. toctree::
    :maxdepth: 2
    :hidden:
-   :caption: Installing
+   :caption: Getting Started
+
+   Overview
    installing
-   operator_installation
 
 .. toctree::
    :maxdepth: 2
    :hidden:
    :caption: Operator
 
-   operator
    operator_kubecluster
    operator_resources
    operator_extending
diff --git a/doc/source/installing.rst b/doc/source/installing.rst
index d5cd656e..804158ff 100644
--- a/doc/source/installing.rst
+++ b/doc/source/installing.rst
@@ -1,26 +1,22 @@
-Python Package
-==============
+Installing
+==========
 
 .. currentmodule:: dask_kubernetes
 
-You can install dask-kubernetes with ``pip``, ``conda``, or by installing from source.
-
-Dependencies
-------------
+Python package
+--------------
 
-To use :class:`KubeCluster` you may need to have ``kubectl`` installed (`official install guide `_).
-
-To use :class:`HelmCluster` you will need to have ``helm`` installed (`official install guide `_).
+You can install dask-kubernetes with ``pip``, ``conda``, or by installing from source.
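+
+Whichever method you use below, a quick way to confirm the installation afterwards is to import the package and print its version (a minimal sketch):
+
+.. code-block:: python
+
+    # Smoke test: a successful import means the package is installed correctly.
+    import dask_kubernetes
+
+    print(dask_kubernetes.__version__)
+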
 Pip
----
+^^^
 
 Pip can be used to install dask-kubernetes and its Python dependencies::
 
    pip install dask-kubernetes --upgrade  # Install everything from last released version
 
 Conda
------
+^^^^^
 
 To install the latest version of dask-kubernetes from the
 `conda-forge `_ repository using
@@ -29,7 +25,7 @@ To install the latest version of dask-kubernetes from the
 
    conda install dask-kubernetes -c conda-forge
 
 Install from Source
--------------------
+^^^^^^^^^^^^^^^^^^^
 
 To install dask-kubernetes from source, clone the repository from `github
 `_::
@@ -46,6 +42,204 @@ You can also install directly from git main branch::
 
    pip install git+https://github.com/dask/dask-kubernetes
 
+Operator
+--------
+
+To use the Dask Operator you must install the custom resource definitions, service account, roles, and the operator controller deployment.
+
+Quickstart
+^^^^^^^^^^
+
+.. code-block:: console
+
+   $ helm install --repo https://helm.dask.org --create-namespace -n dask-operator --generate-name dask-kubernetes-operator
+
+.. figure:: images/operator-install.gif
+   :align: left
+
+Installing with Helm
+^^^^^^^^^^^^^^^^^^^^
+
+The operator has a Helm chart which can be used to manage its installation.
+The chart is published in the `Dask Helm repo `_ and can be installed via:
+
+.. code-block:: console
+
+   $ helm repo add dask https://helm.dask.org
+   "dask" has been added to your repositories
+
+   $ helm repo update
+   Hang tight while we grab the latest from your chart repositories...
+   ...Successfully got an update from the "dask" chart repository
+   Update Complete. ⎈Happy Helming!⎈
+
+   $ helm install --create-namespace -n dask-operator --generate-name dask/dask-kubernetes-operator
+   NAME: dask-kubernetes-operator-1666875935
+   NAMESPACE: dask-operator
+   STATUS: deployed
+   REVISION: 1
+   TEST SUITE: None
+   NOTES:
+   Operator has been installed successfully.
+
+Then you should be able to list your Dask clusters via ``kubectl``.
+
+.. code-block:: console
+
+   $ kubectl get daskclusters
+   No resources found in default namespace.
+
+We can also check that the operator pod is running:
+
+.. code-block:: console
+
+   $ kubectl get pods -A -l app.kubernetes.io/name=dask-kubernetes-operator
+   NAMESPACE       NAME                                        READY   STATUS    RESTARTS   AGE
+   dask-operator   dask-kubernetes-operator-775b8bbbd5-zdrf7   1/1     Running   0          74s
+
+.. warning::
+    Please note that `Helm does not support updating or deleting CRDs. `_ If updates
+    are made to the CRD templates in future releases (to support future k8s releases, for example) you may have to manually update the CRDs or delete/reinstall the Dask Operator.
+
+Single namespace
+""""""""""""""""
+
+By default the controller is installed with a ``ClusterRole`` and watches all namespaces.
+You can also just install it into a single namespace by setting the following options.
+
+.. code-block:: console
+
+   $ helm install -n my-namespace --generate-name dask/dask-kubernetes-operator --set rbac.cluster=false --set kopfArgs="{--namespace=my-namespace}"
+   NAME: dask-kubernetes-operator-1749875935
+   NAMESPACE: my-namespace
+   STATUS: deployed
+   REVISION: 1
+   TEST SUITE: None
+   NOTES:
+   Operator has been installed successfully.
+
+Prometheus
+""""""""""
+
+The operator Helm chart also contains some optional ``ServiceMonitor`` and ``PodMonitor`` resources to enable Prometheus scraping of Dask components.
+As not all clusters have the Prometheus operator installed, these are disabled by default. You can enable them with the following config options.
+
+.. code-block:: yaml
+
+    metrics:
+      scheduler:
+        enabled: true
+        serviceMonitor:
+          enabled: true
+      worker:
+        enabled: true
+        serviceMonitor:
+          enabled: true
+
+You'll also need to ensure the container images you choose for your Dask components have the ``prometheus_client`` library installed.
+If you're using the official Dask images you can install this at runtime.
+
+.. code-block:: python
+
+    from dask_kubernetes.operator import KubeCluster
+    cluster = KubeCluster(name="monitored", env={"EXTRA_PIP_PACKAGES": "prometheus_client"})
+
+Chart Configuration Reference
+"""""""""""""""""""""""""""""
+
+.. frigate:: ../../dask_kubernetes/operator/deployment/helm/dask-kubernetes-operator
+
+Installing with Manifests
+^^^^^^^^^^^^^^^^^^^^^^^^^
+
+If you prefer to install the operator from static manifests with ``kubectl`` and set configuration options with tools like ``kustomize``, you can generate the default manifests with::
+
+   $ helm template --include-crds --repo https://helm.dask.org release dask-kubernetes-operator | kubectl apply -f -
+
+
+Kubeflow
+^^^^^^^^
+
+In order to use the Dask Operator with `Kubeflow `_ you need to perform some extra installation steps.
+
+User permissions
+""""""""""""""""
+
+Kubeflow doesn't know anything about our Dask custom resource definitions, so we need to update the ``kubeflow-kubernetes-edit`` cluster role. This role
+allows users with cluster edit permissions to create pods, jobs and other resources, and we need to add the Dask custom resources to that list. Edit the
+existing ``clusterrole`` and add a new rule to the ``rules`` section for ``kubernetes.dask.org`` that allows all operations on all custom resources in our API namespace.
+
+.. code-block:: console
+
+   $ kubectl patch clusterrole kubeflow-kubernetes-edit --type="json" --patch '[{"op": "add", "path": "/rules/-", "value": {"apiGroups": ["kubernetes.dask.org"],"resources": ["*"],"verbs": ["*"]}}]'
+   clusterrole.rbac.authorization.k8s.io/kubeflow-kubernetes-edit patched
+
+Dashboard access
+""""""""""""""""
+
+If you are using the Jupyter Notebook service in KubeFlow there are a couple of extra steps needed to access the Dask dashboard.
+The dashboard will be running on the scheduler pod and accessible via the scheduler service, so to access it your Jupyter container will need to
+have the `jupyter-server-proxy `_ extension installed. If you are using the
+`Dask Jupyter Lab extension `_ this will be installed automatically for you.
+
+By default the proxy will only allow proxying other services running on the same host as the Jupyter server, which means you can't access the scheduler
+running in another pod. So you need to set some extra config to tell the proxy which hosts to allow. Given that we can already execute arbitrary code
+in Jupyter (and therefore interact with other services within the Kubernetes cluster) we can allow all hosts in the proxy settings with
+``c.ServerProxy.host_allowlist = lambda app, host: True``.
+
+The :class:`dask_kubernetes.operator.KubeCluster` and :class:`distributed.Client` objects both have a ``dashboard_link`` attribute that you can
+view to find the URL of the dashboard, and this is also used in the widgets shown in Jupyter. The default link will not work on KubeFlow so you need
+to change this to ``"{NB_PREFIX}/proxy/{host}:{port}/status"`` to ensure it uses the Jupyter proxy.
+
+To apply these configuration options to the Jupyter pod you can create a ``PodDefault`` configuration object that can be selected when launching the notebook.
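+
+If you want to try this out in an already running notebook session before setting up the ``PodDefault``, you can set the equivalent Dask configuration directly from Python. This is a minimal sketch; ``distributed.dashboard.link`` is the configuration key that corresponds to the ``DASK_DISTRIBUTED__DASHBOARD__LINK`` environment variable used in the manifest that follows.
+
+.. code-block:: python
+
+    import dask.config
+
+    # Same effect as the DASK_DISTRIBUTED__DASHBOARD__LINK environment variable:
+    # dashboard_link is rendered from this template, routing it via the Jupyter proxy.
+    dask.config.set({"distributed.dashboard.link": "{NB_PREFIX}/proxy/{host}:{port}/status"})
+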
Create +a new file with the following contents. + +.. code-block:: yaml + + # configure-dask-dashboard.yaml + apiVersion: "kubeflow.org/v1alpha1" + kind: PodDefault + metadata: + name: configure-dask-dashboard + spec: + selector: + matchLabels: + configure-dask-dashboard: "true" + desc: "configure dask dashboard" + env: + - name: DASK_DISTRIBUTED__DASHBOARD__LINK + value: "{NB_PREFIX}/proxy/{host}:{port}/status" + volumeMounts: + - name: jupyter-server-proxy-config + mountPath: /root/.jupyter/jupyter_server_config.py + subPath: jupyter_server_config.py + volumes: + - name: jupyter-server-proxy-config + configMap: + name: jupyter-server-proxy-config + --- + apiVersion: v1 + kind: ConfigMap + metadata: + name: jupyter-server-proxy-config + data: + jupyter_server_config.py: | + c.ServerProxy.host_allowlist = lambda app, host: True + +Then apply this to your KubeFlow user's namespace with ``kubectl``. For example with the default ``user@example.com`` user +it would be. + +.. code-block:: console + + $ kubectl apply -n kubeflow-user-example-com -f configure-dask-dashboard.yaml + +Then when you launch your Jupyter Notebook server be sure to check the ``configure dask dashboard`` configuration option. + +.. figure:: images/kubeflow-notebooks-configuration-selector.png + :alt: The KubeFlow Notebook Configuration selector showing the "configure dask dashboard" option checked + :align: center + + Supported Versions ------------------ diff --git a/doc/source/operator.rst b/doc/source/operator.rst deleted file mode 100644 index c8099dd2..00000000 --- a/doc/source/operator.rst +++ /dev/null @@ -1,103 +0,0 @@ -Overview -======== -.. currentmodule:: dask_kubernetes.operator - -What is the operator? ---------------------- - -The Dask Operator is a set of custom resources and a controller that runs on your Kubernetes cluster and allows you to create and manage your Dask clusters as Kubernetes resources. -Creating clusters can either be done via the :doc:`Kubernetes API with kubectl ` or the :doc:`Python API with KubeCluster `. - -To :doc:`install the operator ` you need to apply some custom resource definitions that allow us to describe Dask resources and the operator itself which is a small Python application that -watches the Kubernetes API for events related to our custom resources and creates other resources such as ``Pods`` and ``Services`` accordingly. - -What resources does the operator manage? ---------------------------------------- - -The operator manages a hierarchy of resources, some custom resources to represent Dask primitives like clusters and worker groups, and native Kubernetes resources such as pods and services to run the cluster processes and facilitate communication. - -.. 
mermaid:: - - graph TD - DaskJob(DaskJob) - DaskCluster(DaskCluster) - DaskAutoscaler(DaskAutoscaler) - SchedulerService(Scheduler Service) - SchedulerPod(Scheduler Pod) - DaskWorkerGroup(DaskWorkerGroup) - WorkerPodA(Worker Pod A) - WorkerPodB(Worker Pod B) - WorkerPodC(Worker Pod C) - JobPod(Job Runner Pod) - - DaskJob --> DaskCluster - DaskJob --> JobPod - DaskCluster --> SchedulerService - DaskCluster --> DaskAutoscaler - SchedulerService --> SchedulerPod - DaskCluster --> DaskWorkerGroup - DaskWorkerGroup --> WorkerPodA - DaskWorkerGroup --> WorkerPodB - DaskWorkerGroup --> WorkerPodC - - classDef dask stroke:#FDA061,stroke-width:4px - classDef dashed stroke-dasharray: 5 5 - class DaskJob dask - class DaskCluster dask - class DaskWorkerGroup dask - class DaskAutoscaler dask - class DaskAutoscaler dashed - class SchedulerService dashed - class SchedulerPod dashed - class WorkerPodA dashed - class WorkerPodB dashed - class WorkerPodC dashed - class JobPod dashed - - -Worker Groups -^^^^^^^^^^^^^ - -A ``DaskWorkerGroup`` represents a homogenous group of workers that can be scaled. The resource is similar to a native Kubernetes ``Deployment`` in that it manages a group of workers -with some intelligence around the ``Pod`` lifecycle. A worker group must be attached to a Dask Cluster resource in order to function. - -All `Kubernetes annotations `__ on the -``DaskWorkerGroup`` resource will be passed onto worker ``Pod`` resources. Annotations created by `kopf` or -`kubectl` (i.e. starting with "kopf.zalando.org" or "kubectl.kubernetes.io") will not be passed on. - - -Clusters -^^^^^^^^ - -The ``DaskCluster`` custom resource creates a Dask cluster by creating a scheduler ``Pod``, scheduler ``Service`` and default ``DaskWorkerGroup`` which in turn creates worker ``Pod`` resources. - -Workers connect to the scheduler via the scheduler ``Service`` and that service can also be exposed to the user in order to connect clients and perform work. - -The operator also has support for creating additional worker groups. These are extra groups of workers with different -configuration settings and can be scaled separately. You can then use `resource annotations `_ -to schedule different tasks to different groups. - -All `Kubernetes annotations ` on the -``DaskCluster`` resource will be passed onto the scheduler ``Pod`` and ``Service`` as well the ``DaskWorkerGroup`` -resources. Annotations created by `kopf` or `kubectl` (i.e. starting with "kopf.zalando.org" or "kubectl.kubernetes.io") -will not be passed on. - -For example you may wish to have a smaller pool of workers that have more memory for memory intensive tasks, or GPUs for compute intensive tasks. - -Jobs -^^^^ - -A ``DaskJob`` is a batch style resource that creates a ``Pod`` to perform some specific task from start to finish alongside a ``DaskCluster`` that can be leveraged to perform the work. - -All `Kubernetes annotations ` on the -``DaskJob`` resource will be passed on to the job-runner ``Pod`` resource. If one also wants to set Kubernetes -annotations on the cluster-related resources (scheduler and worker ``Pods``), these can be set as -``spec.cluster.metadata`` in the ``DaskJob`` resource. Annotations created by `kopf` or `kubectl` (i.e. starting with -"kopf.zalando.org" or "kubectl.kubernetes.io") will not be passed on. - -Once the job ``Pod`` runs to completion the cluster is removed automatically to save resources. This is great for workflows like training a distributed machine learning model with Dask. 
- -Autoscalers -^^^^^^^^^^^ - -A ``DaskAutoscaler`` resource will communicate with the scheduler periodically and auto scale the default ``DaskWorkerGroup`` to the desired number of workers. diff --git a/doc/source/operator_installation.rst b/doc/source/operator_installation.rst deleted file mode 100644 index 23f3c5d7..00000000 --- a/doc/source/operator_installation.rst +++ /dev/null @@ -1,196 +0,0 @@ -Operator -======== - -To use the Dask Operator you must install the custom resource definitions, service account, roles, and the operator controller deployment. - -Quickstart ----------- - -.. code-block:: console - - $ helm install --repo https://helm.dask.org --create-namespace -n dask-operator --generate-name dask-kubernetes-operator - -.. figure:: images/operator-install.gif - :align: left - -Installing with Helm --------------------- - -The operator has a Helm chart which can be used to manage the installation of the operator. -The chart is published in the `Dask Helm repo `_ repository, and can be installed via: - -.. code-block:: console - - $ helm repo add dask https://helm.dask.org - "dask" has been added to your repositories - - $ helm repo update - Hang tight while we grab the latest from your chart repositories... - ...Successfully got an update from the "dask" chart repository - Update Complete. ⎈Happy Helming!⎈ - - $ helm install --create-namespace -n dask-operator --generate-name dask/dask-kubernetes-operator - NAME: dask-kubernetes-operator-1666875935 - NAMESPACE: dask-operator - STATUS: deployed - REVISION: 1 - TEST SUITE: None - NOTES: - Operator has been installed successfully. - -Then you should be able to list your Dask clusters via ``kubectl``. - -.. code-block:: console - - $ kubectl get daskclusters - No resources found in default namespace. - -We can also check the operator pod is running: - -.. code-block:: console - - $ kubectl get pods -A -l app.kubernetes.io/name=dask-kubernetes-operator - NAMESPACE NAME READY STATUS RESTARTS AGE - dask-operator dask-kubernetes-operator-775b8bbbd5-zdrf7 1/1 Running 0 74s - -.. warning:: - Please note that `Helm does not support updating or deleting CRDs. `_ If updates - are made to the CRD templates in future releases (to support future k8s releases, for example) you may have to manually update the CRDs or delete/reinstall the Dask Operator. - -Single namespace -^^^^^^^^^^^^^^^^ - -By default the controller is installed with a ``ClusterRole`` and watches all namespaces. -You can also just install it into a single namespace by setting the following options. - -.. code-block:: console - - $ helm install -n my-namespace --generate-name dask/dask-kubernetes-operator --set rbac.cluster=false --set kopfArgs="{--namespace=my-namespace}" - NAME: dask-kubernetes-operator-1749875935 - NAMESPACE: my-namespace - STATUS: deployed - REVISION: 1 - TEST SUITE: None - NOTES: - Operator has been installed successfully. - -Prometheus -^^^^^^^^^^ - -The operator helm chart also contains some optional `ServiceMonitor` and `PodMonitor` resources to enable Prometheus scraping of Dask components. -As not all clusters have the Prometheus operator installed these are disabled by default. You can enable them with the following comfig options. - -.. code-block:: yaml - - metrics: - scheduler: - enabled: true - serviceMonitor: - enabled: true - worker: - enabled: true - serviceMonitor: - enabled: true - -You'll also need to ensure the container images you choose for your Dask components have the ``prometheus_client`` library installed. 
-If you're using the official Dask images you can install this at runtime. - -.. code-block:: python - - from dask_kubernetes.operator import KubeCluster - cluster = KubeCluster(name="monitored", env={"EXTRA_PIP_PACKAGES": "prometheus_client"}) - -Chart Configuration Reference -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -.. frigate:: ../../dask_kubernetes/operator/deployment/helm/dask-kubernetes-operator - -Installing with Manifests -------------------------- - -If you prefer to install the operator from static manifests with ``kubectl`` and set configuration options with tools like ``kustomize`` you can generate the default manifests with:: - - $ helm template --include-crds --repo https://helm.dask.org release dask-kubernetes-operator | kubectl apply -f - - - -Kubeflow --------- - -In order to use the Dask Operator with `Kubeflow `_ you need to perform some extra installation steps. - -User permissions -^^^^^^^^^^^^^^^^ - -Kubeflow doesn't know anything about our Dask custom resource definitions so we need to update the ``kubeflow-kubernetes-edit`` cluster role. This role -allows users with cluster edit permissions to create pods, jobs and other resources and we need to add the Dask custom resources to that list. Edit the -existing ``clusterrole`` and add a new rule to the ``rules`` section for ``kubernetes.dask.org`` that allows all operations on all custom resources in our API namespace. - -.. code-block:: console - - $ kubectl patch clusterrole kubeflow-kubernetes-edit --type="json" --patch '[{"op": "add", "path": "/rules/-", "value": {"apiGroups": ["kubernetes.dask.org"],"resources": ["*"],"verbs": ["*"]}}]' - clusterrole.rbac.authorization.k8s.io/kubeflow-kubernetes-edit patched - -Dashboard access -^^^^^^^^^^^^^^^^ - -If you are using the Jupyter Notebook service in KubeFlow there are a couple of extra steps you need to do to be able to access the Dask dashboard. -The dashboard will be running on the scheduler pod and accessible via the scheduler service, so to access that your Jupyter container will need to -have the `jupyter-server-proxy `_ extension installed. If you are using the -`Dask Jupter Lab extension `_ this will be installed automatically for you. - -By default the proxy will only allow proxying other services running on the same host as the Jupyter server, which means you can't access the scheduler -running in another pod. So you need to set some extra config to tell the proxy which hosts to allow. Given that we can already execute arbitrary code -in Jupyter (and therefore interact with other services within the Kubernetes cluster) we can allow all hosts in the proxy settings with -``c.ServerProxy.host_allowlist = lambda app, host: True``. - -The :class:`dask_kubernetes.operator.KubeCluster` and :class:`distributed.Client` objects both have a ``dashboard_link`` attribute that you can -view to find the URL of the dashboard, and this is also used in the widgets shown in Jupyter. The default link will not work on KubeFlow so you need -to change this to ``"{NB_PREFIX}/proxy/{host}:{port}/status"`` to ensure it uses the Jupyter proxy. - -To apply these configuration options to the Jupyter pod you can create a ``PodDefault`` configuration object that can be selected when launching the notebook. Create -a new file with the following contents. - -.. 
code-block:: yaml - - # configure-dask-dashboard.yaml - apiVersion: "kubeflow.org/v1alpha1" - kind: PodDefault - metadata: - name: configure-dask-dashboard - spec: - selector: - matchLabels: - configure-dask-dashboard: "true" - desc: "configure dask dashboard" - env: - - name: DASK_DISTRIBUTED__DASHBOARD__LINK - value: "{NB_PREFIX}/proxy/{host}:{port}/status" - volumeMounts: - - name: jupyter-server-proxy-config - mountPath: /root/.jupyter/jupyter_server_config.py - subPath: jupyter_server_config.py - volumes: - - name: jupyter-server-proxy-config - configMap: - name: jupyter-server-proxy-config - --- - apiVersion: v1 - kind: ConfigMap - metadata: - name: jupyter-server-proxy-config - data: - jupyter_server_config.py: | - c.ServerProxy.host_allowlist = lambda app, host: True - -Then apply this to your KubeFlow user's namespace with ``kubectl``. For example with the default ``user@example.com`` user -it would be. - -.. code-block:: console - - $ kubectl apply -n kubeflow-user-example-com -f configure-dask-dashboard.yaml - -Then when you launch your Jupyter Notebook server be sure to check the ``configure dask dashboard`` configuration option. - -.. figure:: images/kubeflow-notebooks-configuration-selector.png - :alt: The KubeFlow Notebook Configuration selector showing the "configure dask dashboard" option checked - :align: center