
[Helm] Add RBAC support for KubeCluster (dask-kubernetes) #353

Closed

ddelange opened this issue Mar 22, 2022 · 6 comments
Labels: enhancement (New feature or request)

Comments

@ddelange (Contributor)

Current behavior

Hi 👋 I could not find any references to RBAC in the server Helm chart, so (please correct me if I'm wrong) I assume there is currently no RBAC configuration for DaskKubernetesEnvironment when using the server Helm chart.

Proposed behavior

I would like to be able to use DaskKubernetesEnvironment out of the box with the server Helm chart (or have some docs on how to enable it).

Example

This way, a k8s user can get started with Prefect straight away, using ephemeral autoscaling Dask clusters under the hood that are completely managed by Prefect.
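
Roughly what I have in mind (a sketch against the now-deprecated Prefect 1.x API; the worker bounds and project name are placeholders):

```python
from prefect import Flow, task
from prefect.environments import DaskKubernetesEnvironment

@task
def say_hello():
    print("hello from a Dask worker")

with Flow("example") as flow:
    say_hello()

# Ephemeral Dask cluster, scaled between 1 and 5 workers, managed by Prefect
flow.environment = DaskKubernetesEnvironment(min_workers=1, max_workers=5)
flow.register(project_name="examples")  # placeholder project name
```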

ddelange added the enhancement label on Mar 22, 2022
@ddelange (Contributor, Author)

I think I found the docs I was looking for. Environments are deprecated, and I missed the obvious warning at the top of the document.

If RBAC configuration is no longer needed for a KubeCluster in the new setup, please close this issue!

ddelange changed the title from [Helm] Add support for DaskKubernetesEnvironment to [Helm] Add RBAC support for KubeCluster (dask-kubernetes) on Mar 22, 2022
@ddelange (Contributor, Author)

I opened a bash shell in one of the Prefect pods spawned by the Helm chart and tried it out. It does indeed look like missing RBAC:

>>> from dask_kubernetes import KubeCluster, make_pod_spec
>>> with KubeCluster(make_pod_spec(image="prefecthq/prefect")) as cluster:
...     print(cluster)
...
Creating scheduler pod on cluster. This may take some time.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 474, in __init__
    super().__init__(**self.kwargs)
  File "/usr/local/lib/python3.7/site-packages/distributed/deploy/spec.py", line 260, in __init__
    self.sync(self._start)
  File "/usr/local/lib/python3.7/site-packages/distributed/utils.py", line 311, in sync
    self.loop, func, *args, callback_timeout=callback_timeout, **kwargs
  File "/usr/local/lib/python3.7/site-packages/distributed/utils.py", line 364, in sync
    raise exc.with_traceback(tb)
  File "/usr/local/lib/python3.7/site-packages/distributed/utils.py", line 349, in f
    result[0] = yield future
  File "/usr/local/lib/python3.7/site-packages/tornado/gen.py", line 762, in run
    value = future.result()
  File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 603, in _start
    await super()._start()
  File "/usr/local/lib/python3.7/site-packages/distributed/deploy/spec.py", line 289, in _start
    self.scheduler = await self.scheduler
  File "/usr/local/lib/python3.7/site-packages/distributed/deploy/spec.py", line 59, in _
    await self.start()
  File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 182, in start
    await super().start(**kwargs)
  File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 87, in start
    raise e
  File "/usr/local/lib/python3.7/site-packages/dask_kubernetes/core.py", line 78, in start
    self.namespace, self.pod_template
  File "/usr/local/lib/python3.7/site-packages/kubernetes_asyncio/client/api_client.py", line 192, in __call_api
    raise e
  File "/usr/local/lib/python3.7/site-packages/kubernetes_asyncio/client/api_client.py", line 189, in __call_api
    _request_timeout=_request_timeout)
  File "/usr/local/lib/python3.7/site-packages/kubernetes_asyncio/client/rest.py", line 229, in POST
    body=body))
  File "/usr/local/lib/python3.7/site-packages/kubernetes_asyncio/client/rest.py", line 180, in request
    raise ApiException(http_resp=r)
kubernetes_asyncio.client.exceptions.ApiException: (401)
Reason: Unauthorized
HTTP response headers: <CIMultiDictProxy('Audit-Id': '8bfdb7da-9d1f-4e4d-b112-9c60bb54d4f2', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json', 'Date': 'Tue, 22 Mar 2022 15:14:11 GMT', 'Content-Length': '129')>
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"Unauthorized","reason":"Unauthorized","code":401}
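
One way to confirm what a pod's service account is allowed to do (assuming kubectl access to the cluster; the names below are placeholders):

```shell
# Replace <namespace>/<serviceaccount> with the values from the pod's spec
kubectl auth can-i create pods \
  --as=system:serviceaccount:<namespace>:<serviceaccount> \
  --namespace <namespace>
```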

@zanieb (Contributor) commented Mar 22, 2022

Hi! You'll need to follow the documentation in https://kubernetes.dask.org/en/latest/kubecluster.html#role-based-access-control-rbac

I think creating those service accounts is out of scope for the Prefect Server Helm chart; it's not intended to deploy Dask into your cluster, just the Prefect API. It looks like dask-kubernetes has its own chart for managing its required objects: https://kubernetes.dask.org/en/latest/helmcluster.html

@ddelange (Contributor, Author) commented Mar 22, 2022

Thanks for the quick reply!

> I think creating those service accounts is out of scope for the Prefect Server Helm chart.

Fair enough, feel free to close this issue in that case! I will manually apply ServiceAccount, Role, and RoleBinding manifests based on your link (for future readers, here's another Helm example).
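
A minimal sketch of those manifests, based on the permissions the linked dask-kubernetes docs list (all names here are placeholders; verify the verbs against the docs):

```yaml
apiVersion: v1
kind: ServiceAccount
metadata:
  name: dask-kubernetes  # placeholder name
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: dask-kubernetes
rules:
  # KubeCluster creates and tears down scheduler/worker pods and services
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "list"]
  - apiGroups: [""]
    resources: ["services"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: dask-kubernetes
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: dask-kubernetes
subjects:
  - kind: ServiceAccount
    name: dask-kubernetes
```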

> It looks like dask-kubernetes has its own chart for managing its required objects https://kubernetes.dask.org/en/latest/helmcluster.html

There are some differences between their KubeCluster and HelmCluster managers; see dask/dask-kubernetes#277 (comment).

For autoscaling support, the former is needed (and is what the Prefect docs suggest).
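
For context, the autoscaling relies on KubeCluster's adaptive mode, which creates and deletes worker pods on demand, hence the pod permissions above (a sketch; the image and bounds are placeholders):

```python
from dask_kubernetes import KubeCluster, make_pod_spec

# Adaptive scaling: KubeCluster adds/removes worker pods with the workload,
# which is exactly what requires the pod create/delete RBAC permissions
cluster = KubeCluster(make_pod_spec(image="prefecthq/prefect"))
cluster.adapt(minimum=1, maximum=10)
```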

@zanieb (Contributor) commented Mar 22, 2022

I see.

Looking at this a bit further, we attach all of the permissions listed in our agent documentation to the agent pod that we create. Did you exec into the agent pod or another one when testing KubeCluster? I believe we only attach the RBAC to the agent deployment: https://github.com/PrefectHQ/server/blob/master/helm/prefect-server/templates/agent/rbac.yaml
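
(To see which service account the agent actually runs as, something like the following should work; the deployment name is an assumption based on a default release name:)

```shell
# Placeholder deployment name; use `kubectl get deployments` to find yours
kubectl get deployment prefect-server-agent \
  -o jsonpath='{.spec.template.spec.serviceAccountName}'
```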

What SA/Role/RoleBindings are you missing?

@ddelange (Contributor, Author) commented Mar 22, 2022

Closing as a duplicate of #5573.

I was in the towel pod, but after bumping the agent memory limit so that I could pip install dask_kubernetes in the pod, the traceback was identical in the agent pod as well. After seeing the issue above, I managed to get the code snippet to work.
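
(For reference, the memory bump was along these lines; the release/repo names and the agent.resources values key are assumptions, to be checked against the chart's values.yaml:)

```shell
# <release>/<repo> are placeholders; verify the values key in the chart
helm upgrade <release> <repo>/prefect-server \
  --set agent.resources.limits.memory=1Gi
```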

> What SA/Role/RoleBindings are you missing?

I agree, the Role looks good; it contains everything that dask_kubernetes expects.

Thanks for the help!
