Controller loses API connection after token expiry on Azure Kubernetes Service (AKS) 1.30 due to kopf bug #913

Open
creste opened this issue Oct 24, 2024 · 1 comment
@creste
Contributor

creste commented Oct 24, 2024

dask-operator fails to create DaskJob resources on Azure Kubernetes Service (AKS) 1.30.

See this kopf bug report for details.

Minimal Complete Verifiable Example:

  1. Install dask-operator on AKS 1.30.
  2. Wait an hour for the authentication token to expire.
  3. Create a DaskJob resource.

dask-operator does not create the DaskJob because its Kubernetes authentication token has expired and kopf's watchers are no longer connected to the Kubernetes API server. A bug in kopf prevents it from refreshing the token.

This only occurs on AKS 1.30+ because that is the first AKS version that sets --service-account-extend-token-expiration to false.
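
To confirm that token expiry is the trigger, the remaining lifetime of the operator's token can be read directly from the token itself. The following is a minimal diagnostic sketch, not part of dask-operator or kopf. It assumes the service account token is a JWT carrying an `exp` claim and that it is mounted at the standard in-cluster path. The function names `token_expiry` and `seconds_remaining` are hypothetical.

```python
import base64
import json
import time
from typing import Optional

# Standard in-cluster mount path for the service account token (assumed here).
TOKEN_PATH = "/var/run/secrets/kubernetes.io/serviceaccount/token"


def token_expiry(token: str) -> float:
    """Return the `exp` claim (Unix seconds) of a JWT, without verifying it."""
    payload_b64 = token.split(".")[1]
    # JWT segments are base64url without padding; restore it before decoding.
    payload_b64 += "=" * (-len(payload_b64) % 4)
    claims = json.loads(base64.urlsafe_b64decode(payload_b64))
    return float(claims["exp"])


def seconds_remaining(token: str, now: Optional[float] = None) -> float:
    """Seconds until the token expires; negative if it has already expired."""
    current = time.time() if now is None else now
    return token_expiry(token) - current
```

Run inside the operator pod, passing the contents of `TOKEN_PATH` to `seconds_remaining` shows how long the token on disk remains valid. A positive value while the watchers are already disconnected would suggest that the file was rotated but the running process is still using the old token.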

Environment:

  • Dask operator version: 2024.9.0
@jacobtomlinson
Member

Thanks for flagging this here. I don't see any immediate workaround we can implement in dask-kubernetes, so I expect we will need to wait for a fix in nolar/kopf#980.

@jacobtomlinson jacobtomlinson changed the title dask-operator fails to create Dask Jobs on Azure Kubernetes Service (AKS) 1.30 Controller loses API connection after token expiry on Azure Kubernetes Service (AKS) 1.30 due to kopf bug Oct 25, 2024