KubeCluster looking for scheduler at wrong port when using NodePort service #550

Closed
radioflyer28 opened this issue Aug 25, 2022 · 3 comments

What happened:
KubeCluster times out when creating a cluster with a NodePort service because it looks for the scheduler on port 8786, while the service is actually exposed on a randomized node port (e.g. 32367). Note that the scheduler pod itself starts correctly.

Python test client error:

OSError: Timed out during handshake while connecting to tcp://10.20.0.241:8786 after 10 s

NodePort service in Rancher: [screenshot]

What you expected to happen:
For KubeCluster to use the randomized port to contact the scheduler pod instead of port 8786.

Minimal Complete Verifiable Example:

test_kubecluster.py:

import dask
from dask_kubernetes import KubeCluster, KubeConfig

auth = KubeConfig(config_file="~/.kube/remote")
dask.config.set({"kubernetes.scheduler-service-type": "NodePort"})

cluster = KubeCluster('worker-spec.yml', auth=auth, deploy_mode='remote')

worker-spec.yml:

# worker-spec.yml

kind: Pod
metadata:
  labels:
    foo: bar
spec:
  restartPolicy: Never
  containers:
  - image: ghcr.io/dask/dask:latest
    imagePullPolicy: IfNotPresent
    args: [dask-worker, $(DASK_SCHEDULER_ADDRESS), --nthreads, '2', --no-dashboard, --memory-limit, 4GB, --death-timeout, '60']
    name: dask-worker
    env:
      - name: EXTRA_PIP_PACKAGES
        value: git+https://github.com/dask/distributed
    resources:
      limits:
        cpu: "2"
        memory: 4G
      requests:
        cpu: "2"
        memory: 4G

Anything else we need to know?:

Environment:

  • Dask version: 2021.3.0=pyhd8ed1ab_0
  • Dask core: 2021.3.0=pyhd8ed1ab_0
  • Dask kubernetes: 2022.7.0=pyhd8ed1ab_0
  • Python version: 3.8.8=h7840368_0_cpython
  • Operating System: Windows 10
  • Install method (conda, pip, source): conda-forge
Cluster Dump State:

@jacobtomlinson
Member

I think this line is the culprit:

host = nodes.items[0].status.addresses[0].address
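
The quoted line only picks the host; with a NodePort service the port also has to be read from the service rather than assumed to be 8786. A minimal sketch of that lookup follows. It is not the project's code: the function name `nodeport_address` is hypothetical, and the stand-in objects only mimic the attribute layout of the `V1Node` and `V1Service` objects returned by the Kubernetes Python client, filled in with the values reported in this issue.

```python
from types import SimpleNamespace

DEFAULT_SCHEDULER_PORT = 8786  # port seen in the timeout error above


def nodeport_address(node, service, target_port=DEFAULT_SCHEDULER_PORT):
    """Build a tcp:// address from a node and a NodePort service.

    Hypothetical helper: `node` and `service` are assumed to expose the same
    attributes as V1Node and V1Service from the Kubernetes Python client.
    """
    # Same host lookup as the line quoted in this comment.
    host = node.status.addresses[0].address
    for port in service.spec.ports:
        # `port.port` is the in-cluster service port; `port.node_port` is the
        # randomized port opened on the nodes for a NodePort service.
        if port.port == target_port and port.node_port:
            return f"tcp://{host}:{port.node_port}"
    raise ValueError(f"no NodePort mapping found for port {target_port}")


# Stand-in objects using the address and ports reported in this issue.
node = SimpleNamespace(
    status=SimpleNamespace(
        addresses=[SimpleNamespace(type="InternalIP", address="10.20.0.241")]
    )
)
service = SimpleNamespace(
    spec=SimpleNamespace(ports=[SimpleNamespace(port=8786, node_port=32367)])
)

print(nodeport_address(node, service))
```

With a real cluster, the same function could presumably be fed the results of `CoreV1Api().list_node()` and `CoreV1Api().read_namespaced_service(...)` instead of the stand-ins.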

@jacobtomlinson
Member

The classic KubeCluster was removed in #890. All users will need to migrate to the Dask Operator. Closing.

@jacobtomlinson closed this as not planned on Apr 30, 2024