DaskKubernetesEnvironment not setting imagePullSecrets #3356
@mgsnuno I think what might be happening here is the first prefect job (which deserializes / loads the environment) is not able to start up because it also does not have the image pull secrets. The Kubernetes agent loads the image pull secrets for its pods off of an environment variable which you can set like this:
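```sh
# for example (secret name taken from later in this thread):
export IMAGE_PULL_SECRETS=regcred
prefect agent start kubernetes
```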
@joshmeek exactly, it's that first prefect job that is not getting the image pull secrets. I did what you suggested but it still fails: as you can see in the YAML submitted by prefect below (k8s dashboard -> Pods -> Actions -> Edit), there is still no `imagePullSecrets` field.
Hmm, there could be some confusion here. I actually think setting that `export IMAGE_PULL_SECRETS=regcred` has to happen in the same environment where `prefect agent start kubernetes` is run.
@joshmeek that works for my locally running prefect kubernetes agent (started with the `export IMAGE_PULL_SECRETS=regcred` plus `prefect agent start kubernetes` from above): the job pod is created successfully, but then it fails later on with something related to RBAC permissions, I believe.

Since having the kubernetes agent installed inside kubernetes is a far more elegant solution, and can also solve the RBAC issue with the `--rbac` option of `prefect agent install kubernetes`, I tried that instead. The job pod gets created successfully, and pulling the image from the private repo works, but then I still get another error.

Any ideas on how to proceed?
@mgsnuno Haven't seen that one before! Looks like some weird pod <--> job mismatch that could be due to formatting. Could you try taking out the custom spec files?
I think I got something. Since yesterday I was experimenting with custom scheduler/worker specs:

```python
flow.environment = DaskKubernetesEnvironment(
    min_workers=1,
    max_workers=1,
    scheduler_spec_file=os.path.join(dirname, "pod_scheduler.yaml"),
    worker_spec_file=os.path.join(dirname, "pod_worker.yaml"),
    image_pull_secret="regcred",
)
```

where `pod_scheduler.yaml` is:

```yaml
kind: Pod
metadata:
  labels:
    app: dask
spec:
  template:
    metadata:
      labels:
        app: prefect-dask-scheduler
    spec:
      replicas: 1
      restartPolicy: Always
      containers:
        - image: <private-repo-name>.azurecr.io/pipelines-dask:latest
          imagePullPolicy: Always
          name: dask-scheduler
          args: [dask-scheduler]
          resources:
            limits:
              cpu: "3"
              memory: 12G
            requests:
              cpu: "3"
              memory: 12G
      imagePullSecrets:
        - name: regcred
```

and `pod_worker.yaml` is:

```yaml
kind: Pod
metadata:
  labels:
    app: dask
spec:
  replicas: 1
  restartPolicy: Always
  containers:
    - image: <private-repo-name>.azurecr.io/pipelines-dask:latest
      imagePullPolicy: Always
      name: dask-worker
      args:
        [
          dask-worker,
          --nthreads,
          "3",
          --memory-limit,
          12G,
          --death-timeout,
          "60",
        ]
      resources:
        limits:
          cpu: "3"
          memory: 12G
        requests:
          cpu: "3"
          memory: 12G
  imagePullSecrets:
    - name: regcred
```

This was causing the errors I just sent. I based these YAMLs on what I saw in https://docs.prefect.io/orchestration/execution/dask_k8s_environment.html#examples.

I removed those files, so I ran:

```python
flow.environment = DaskKubernetesEnvironment(
    min_workers=1,
    max_workers=1,
    image_pull_secret="regcred",
)
```

Initially it fails with an error, so I copied the default `job.yaml` and `worker_pod.yaml` from the Prefect repo and pointed the environment at those:

```python
flow.environment = DaskKubernetesEnvironment(
    min_workers=1,
    max_workers=1,
    scheduler_spec_file=os.path.join(dirname, "job.yaml"),
    worker_spec_file=os.path.join(dirname, "worker_pod.yaml"),
    image_pull_secret="regcred",
)
```

And it worked! So, any pointers on why the custom YAMLs do not work? Thank you.
Comparing with the default spec, the relevant difference is that the scheduler is a Job:

```yaml
apiVersion: batch/v1
kind: Job
....
restartPolicy: Never
```

And now it goes further, until it stops again with an error on the job. Since I'm writing the YAML files myself, I can place the `imagePullSecrets` directly in them.
Ah, that makes sense as to why it would fail: the scheduler spec is expected to be a job, not a pod. Basing your scheduler spec off the `job.yaml` in the repo is a good idea. The image pull secret set via the env var is not forwarded to custom specs, and that is intentional by design. You'll have to add the `imagePullSecrets` to your custom specs yourself.
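For example, roughly (image and secret names as used earlier in this thread):

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prefect-dask-scheduler
spec:
  template:
    spec:
      containers:
        - name: dask-scheduler
          image: <private-repo-name>.azurecr.io/pipelines-dask:latest
      # must be set explicitly: the agent's IMAGE_PULL_SECRETS env var
      # is not forwarded into custom scheduler/worker specs
      imagePullSecrets:
        - name: regcred
```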
ok, thanks a lot. I also found that, in order for it to work, the custom scheduler command/args have to be:

```yaml
command: ["/bin/sh", "-c"]
args:
  [
    'python -c "import prefect; prefect.environments.execution.load_and_run_flow()"',
  ]
```

(I was trying plain `args: [dask-scheduler]` before, as in my first spec.) Which brings me to this `pod_scheduler.yaml`:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prefect-dask-scheduler
  labels:
    app: dask
spec:
  template:
    metadata:
      labels:
        app: prefect-dask-scheduler
    spec:
      replicas: 1
      restartPolicy: Never
      containers:
        - image: <private_repo_name>.azurecr.io/pipelines-dask:latest
          imagePullPolicy: Always
          name: dask-scheduler
          command: ["/bin/sh", "-c"]
          args:
            [
              'python -c "import prefect; prefect.environments.execution.load_and_run_flow()"',
            ]
          resources:
            limits:
              cpu: "3"
              memory: 12G
            requests:
              cpu: "3"
              memory: 12G
      imagePullSecrets:
        - name: regcred
```

Another question: how can I expose the scheduler address in order to have access to the dask dashboard? When spawning the cluster myself using dask-kubernetes, I was setting the scheduler service type to `LoadBalancer`.
@joshmeek it would be great if you could help with the question above about exposing the scheduler address for the dask dashboard.
@mgsnuno I haven't attempted that before using this environment, so I can't say for certain how to do it. Looking at that config you set, I wonder if you can set the env var like this to set it on your scheduler pod:
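```yaml
# untested guess -- the variable name follows dask's env-var config naming
env:
  - name: DASK_KUBERNETES__SCHEDULER_SERVICE_TYPE
    value: LoadBalancer
```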
@joshmeek I tried the following:

```yaml
env:
  - name: DASK_KUBERNETES__SCHEDULER_SERVICE_TYPE
    value: LoadBalancer
  - name: DASK__DISTRIBUTED__COMM__TIMEOUTS__CONNECT
    value: "200"
  - name: DASK__KUBERNETES__DEPLOY_MODE
    value: remote
```

Didn't work because no LoadBalancer service gets created. See dask/dask-kubernetes#259 (comment) for reference. I can open a separate enhancement issue for this, to expose the dask dashboard.
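In the meantime, a generic Kubernetes workaround (not specific to Prefect; it assumes the scheduler pod serves the dask dashboard on its default port 8787) is to port-forward to the scheduler pod:

```sh
# forward the dask dashboard from the scheduler pod to localhost
kubectl port-forward --namespace pipelines pod/<scheduler-pod-name> 8787:8787
# then browse to http://localhost:8787
```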
Working solution:

```sh
prefect agent install kubernetes \
    --api http://<remote_server_url_OR_localhost>:4200 \
    --rbac \
    --image-pull-secrets=regcred \
    --namespace pipelines \
    | kubectl apply --namespace=pipelines -f -
```

```python
flow.environment = DaskKubernetesEnvironment(
    min_workers=1,
    max_workers=1,
    scheduler_spec_file="pod_scheduler.yaml",
    worker_spec_file="pod_worker.yaml",
    image_pull_secret="regcred",
)
```

`pod_scheduler.yaml`:

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: prefect-dask-scheduler
  labels:
    app: prefect
spec:
  template:
    metadata:
      labels:
        app: prefect-dask-scheduler
    spec:
      restartPolicy: Never
      containers:
        - name: prefect-dask-scheduler
          imagePullPolicy: Always
          command: ["/bin/sh", "-c"]
          args:
            [
              'python -c "import prefect; prefect.environments.execution.load_and_run_flow()"',
            ]
          resources:
            limits:
              cpu: "3"
              memory: 12G
            requests:
              cpu: "3"
              memory: 12G
      imagePullSecrets:
        - name: regcred
```

`pod_worker.yaml`:

```yaml
kind: Pod
metadata:
  labels:
    app: prefect
spec:
  restartPolicy: Never
  containers:
    - name: prefect-dask-worker
      imagePullPolicy: Always
      args:
        [
          dask-worker,
          --nthreads,
          "3",
          --memory-limit,
          12G,
          --death-timeout,
          "60",
        ]
      resources:
        limits:
          cpu: "3"
          memory: 12G
        requests:
          cpu: "3"
          memory: 12G
  imagePullSecrets:
    - name: regcred
```
Description

When using `DaskKubernetesEnvironment` with Docker storage in a private registry in Azure, the deployment of the Prefect job/flow-run container fails. Looking into the YAML of the job submitted to the cluster (Pods -> Actions -> Edit), it is clear that there is no `imagePullSecrets` field.
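That is, the submitted job spec is missing an entry like this (the secret name `regcred` is the one configured in this thread):

```yaml
imagePullSecrets:
  - name: regcred
```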
I've created an `image_pull_secret` and tried setting it in the custom scheduler/worker YAML files and also as an argument to `DaskKubernetesEnvironment`; both fail.

Expected Behavior

`imagePullSecrets` should be part of the YAML file submitted by Prefect to run the flow.

Reproduction
Environment