
feat: add airflow.kubernetesPodTemplate.lifecycle value #653

Open · wants to merge 1 commit into main

Conversation

@iSWATxJOKERi commented Oct 5, 2022

What issues does your PR fix?
Sometimes it is necessary to perform some action right after a container has started. Others have also had this problem: #463

In my team's case, for example, after we've read secrets from Vault and mounted them into a directory (say, /usr/local/share/ca-certificates), we want to add those certificate files to the trust store at /etc/ssl/certs/ca-certificates.crt by running update-ca-certificates.

This PR allows you to define the container lifecycle hook in the kubernetesPodTemplate. Previously, this was only possible by providing the entire pod template via the airflow.kubernetesPodTemplate.stringOverride value, which is very difficult in practice (I never got it working), because it's hard to tell exactly what the full pod template in stringOverride must contain to replicate the pod spec you see when inspecting a running pod with -o yaml.

What does your PR do?
Adds the airflow.kubernetesPodTemplate.lifecycle value to set container lifecycle hooks on the KubernetesExecutor pod_template.
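
For illustration, here is a rough sketch (not part of the chart) of how the new value might be set for the ca-certificates use case above, assuming the PR passes lifecycle straight through to the container spec of the generated pod template:

airflow:
  kubernetesPodTemplate:
    lifecycle:
      ## sketch only: run update-ca-certificates right after the
      ## container starts (note: postStart ordering is NOT guaranteed
      ## relative to the container's ENTRYPOINT)
      postStart:
        exec:
          command: ["/bin/sh", "-c", "update-ca-certificates"]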


@thesuperzapper (Member) commented

@iSWATxJOKERi Thanks for the PR; however, I am still not sold that there is a legitimate use case that requires us to expose the lifecycle configs to the user. For example, in your ca-certificates use case, a much safer (and already available) option is to use extraInitContainers, and possibly also shareProcessNamespace.
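
As a rough sketch of that alternative (the container name, image, and command below are illustrative assumptions, not chart defaults), airflow.extraInitContainers could be used like this:

airflow:
  extraInitContainers:
    ## sketch only: an init container that runs to completion before
    ## the main airflow container starts
    - name: prepare-certs
      image: debian:bookworm-slim
      command: ["/bin/sh", "-c", "update-ca-certificates"]

Because init containers must run to completion before the main container starts, there is no race against the application, unlike a postStart hook.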

Furthermore, using lifecycle hooks might result in unexpected behavior; most notably, Kubernetes does not guarantee that a postStart hook runs before the container's ENTRYPOINT, and a hook that fails will cause the container to be killed.

While I can sort of imagine (very niche) use cases for postStart hooks, if we choose to expose this functionality to users, we must add strong warnings about possible unexpected behavior, and recommend init containers for any use case that needs something to run BEFORE the pod starts.


If we were to implement a feature like this, here are my initial thoughts on your PR:

  • You have only added this ability to the airflow.kubernetesPodTemplate pods; we should add values for all the deployments/pods, and probably also have a "global" one under an airflow.xxxx value.
  • Technically, the correct link for the "spec" of Lifecycle v1 core is https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#lifecycle-v1-core, so that is the better link for the docstring.

I also want to note that we currently use the preStop hook in the workers.celery.gracefullTermination feature for worker pods, so allowing users to set their own lifecycle would not be possible alongside that feature. Given this, I think it's probably best not to expose the full lifecycle spec, but rather to create more focused xxxx.postStartScript and xxxx.preStopScript values (each taking a string), so that the chart can do whatever it needs and then run the user's extra script (if one is provided).

We can create these values for all of our deployments, but also add airflow.postStartScript and airflow.preStopScript values that set the script on all our airflow deployments/pods. (NOTE: I am not sure whether airflow.preStopScript should override workers.preStopScript, or run in addition to it.)

For situations where the chart has its own commands to run (as in the workers.celery.gracefullTermination feature), we would run the chart-provided command first, and then run the user's one AFTER.
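
A sketch of what the rendered hook might then look like (the chart-side placeholder and the script's mount path are assumptions for illustration):

lifecycle:
  preStop:
    exec:
      command:
        - /usr/bin/env
        - bash
        - -c
        - |
          ## chart-provided logic runs first (placeholder; the real
          ## command would live in the chart templates)
          echo "chart pre-stop logic..."
          ## then the user's script, if one was provided
          /opt/lifecycle-scripts/pre-stop.sh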

An example of what a workers.preStopScript value might look like:

workers:
  preStopScript: |
    #!/usr/bin/env bash
    echo "Hello world!"

@iSWATxJOKERi (Author) commented

Hi @thesuperzapper, thanks for the quick response and thorough feedback! I'm learning a lot from the things you mentioned that I hadn't even thought about!

For example, in your ca-certificates use case, a much safer (and already available option) is to use extraInitContainers

I should have been more detailed about the reason for the PR. We did try using an init container to run the update-ca-certificates command after the vault-agent-init container had mounted the certs into the shared volume at /usr/local/share/ca-certificates, but that didn't work. We even tried running it against a shared volume mount, thinking that would help, but update-ca-certificates still writes into the init container's own filesystem, not the base container's (or that of any other container sharing the mount).
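
For context, the usual workaround for that filesystem isolation is to make the output directory itself a shared volume, mounted by both containers; a rough sketch (the volume wiring and image below are assumptions, not what we actually ran):

airflow:
  extraVolumes:
    - name: etc-ssl-certs
      emptyDir: {}
  ## mounted in the main container so it sees the rebuilt trust store
  extraVolumeMounts:
    - name: etc-ssl-certs
      mountPath: /etc/ssl/certs
  extraInitContainers:
    - name: update-ca-certificates
      image: debian:bookworm-slim   # assumed image
      command: ["/bin/sh", "-c", "update-ca-certificates"]
      volumeMounts:
        ## the init container writes the generated bundle here
        - name: etc-ssl-certs
          mountPath: /etc/ssl/certs
        ## it would also need the Vault-provided certs mounted at
        ## /usr/local/share/ca-certificates (omitted for brevity)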

While I can sort-of imagine (very niche) use cases for postStart hooks, if we choose to expose their functionality to the user, we must put strong warnings about possible unexpected behavior

Good point, I agree

I also want to note that we currently use the preStop hook in the workers.celery.gracefullTermination feature for worker pods, so allowing users to set their own lifecycle would not be possible alongside this feature.

Another good point; I didn't know lifecycle was already used somewhere in the chart. So yeah, like you said, I think the two additional script values would be the best way to solve this problem without having to worry about race conditions, gotchas, and the like.

we would run whatever our chart-provided command was, and then run the user's one AFTER.

Do you have an idea of where in the templates we would add airflow.postStartScript and airflow.preStopScript, plus the values for the other deployments/pods? I'm willing to help push this through! Thanks again for helping us!

@thesuperzapper (Member) commented

@iSWATxJOKERi I think the values would be like this:

airflow:
  ## these are like the other "global" configs, such as `airflow.extraVolumeMounts`
  ## (open question: should more specific values like `scheduler.postStartScript`
  ##  override these, or run in addition to them?)
  postStartScript: ""
  preStopScript: ""

  kubernetesPodTemplate:
    postStartScript: ""
    preStopScript: ""

  dbMigrations:
    postStartScript: ""
    preStopScript: ""

  sync:
    postStartScript: ""
    preStopScript: ""

scheduler:
  postStartScript: ""
  preStopScript: ""

web:
  postStartScript: ""
  preStopScript: ""

workers:
  postStartScript: ""
  preStopScript: ""

triggerer:
  postStartScript: ""
  preStopScript: ""

flower:
  postStartScript: ""
  preStopScript: ""

We should probably mount these scripts as files from a Secret (using a projected volume). This could get messy if not done with some kind of template (like how we do pod tolerations), as there are so many different possible scripts; see the sketch below.
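
A sketch of that idea (the Secret name, key, and mount path are illustrative assumptions):

## Secret rendered by the chart from the user's script values
apiVersion: v1
kind: Secret
metadata:
  name: airflow-lifecycle-scripts   # assumed name
stringData:
  workers-pre-stop.sh: |
    #!/usr/bin/env bash
    echo "Hello world!"
---
## on the worker pod: project the script in as an executable file
volumes:
  - name: lifecycle-scripts
    projected:
      defaultMode: 0755
      sources:
        - secret:
            name: airflow-lifecycle-scripts
            items:
              - key: workers-pre-stop.sh
                path: pre-stop.sh
## and in the container spec
volumeMounts:
  - name: lifecycle-scripts
    mountPath: /opt/lifecycle-scripts
    readOnly: true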

If you need more guidance on making that template, I can point you in the right direction.

stale bot commented Dec 10, 2022

This issue has been automatically marked as stale because it has not had activity in 60 days.
It will be closed in 7 days if no further activity occurs.

Thank you for your contributions.


Issues never become stale if any of the following is true:

  1. they are added to a Project
  2. they are added to a Milestone
  3. they have the lifecycle/frozen label

stale bot added the lifecycle/stale label Dec 10, 2022
stale bot closed this Dec 17, 2022
@thesuperzapper added this to the airflow-8.7.0 milestone Dec 21, 2022
stale bot removed the lifecycle/stale label Dec 21, 2022
@thesuperzapper changed the title feat: add airflow.kubernetesPodTemplate.lifecycle value May 1, 2024