diff --git a/charts/airflow/CHANGELOG.md b/charts/airflow/CHANGELOG.md index e77e56cb..ddf1dbd9 100644 --- a/charts/airflow/CHANGELOG.md +++ b/charts/airflow/CHANGELOG.md @@ -105,11 +105,11 @@ TBD > 🟨 __NOTES__ 🟨 > -> - You can now use Secrets and ConfigMaps to define your `airflow.{users,connections,pools,variables}`, see the docs: -> - [How to create airflow users?](https://github.com/airflow-helm/charts/tree/main/charts/airflow#how-to-create-airflow-users) -> - [How to create airflow connections?](https://github.com/airflow-helm/charts/tree/main/charts/airflow#how-to-create-airflow-connections) -> - [How to create airflow variables?](https://github.com/airflow-helm/charts/tree/main/charts/airflow#how-to-create-airflow-variables) -> - [How to create airflow pools?](https://github.com/airflow-helm/charts/tree/main/charts/airflow#how-to-create-airflow-pools) +> - You may now use Secrets and ConfigMaps to define your `airflow.{users,connections,pools,variables}`: +> - [How to manage airflow users?](docs/faq/security/airflow-users.md) +> - [How to manage airflow connections?](docs/faq/dags/airflow-connections.md) +> - [How to manage airflow variables?](docs/faq/dags/airflow-variables.md) +> - [How to manage airflow pools?](docs/faq/dags/airflow-pools.md) ### Added - allow referencing Secrets/ConfigMaps in `airflow.{users,connections,pools,variables}` ([#281](https://github.com/airflow-helm/charts/pull/281)) @@ -262,7 +262,7 @@ TBD - native support for [Airflow 2.0's HA scheduler](https://airflow.apache.org/docs/apache-airflow/stable/scheduler.html#running-more-than-one-scheduler), see the new `scheduler.replicas` value - significantly improved git-sync system by moving to [kubernetes/git-sync](https://github.com/kubernetes/git-sync) - significantly improved pip installs by moving to an init-container -- added a [guide for integrating airflow with your "Microsoft AD" or "OAUTH"](README.md#how-to-authenticate-airflow-users-with-ldapoauth) +- added docs for [How to integrate airflow with LDAP or OAUTH?](docs/faq/security/ldap-oauth.md) - general cleanup of almost every helm file - significant docs/README rewrite diff --git a/charts/airflow/CONTRIBUTING.md b/charts/airflow/CONTRIBUTING.md index 4f39b4ae..d91514ef 100644 --- a/charts/airflow/CONTRIBUTING.md +++ b/charts/airflow/CONTRIBUTING.md @@ -72,7 +72,7 @@ Most non-patch changes will require documentation updates. If you __ADD a value__: - ensure the value has a descriptive docstring in `values.yaml` -- ensure the value is listed under `Values Reference` in [README.md](README.md#values-reference) +- ensure the value is listed under `Helm Values` in [README.md](README.md#helm-values) - Note, only directly include the value if it's a top-level value like `airflow.level_1`, otherwise only include `airflow.level_1.*` If you __bump the version__: diff --git a/charts/airflow/README.md b/charts/airflow/README.md index 84f24c50..e891a9cd 100644 --- a/charts/airflow/README.md +++ b/charts/airflow/README.md @@ -1,1764 +1,133 @@ # Airflow Helm Chart (User Community) -__Previously known as `stable/airflow`__ - [![Artifact HUB](https://img.shields.io/endpoint?url=https://artifacthub.io/badge/repository/airflow-helm)](https://artifacthub.io/packages/helm/airflow-helm/airflow) -## About - -The `Airflow Helm Chart (User Community)` provides a standard way to deploy [Apache Airflow](https://airflow.apache.org/) on Kubernetes with Helm, and is used by thousands of companies for production deployments of Airflow. - -### Goals: - -(1) Ease of Use
-(2) Great Documentation
-(3) Support for older Airflow Versions
-(4) Support for Kubernetes GitOps Tools (like ArgoCD)
-
-### History:
-
-The `Airflow Helm Chart (User Community)` is a popular alternative to the official chart released in 2021 inside the `apache/airflow` git repository.
-It was created in 2018 and was previously called `stable/airflow` when it lived in the (now end-of-life) [helm/charts](https://github.com/helm/charts/tree/master/stable/airflow) repository.
-
-### Airflow Version Support:
-
-Chart Version →<br>Airflow Version ↓ | `7.X.X` | `8.X.X` |
--- | --- | ---
-`1.10.X` | ✅ | ✅ [1]
-`2.0.X` | ❌ | ✅
-`2.1.X` | ❌ | ✅
-
-[1] you must set `airflow.legacyCommands = true` to use airflow version `1.10.X` with chart version `8.X.X`
-
-### Airflow Executor Support:
+The `User-Community Airflow Helm Chart` is the standard way to deploy [Apache Airflow](https://airflow.apache.org/) on [Kubernetes](https://kubernetes.io/) with [Helm](https://helm.sh/).
+Originally created in 2018, it has since helped thousands of companies create production-ready deployments of Airflow on Kubernetes.
+
+> __NOTE:__ this project is independent of the official chart found in the `apache/airflow` GitHub repository, and is supported by the kind contributions of users like yourself!
+
+## Project Goals
+
+1. Ease of Use
+2. Great Documentation
+3. Support for older Airflow Versions
+4. Support for Kubernetes GitOps Tools (like ArgoCD)
+
+## Key Features
+
+- Support for Airflow `1.10+` and `2.0+` ([version support matrix](#airflow-version-support))
+- Support for `CeleryExecutor`, `KubernetesExecutor`, and `CeleryKubernetesExecutor` ([executor support matrix](#airflow-executor-support))
+- Easily integrate with your `PostgreSQL` or `MySQL` databases ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/database/external-database.md))
+- Automatic deployment of `PgBouncer` to reduce PostgreSQL database strain ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/database/pgbouncer.md))
+- Declaratively manage Airflow configurations:
+  - Configs ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/airflow-configs.md))
+  - Users ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/security/airflow-users.md))
+  - Connections ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/airflow-connections.md))
+  - Variables ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/airflow-variables.md))
+  - Pools ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/airflow-pools.md))
+- Multiple ways to load your DAG definitions:
+  - Git-Sync Sidecar ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/load-dag-definitions.md#option-1---git-sync-sidecar))
+  - Persistent Volume ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/load-dag-definitions.md#option-2---persistent-volume))
+  - Embedded Into Container Image ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/load-dag-definitions.md#option-3---embedded-into-container-image))
+- Multiple ways to install extra Python packages:
+  - Init-Containers ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/extra-python-packages.md#option-1---use-init-containers))
+  - Embedded Into Container Image ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/extra-python-packages.md#option-2---embedded-into-container-image-recommended))
+- Automatic restarting of unhealthy Airflow Schedulers ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/monitoring/scheduler-liveness-probe.md))
+- Automatic cleanup of old airflow logs ([docs](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/monitoring/log-cleanup.md))
+- Out-of-the-box Support for `ArgoCD` and similar tools
+- Personalised tips/warnings after each `helm upgrade`
+
+## Guides
+
+- [`"Quickstart Guide"`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/guides/quickstart.md)
+- [`"Upgrade Guide"`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/guides/upgrade.md)
+- [`"Uninstall Guide"`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/guides/uninstall.md)
+
+## Frequently Asked Questions
+
+- __Configuration:__
+  - [`How to set the airflow version?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/airflow-version.md)
+  - [`How to set airflow configs?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/airflow-configs.md)
+  - [`How to load airflow plugins?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/airflow-plugins.md)
+  - [`How to install extra python packages?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/extra-python-packages.md)
+  - [`How to configure autoscaling for celery workers?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/autoscaling-celery-workers.md)
+- __DAGs:__
+  - [`How to load DAG definitions?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/load-dag-definitions.md)
+  - [`How to manage airflow connections?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/airflow-connections.md)
+  - [`How to manage airflow variables?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/airflow-variables.md)
+  - [`How to manage airflow pools?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/dags/airflow-pools.md)
+- __Security:__
+  - [`How to manage airflow users?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/security/airflow-users.md)
+  - [`How to integrate airflow with LDAP or OAUTH?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/security/ldap-oauth.md)
+  - [`How to set the fernet encryption key?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/security/set-fernet-key.md)
+  - [`How to set the webserver secret key?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/security/set-webserver-secret-key.md)
+- __Monitoring:__
+  - [`How to persist airflow logs?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/monitoring/log-persistence.md)
+  - [`How to automatically clean up airflow logs?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/monitoring/log-cleanup.md)
+  - [`How to configure the scheduler liveness probe?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/monitoring/scheduler-liveness-probe.md)
+  - [`How to integrate airflow with Prometheus?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/monitoring/prometheus.md)
+- __Databases:__
+  - [`How to configure the embedded database?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/database/embedded-database.md)
+  - [`How to configure the embedded redis?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/database/embedded-redis.md)
+  - [`How to configure an external database?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/database/external-database.md)
+  - [`How to configure an external redis?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/database/external-redis.md)
+  - [`How to configure pgbouncer?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/database/pgbouncer.md)
+- __Kubernetes:__
+  - [`How to set up a kubernetes ingress?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/kubernetes/ingress.md)
+  - [`How to add extra kubernetes manifests?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/kubernetes/extra-manifests.md)
+  - [`How to configure pod affinity, nodeSelector, and tolerations?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/kubernetes/affinity-node-selectors-tolerations.md)
+  - [`How to mount extra persistent volumes?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/kubernetes/mount-persistent-volumes.md)
+  - [`How to mount ConfigMaps and Secrets as environment variables?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/kubernetes/mount-environment-variables.md)
+  - [`How to mount ConfigMaps and Secrets as files?`](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/kubernetes/mount-files.md)
+
+## Examples
+
+The following table contains example deployments of the `User-Community Airflow Helm Chart`.
+
+Example | Deployment Environment | Airflow Version | Airflow Executor
+--- | --- | --- | ---
+[minikube](https://github.com/airflow-helm/charts/tree/main/charts/airflow/examples/minikube) | `Minikube` or `Kind` or `K3D` | `1.10+` or `2.0+` | `CeleryExecutor`
+[google-gke](https://github.com/airflow-helm/charts/tree/main/charts/airflow/examples/google-gke) | `Google Cloud - GKE` | `1.10+` or `2.0+` | `CeleryExecutor`
+
+## Airflow Version Support
+
+> __TIP:__ you may use any supported airflow version by [defining the appropriate `airflow.image.tag` value](https://github.com/airflow-helm/charts/tree/main/charts/airflow/docs/faq/configuration/airflow-version.md)
+
+The following versions of airflow are supported by the User-Community Airflow Helm Chart.
+
+Chart Version →<br>Airflow Version ↓ | `7.0.0` - `7.16.0` | `8.0.0` - `8.5.3` | `8.6.0+` |
+--- | --- | --- | ---
+`1.10.X` | ✅ | ⚠️ [1] | ⚠️ [1]
+`2.0.X` | ❌ | ✅ | ✅
+`2.1.X` | ❌ | ✅ | ✅
+`2.2.X` | ❌ | ⚠️ [2] | ✅
+
+[1] you must set `airflow.legacyCommands = true` when using airflow version `1.10.X`<br>
+[2] the [Deferrable Operators & Triggers](https://airflow.apache.org/docs/apache-airflow/stable/concepts/deferring.html) feature won't work, as there is no `airflow triggerer` Deployment
+
+## Airflow Executor Support
+
+> __TIP:__ you may use any supported airflow executor type by defining the appropriate `airflow.executor` value
+
+The following airflow executor types are supported by the User-Community Airflow Helm Chart.
+
Chart Version →<br>Airflow Executor ↓ | `7.X.X` | `8.X.X` |
--- | --- | ---
`CeleryExecutor` | ✅ | ✅
-`KubernetesExecutor` | ✅ [1] | ✅
+`KubernetesExecutor` | ⚠️ [1] | ✅
`CeleryKubernetesExecutor` | ❌ | ✅

-[1] we encourage you to use chart version `8.X.X`, so you can use the `airflow.kubernetesPodTemplate.*` values (note, requires airflow `1.10.11+`, as it uses [AIRFLOW__KUBERNETES__POD_TEMPLATE_FILE](https://airflow.apache.org/docs/apache-airflow/2.1.0/configurations-ref.html#pod-template-file))
-
-
-## Quickstart Guide
-
-### Install:
-
-__(Step 1) - Add this helm repository:__
-```sh
-## add this helm repository & pull updates from it
-helm repo add airflow-stable https://airflow-helm.github.io/charts
-helm repo update
-```
-
-__(Step 2) - Install this chart:__
-```sh
-## set the release-name & namespace
-export AIRFLOW_NAME="airflow-cluster"
-export AIRFLOW_NAMESPACE="airflow-cluster"
-
-## install using helm 3
-helm install \
-  $AIRFLOW_NAME \
-  airflow-stable/airflow \
-  --namespace $AIRFLOW_NAMESPACE \
-  --version "8.X.X" \
-  --values ./custom-values.yaml
-
-## wait until the above command returns (may take a while)
-```
-
-__(Step 3) - Locally expose the airflow webserver:__
-```sh
-## port-forward the airflow webserver
-kubectl port-forward svc/${AIRFLOW_NAME}-web 8080:8080 --namespace $AIRFLOW_NAMESPACE
-
-## open your browser to: http://localhost:8080
-## default login: admin/admin
-```
-
-### Upgrade:
-
-> __WARNING__: always consult the [CHANGELOG](CHANGELOG.md) before upgrading chart versions
-
-```sh
-## pull updates from the helm repository
-helm repo update
-
-## apply any new values // upgrade chart version to 8.X.X
-helm upgrade \
-  $AIRFLOW_NAME \
-  airflow-stable/airflow \
-  --namespace $AIRFLOW_NAMESPACE \
-  --version "8.X.X" \
-  --values ./custom-values.yaml
-```
-
-### Uninstall:
-
-```sh
-## uninstall the chart
-helm uninstall $AIRFLOW_NAME --namespace $AIRFLOW_NAMESPACE
-```
-
-### Examples:
-
-To help you create your `custom-values.yaml` file, we provide some examples for common situations:
-
-- ["Minikube - CeleryExecutor"](examples/minikube/custom-values.yaml)
-- ["Google (GKE) - CeleryExecutor"](examples/google-gke/custom-values.yaml)
-
-### Frequently Asked Questions:
-
-> __NOTE:__ some values are not discussed in the `FAQ`; you can view the default [values.yaml](values.yaml) file for a full list of values
-
-Review the FAQ to understand how the chart functions; here are some good starting points:
-
-- ["How to use a specific version of airflow?"](#how-to-use-a-specific-version-of-airflow)
-- ["How to set airflow configs?"](#how-to-set-airflow-configs)
-- ["How to create airflow users?"](#how-to-create-airflow-users)
-- ["How to authenticate airflow users with LDAP/OAUTH?"](#how-to-authenticate-airflow-users-with-ldapoauth)
-- ["How to create airflow connections?"](#how-to-create-airflow-connections)
-- ["How to use an external database?"](#how-to-use-an-external-database)
-- ["How to persist airflow logs?"](#how-to-persist-airflow-logs)
-- ["How to set up an Ingress?"](#how-to-set-up-an-ingress)
-
-## FAQ - Airflow
-
-> __Frequently asked questions related to airflow configs__
-
-### How to use a specific version of airflow?
-<details>
-<summary>
-Expand
-</summary>
-
-There will always be a single default version of airflow shipped with this chart (see `airflow.image.*` in [values.yaml](values.yaml) for the current one), but other versions are supported; please see the [Airflow Version Support](#airflow-version-support) matrix.
-
-For example, using airflow `2.0.1`, with python `3.6`:
-```yaml
-airflow:
-  image:
-    repository: apache/airflow
-    tag: 2.0.1-python3.6
-```
-
-For example, using airflow `1.10.15`, with python `3.8`:
-```yaml
-airflow:
-  # this must be "true" for airflow 1.10
-  legacyCommands: true
-
-  image:
-    repository: apache/airflow
-    tag: 1.10.15-python3.8
-```
-
-
-</details>
- -### How to set airflow configs? -
-<summary>
-Expand
-</summary>
-
-While we don't expose the "airflow.cfg" file directly, you can use [environment variables](https://airflow.apache.org/docs/stable/howto/set-config.html) to set Airflow configs.
-
-The `airflow.config` value makes this easier: each key-value is mounted as an environment variable on each scheduler/web/worker/flower Pod:
-```yaml
-airflow:
-  config:
-    ## security
-    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: "False"
-
-    ## dags
-    AIRFLOW__CORE__LOAD_EXAMPLES: "False"
-    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "30"
-
-    ## email
-    AIRFLOW__EMAIL__EMAIL_BACKEND: "airflow.utils.email.send_email_smtp"
-    AIRFLOW__SMTP__SMTP_HOST: "smtpmail.example.com"
-    AIRFLOW__SMTP__SMTP_MAIL_FROM: "admin@example.com"
-    AIRFLOW__SMTP__SMTP_PORT: "25"
-    AIRFLOW__SMTP__SMTP_SSL: "False"
-    AIRFLOW__SMTP__SMTP_STARTTLS: "False"
-
-    ## domain used in airflow emails
-    AIRFLOW__WEBSERVER__BASE_URL: "http://airflow.example.com"
-
-    ## other environment variables
-    HTTP_PROXY: "http://proxy.example.com:8080"
-```
-
-If you want to set [cluster policies](https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html) with an `airflow_local_settings.py` file, you can use the `airflow.localSettings.*` values:
-```yaml
-airflow:
-  localSettings:
-    ## the full content of the `airflow_local_settings.py` file (as a string)
-    stringOverride: |
-      # use a custom `xcom_sidecar` image for KubernetesPodOperator()
-      from airflow.kubernetes.pod_generator import PodDefaults
-      PodDefaults.SIDECAR_CONTAINER.image = "gcr.io/PROJECT-ID/custom-sidecar-image"
-
-    ## the name of a Secret containing an `airflow_local_settings.py` key
-    ## (if set, this disables `airflow.localSettings.stringOverride`)
-    #existingSecret: "my-airflow-local-settings"
-```
-
-
-</details>
- -### How to store DAGs? -
-<summary>
-Expand
-</summary>
- -

Option 1a - git-sync sidecar (SSH auth)

- -This method uses an SSH git-sync sidecar to sync your git repo into the dag folder every `dags.gitSync.syncWait` seconds. - -Example values defining an SSH git repo: -```yaml -airflow: - config: - AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 60 - -dags: - gitSync: - enabled: true - repo: "git@github.com:USERNAME/REPOSITORY.git" - branch: "master" - revision: "HEAD" - syncWait: 60 - sshSecret: "airflow-ssh-git-secret" - sshSecretKey: "id_rsa" - - # "known_hosts" verification can be disabled by setting to "" - sshKnownHosts: |- - github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ== -``` - -You can create the `airflow-ssh-git-secret` Secret using: -```console -kubectl create secret generic \ - airflow-ssh-git-secret \ - --from-file=id_rsa=$HOME/.ssh/id_rsa \ - --namespace my-airflow-namespace -``` - -

Option 1b - git-sync sidecar (HTTP auth)

-
-This method uses an HTTP git-sync sidecar to sync your git repo into the dag folder every `dags.gitSync.syncWait` seconds.
-
-Example values defining an HTTP git repo:
-```yaml
-airflow:
-  config:
-    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 60
-
-dags:
-  gitSync:
-    enabled: true
-    repo: "https://github.com/USERNAME/REPOSITORY.git"
-    branch: "master"
-    revision: "HEAD"
-    syncWait: 60
-    httpSecret: "airflow-http-git-secret"
-    httpSecretUsernameKey: username
-    httpSecretPasswordKey: password
-```
-
-You can create the `airflow-http-git-secret` Secret using:
-```console
-kubectl create secret generic \
-  airflow-http-git-secret \
-  --from-literal=username=MY_GIT_USERNAME \
-  --from-literal=password=MY_GIT_TOKEN \
-  --namespace my-airflow-namespace
-```
-

Option 2a - PersistentVolumeClaim (chart-managed)

-
-With this method, you store your DAGs in a Kubernetes PersistentVolume, which is mounted to all scheduler/web/worker Pods.
-You must configure some external system to ensure this volume has your latest DAGs.
-For example, you could use your CI/CD pipeline system to perform a sync as changes are pushed to your DAGs git repo.
-
-Example values to create a PVC with the `storageClass` called `default` and 1Gi initial `size`:
-```yaml
-dags:
-  persistence:
-    enabled: true
-    storageClass: default
-    accessMode: ReadOnlyMany
-    size: 1Gi
-```
-

Option 2b - PersistentVolumeClaim (existing / user-managed)

- -> 🟨 __Note__ 🟨 -> -> Your `dags.persistence.existingClaim` PVC must support `ReadOnlyMany` or `ReadWriteMany` for `accessMode` - -Example values to use an existing PVC called `my-dags-pvc`: -```yaml -dags: - persistence: - enabled: true - existingClaim: my-dags-pvc - accessMode: ReadOnlyMany -``` - -
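-For reference, a user-managed `my-dags-pvc` claim like the one assumed above could be created with a manifest along these lines (the `storageClassName` is a placeholder for a class in your cluster that supports `ReadWriteMany`):
-```yaml
-apiVersion: v1
-kind: PersistentVolumeClaim
-metadata:
-  name: my-dags-pvc
-spec:
-  accessModes:
-    ## NOTE: a ReadWriteMany PVC also satisfies the ReadOnlyMany mount above
-    - ReadWriteMany
-  resources:
-    requests:
-      storage: 1Gi
-  ## NOTE: placeholder, use a storageClass from your own cluster
-  storageClassName: my-rwx-storage-class
-```
-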

Option 3 - embedded into container image

- -> 🟨 __Note__ 🟨 -> -> This chart uses the official [apache/airflow](https://hub.docker.com/r/apache/airflow) images, consult airflow's official [docs about custom images](https://airflow.apache.org/docs/apache-airflow/2.0.1/production-deployment.html#production-container-images) - -This method stores your DAGs inside the container image. - -Example extending `airflow:2.0.1-python3.8` with some dags: -```dockerfile -FROM apache/airflow:2.0.1-python3.8 - -# NOTE: dag path is set with the `dags.path` value -COPY ./my_dag_folder /opt/airflow/dags -``` - -Example values to use `MY_REPO:MY_TAG` container image with the chart: -```yaml -airflow: - image: - repository: MY_REPO - tag: MY_TAG -``` - -
-
-</details>
- -### How to install extra pip packages? -
-<summary>
-Expand
-</summary>
- -

Option 1 - use init-containers

-
-> 🟥 __Warning__ 🟥
->
-> We strongly advise that you DO NOT USE this feature in production; instead, please use "Option 2"
-
-You can use the `airflow.extraPipPackages` value to install pip packages on all Pods; you can also use the more specific `scheduler.extraPipPackages`, `web.extraPipPackages`, `worker.extraPipPackages` and `flower.extraPipPackages`.
-Packages defined with the more specific values will take precedence over `airflow.extraPipPackages`, as they are listed at the end of the `pip install ...` command, and pip takes the package version which is __defined last__.
-
-Example values for installing the `airflow-exporter` package on all scheduler/web/worker/flower Pods:
-```yaml
-airflow:
-  extraPipPackages:
-    - "airflow-exporter~=1.4.1"
-```
-
-Example values for installing PyTorch on the scheduler/worker Pods only:
-```yaml
-scheduler:
-  extraPipPackages:
-    - "torch~=1.8.0"
-
-worker:
-  extraPipPackages:
-    - "torch~=1.8.0"
-```
-
-Example values to install pip packages from a private pip `--index-url`:
-```yaml
-airflow:
-  config:
-    ## pip configs can be set with environment variables
-    PIP_TIMEOUT: 60
-    PIP_INDEX_URL: https://USERNAME:PASSWORD@example.com/packages/simple/
-    PIP_TRUSTED_HOST: example.com
-
-  extraPipPackages:
-    - "my-internal-package==1.0.0"
-```
-

Option 2 - embedded into container image (recommended)

-
-This chart uses the official [apache/airflow](https://hub.docker.com/r/apache/airflow) images; consult airflow's official [docs about custom images](https://airflow.apache.org/docs/apache-airflow/2.0.1/production-deployment.html#production-container-images) to learn how to extend the airflow container image with your pip packages.
-
-For example, extending `airflow:2.0.1-python3.8` with the `torch` package:
-```dockerfile
-FROM apache/airflow:2.0.1-python3.8
-
-# install your pip packages
-RUN pip install torch~=1.8.0
-```
-
-Example values to use your `MY_REPO:MY_TAG` container image:
-```yaml
-airflow:
-  image:
-    repository: MY_REPO
-    tag: MY_TAG
-```
-
-
-</details>
- -### How to create airflow users? -
-<summary>
-Expand
-</summary>
- -

Option 1 - use plain-text

- -You can use the `airflow.users` value to create airflow users in a declarative way. - -Example values to create `admin` (with "Admin" RBAC role) and `user` (with "User" RBAC role): -```yaml -airflow: - users: - - username: admin - password: admin - role: Admin - email: admin@example.com - firstName: admin - lastName: admin - - username: user - password: user123 - ## TIP: `role` can be a single role or a list of roles - role: - - User - - Viewer - email: user@example.com - firstName: user - lastName: user - - ## if we create a Deployment to perpetually sync `airflow.users` - usersUpdate: true -``` - -

Option 2 - use templates from Secrets/ConfigMaps

- -> 🟨 __Note__ 🟨 -> -> If `airflow.usersUpdate = true`, the users which use `airflow.usersTemplates` will be updated in real-time, allowing tools like [external-secrets](https://github.com/external-secrets/kubernetes-external-secrets) to be used. - -You can use `airflow.usersTemplates` to extract string templates from keys in Secrets or Configmaps. - -Example values to use templates from `Secret/my-secret` and `ConfigMap/my-configmap` in parts of the `admin` user: -```yaml -airflow: - users: - - username: admin - password: ${ADMIN_PASSWORD} - role: Admin - email: ${ADMIN_EMAIL} - firstName: admin - lastName: admin - - ## bash-like templates to be used in `airflow.users` - usersTemplates: - ADMIN_PASSWORD: - kind: secret - name: my-secret - key: password - ADMIN_EMAIL: - kind: configmap - name: my-configmap - key: email - - ## if we create a Deployment to perpetually sync `airflow.users` - usersUpdate: true -``` - -
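-For reference, the `Secret/my-secret` and `ConfigMap/my-configmap` objects assumed by the example above might look something like this (the values shown are placeholders only):
-```yaml
-apiVersion: v1
-kind: Secret
-metadata:
-  name: my-secret
-stringData:
-  ## NOTE: placeholder, use your own admin password
-  password: REPLACE_ME
-```
-```yaml
-apiVersion: v1
-kind: ConfigMap
-metadata:
-  name: my-configmap
-data:
-  email: admin@example.com
-```
-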
-
-</details>
- -### How to authenticate airflow users with LDAP/OAUTH? -
-<summary>
-Expand
-</summary>
-
-> 🟥 __Warning__ 🟥
->
-> If you set up LDAP/OAUTH, you should set `airflow.users = []` (and delete any previously created users)
->
-> The version of Flask-AppBuilder installed might not be the latest; see [How to install extra pip packages?](#how-to-install-extra-pip-packages)
-
-You can use the `web.webserverConfig.*` values to adjust the Flask-AppBuilder `webserver_config.py` file; read [Flask-AppBuilder's security docs](https://flask-appbuilder.readthedocs.io/en/latest/security.html) for further reference.
-

Option 1 - use LDAP

- -Example values to integrate with a typical Microsoft Active Directory using `AUTH_LDAP`: -```yaml -web: - # WARNING: for production usage, create your own image with these packages installed rather than using `extraPipPackages` - extraPipPackages: - ## the following configs require Flask-AppBuilder 3.2.0 (or later) - - "Flask-AppBuilder~=3.3.0" - ## the following configs require python-ldap - - "python-ldap~=3.3.1" - - webserverConfig: - stringOverride: |- - from airflow import configuration as conf - from flask_appbuilder.security.manager import AUTH_LDAP - - SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN') - - AUTH_TYPE = AUTH_LDAP - AUTH_LDAP_SERVER = "ldap://ldap.example.com" - AUTH_LDAP_USE_TLS = False - - # registration configs - AUTH_USER_REGISTRATION = True # allow users who are not already in the FAB DB - AUTH_USER_REGISTRATION_ROLE = "Public" # this role will be given in addition to any AUTH_ROLES_MAPPING - AUTH_LDAP_FIRSTNAME_FIELD = "givenName" - AUTH_LDAP_LASTNAME_FIELD = "sn" - AUTH_LDAP_EMAIL_FIELD = "mail" # if null in LDAP, email is set to: "{username}@email.notfound" - - # bind username (for password validation) - AUTH_LDAP_USERNAME_FORMAT = "uid=%s,ou=users,dc=example,dc=com" # %s is replaced with the provided username - # AUTH_LDAP_APPEND_DOMAIN = "example.com" # bind usernames will look like: {USERNAME}@example.com - - # search configs - AUTH_LDAP_SEARCH = "ou=users,dc=example,dc=com" # the LDAP search base (if non-empty, a search will ALWAYS happen) - AUTH_LDAP_UID_FIELD = "uid" # the username field - - # a mapping from LDAP DN to a list of FAB roles - AUTH_ROLES_MAPPING = { - "cn=airflow_users,ou=groups,dc=example,dc=com": ["User"], - "cn=airflow_admins,ou=groups,dc=example,dc=com": ["Admin"], - } - - # the LDAP user attribute which has their role DNs - AUTH_LDAP_GROUP_FIELD = "memberOf" - - # if we should replace ALL the user's roles each login, or only on registration - AUTH_ROLES_SYNC_AT_LOGIN = True - - # force users to re-auth after 30min of inactivity (to keep roles in sync) - PERMANENT_SESSION_LIFETIME = 1800 -``` - -

Option 2 - use OAUTH

- -Example values to integrate with Okta using `AUTH_OAUTH`: -```yaml -web: - extraPipPackages: - ## the following configs require Flask-AppBuilder 3.2.0 (or later) - - "Flask-AppBuilder~=3.3.0" - ## the following configs require Authlib - - "Authlib~=0.15.3" - - webserverConfig: - stringOverride: |- - from airflow import configuration as conf - from flask_appbuilder.security.manager import AUTH_OAUTH - - SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN') - - AUTH_TYPE = AUTH_OAUTH - - # registration configs - AUTH_USER_REGISTRATION = True # allow users who are not already in the FAB DB - AUTH_USER_REGISTRATION_ROLE = "Public" # this role will be given in addition to any AUTH_ROLES_MAPPING - - # the list of providers which the user can choose from - OAUTH_PROVIDERS = [ - { - 'name': 'okta', - 'icon': 'fa-circle-o', - 'token_key': 'access_token', - 'remote_app': { - 'client_id': 'OKTA_KEY', - 'client_secret': 'OKTA_SECRET', - 'api_base_url': 'https://OKTA_DOMAIN.okta.com/oauth2/v1/', - 'client_kwargs': { - 'scope': 'openid profile email groups' - }, - 'access_token_url': 'https://OKTA_DOMAIN.okta.com/oauth2/v1/token', - 'authorize_url': 'https://OKTA_DOMAIN.okta.com/oauth2/v1/authorize', - } - } - ] - - # a mapping from the values of `userinfo["role_keys"]` to a list of FAB roles - AUTH_ROLES_MAPPING = { - "FAB_USERS": ["User"], - "FAB_ADMINS": ["Admin"], - } - - # if we should replace ALL the user's roles each login, or only on registration - AUTH_ROLES_SYNC_AT_LOGIN = True - - # force users to re-auth after 30min of inactivity (to keep roles in sync) - PERMANENT_SESSION_LIFETIME = 1800 -``` - -
-
-</details>
- -### How to set a custom fernet encryption key? -
-<summary>
-Expand
-</summary>
+[1] we encourage you to use chart version `8.X.X`, so you can use the `airflow.kubernetesPodTemplate.*` values (requires airflow `1.10.11+`) -

Option 1 - using the value

-
-> 🟥 __Warning__ 🟥
->
-> We strongly recommend that you DO NOT USE the default `airflow.fernetKey` in production.
-
-You can set the fernet encryption key using the `airflow.fernetKey` value, which sets the `AIRFLOW__CORE__FERNET_KEY` environment variable.
-
-Example values to define the fernet key with `airflow.fernetKey`:
-```yaml
-airflow:
-  fernetKey: "7T512UXSSmBOkpWimFHIVb8jK6lfmSAvx4mO6Arehnc="
-```
-

Option 2 - using a secret (recommended)

- -You can set the fernet encryption key from a Kubernetes Secret by referencing it with the `airflow.extraEnv` value. - -Example values to use the `value` key from the existing Secret `airflow-fernet-key`: -```yaml -airflow: - extraEnv: - - name: AIRFLOW__CORE__FERNET_KEY - valueFrom: - secretKeyRef: - name: airflow-fernet-key - key: value -``` - -
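-If the `airflow-fernet-key` Secret does not already exist, it could be created with a manifest like the following (the key shown is only the example one from "Option 1" and must be replaced with your own):
-```yaml
-apiVersion: v1
-kind: Secret
-metadata:
-  name: airflow-fernet-key
-stringData:
-  ## NOTE: replace with a fernet key you generated yourself
-  value: "7T512UXSSmBOkpWimFHIVb8jK6lfmSAvx4mO6Arehnc="
-```
-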

Option 3 - using `_CMD` or `_SECRET` configs

- -You can also set the fernet key by specifying either the `AIRFLOW__CORE__FERNET_KEY_CMD` or `AIRFLOW__CORE__FERNET_KEY_SECRET` environment variables. -Read about how the `_CMD` or `_SECRET` configs work in the ["Setting Configuration Options"](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) section of the Airflow documentation. - -Example values for using `AIRFLOW__CORE__FERNET_KEY_CMD`: - -```yaml -airflow: - ## WARNING: you must set `fernetKey` to "", otherwise it will take precedence - fernetKey: "" - - ## NOTE: this is only an example, if your value lives in a Secret, you probably want to use "Option 2" above - config: - AIRFLOW__CORE__FERNET_KEY_CMD: "cat /opt/airflow/fernet-key/value" - - extraVolumeMounts: - - name: fernet-key - mountPath: /opt/airflow/fernet-key - readOnly: true - - extraVolumes: - - name: fernet-key - secret: - secretName: airflow-fernet-key -``` +## Helm Values -
-
-</details>
- -### How to set a custom webserver secret_key? -
-<summary>
-Expand
-</summary>
- -

Option 1 - using the value

-
-> 🟥 __Warning__ 🟥
->
-> We strongly recommend that you DO NOT USE the default `airflow.webserverSecretKey` in production.
-
-You can set the webserver secret_key using the `airflow.webserverSecretKey` value, which sets the `AIRFLOW__WEBSERVER__SECRET_KEY` environment variable.
-
-Example values to define the secret_key with `airflow.webserverSecretKey`:
-```yaml
-airflow:
-  webserverSecretKey: "THIS IS UNSAFE!"
-```
-

Option 2 - using a secret (recommended)

- -You can set the webserver secret_key from a Kubernetes Secret by referencing it with the `airflow.extraEnv` value. - -Example values to use the `value` key from the existing Secret `airflow-webserver-secret-key`: -```yaml -airflow: - extraEnv: - - name: AIRFLOW__WEBSERVER__SECRET_KEY - valueFrom: - secretKeyRef: - name: airflow-webserver-secret-key - key: value -``` - -
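-If the `airflow-webserver-secret-key` Secret does not already exist, it could be created with a manifest like this (the value shown is a placeholder for a random string of your own):
-```yaml
-apiVersion: v1
-kind: Secret
-metadata:
-  name: airflow-webserver-secret-key
-stringData:
-  value: REPLACE_ME_WITH_A_RANDOM_STRING
-```
-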

Option 3 - using `_CMD` or `_SECRET` configs

- -You can also set the webserver secret key by specifying either the `AIRFLOW__WEBSERVER__SECRET_KEY_CMD` or `AIRFLOW__WEBSERVER__SECRET_KEY_SECRET` environment variables. -Read about how the `_CMD` or `_SECRET` configs work in the ["Setting Configuration Options"](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) section of the Airflow documentation. - -Example values for using `AIRFLOW__WEBSERVER__SECRET_KEY_CMD`: - -```yaml -airflow: - ## WARNING: you must set `webserverSecretKey` to "", otherwise it will take precedence - webserverSecretKey: "" - - ## NOTE: this is only an example, if your value lives in a Secret, you probably want to use "Option 2" above - config: - AIRFLOW__WEBSERVER__SECRET_KEY_CMD: "cat /opt/airflow/webserver-secret-key/value" - - extraVolumeMounts: - - name: webserver-secret-key - mountPath: /opt/airflow/webserver-secret-key - readOnly: true - - extraVolumes: - - name: webserver-secret-key - secret: - secretName: airflow-webserver-secret-key -``` - -
-
-</details>
- -### How to create airflow connections? -
-<summary>
-Expand
-</summary>
+> __TIP:__ the full list of values can be found in the [default `values.yaml` file](https://github.com/airflow-helm/charts/tree/main/charts/airflow/values.yaml) -

Option 1 - use plain-text

-
-You can use the `airflow.connections` value to create airflow [Connections](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#connections) in a declarative way.
-
-Example values to create connections called `my_aws`, `my_gcp`, `my_postgres`, and `my_ssh`:
-```yaml
-airflow:
-  connections:
-    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html
-    - id: my_aws
-      type: aws
-      description: my AWS connection
-      extra: |-
-        { "aws_access_key_id": "XXXXXXXX",
-          "aws_secret_access_key": "XXXXXXXX",
-          "region_name": "eu-central-1" }
-    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/connections/gcp.html
-    - id: my_gcp
-      type: google_cloud_platform
-      description: my GCP connection
-      extra: |-
-        { "extra__google_cloud_platform__keyfile_dict": "XXXXXXXX",
-          "extra__google_cloud_platform__num_retries": "XXXXXXXX" }
-    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-postgres/stable/connections/postgres.html
-    - id: my_postgres
-      type: postgres
-      description: my Postgres connection
-      host: postgres.example.com
-      port: 5432
-      login: db_user
-      password: db_pass
-      schema: my_db
-      extra: |-
-        { "sslmode": "allow" }
-    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-ssh/stable/connections/ssh.html
-    - id: my_ssh
-      type: ssh
-      description: my SSH connection
-      host: ssh.example.com
-      port: 22
-      login: ssh_user
-      password: ssh_pass
-      extra: |-
-        { "timeout": "15" }
-
-  ## if we create a Deployment to perpetually sync `airflow.connections`
-  connectionsUpdate: true
-```
-

Option 2 - use templates from Secrets/ConfigMaps

- -> 🟨 __Note__ 🟨 -> -> If `airflow.connectionsUpdate = true`, the connections which use `airflow.connectionsTemplates` will be updated in real-time, allowing tools like [external-secrets](https://github.com/external-secrets/kubernetes-external-secrets) to be used. - -You can use `airflow.connectionsTemplates` to extract string templates from keys in Secrets or Configmaps. - -Example values to use templates from `Secret/my-secret` and `ConfigMap/my-configmap` in parts of the `my_aws` connection: -```yaml -airflow: - connections: - - id: my_aws - type: aws - description: my AWS connection - extra: |- - { "aws_access_key_id": "${AWS_ACCESS_KEY_ID}", - "aws_secret_access_key": "${AWS_ACCESS_KEY}", - "region_name":"eu-central-1" } - - ## bash-like templates to be used in `airflow.connections` - connectionsTemplates: - AWS_ACCESS_KEY_ID: - kind: configmap - name: my-configmap - key: username - AWS_ACCESS_KEY: - kind: secret - name: my-secret - key: password - - ## if we create a Deployment to perpetually sync `airflow.connections` - connectionsUpdate: true -``` +The following is a summary of the __helm values__ provided by the User-Community Airflow Helm Chart. -
-
-</details>
- -### How to create airflow variables?
-<details>
-<summary>
-Expand
-</summary>
- -

Option 1 - use plain-text

- -You can use the `airflow.variables` value to create airflow [Variables](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#variables) in a declarative way. - -Example values to create variables called `var_1`, `var_2`: -```yaml -airflow: - variables: - - key: "var_1" - value: "my_value_1" - - key: "var_2" - value: "my_value_2" - - ## if we create a Deployment to perpetually sync `airflow.variables` - variablesUpdate: true -``` - -

Option 2 - use templates from Secrets/Configmaps

-
-> 🟨 __Note__ 🟨
->
-> If `airflow.variablesUpdate = true`, the variables which use `airflow.variablesTemplates` will be updated in real-time, allowing tools like [external-secrets](https://github.com/external-secrets/kubernetes-external-secrets) to be used.
-
-You can use `airflow.variablesTemplates` to extract string templates from keys in Secrets or Configmaps.
-
-Example values to use templates from `Secret/my-secret` and `ConfigMap/my-configmap` in the `var_1` and `var_2` variables:
-```yaml
-airflow:
-  variables:
-    - key: "var_1"
-      value: "${MY_VALUE_1}"
-    - key: "var_2"
-      value: "${MY_VALUE_2}"
-
-  ## bash-like templates to be used in `airflow.variables`
-  variablesTemplates:
-    MY_VALUE_1:
-      kind: configmap
-      name: my-configmap
-      key: value1
-    MY_VALUE_2:
-      kind: secret
-      name: my-secret
-      key: value2
-
-  ## if we create a Deployment to perpetually sync `airflow.variables`
-  variablesUpdate: false
-```
-
-
-</details>
- -### How to create airflow pools? -
-<summary>
-Expand
-</summary>
- -You can use the `airflow.pools` value to create airflow [Pools](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#pools) in a declarative way. - -Example values to create pools called `pool_1`, `pool_2`: -```yaml -airflow: - pools: - - name: "pool_1" - description: "example pool with 5 slots" - slots: 5 - - name: "pool_2" - description: "example pool with 10 slots" - slots: 10 - - ## if we create a Deployment to perpetually sync `airflow.pools` - poolsUpdate: true -``` - -
-
-</details>
- -### How to set up celery worker autoscaling? -
-<summary>
-Expand
-</summary>
-
-> 🟨 __Note__ 🟨
->
-> This method of autoscaling is not ideal. There is not necessarily a link between RAM usage and the number of pending tasks, meaning you could have a situation where your workers don't scale up despite having pending tasks.
-
-The Airflow Celery Workers can be scaled using the [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/). To enable autoscaling, you must set `workers.autoscaling.enabled=true`, then provide `workers.autoscaling.maxReplicas`.
-
-Assume every task a worker executes consumes approximately `200Mi` of memory; that makes memory a good metric for utilisation monitoring.
-For a worker pod you can calculate it: `WORKER_CONCURRENCY * 200Mi`, so for `10 tasks` a worker will consume `~2Gi` of memory.
-In the following config, if a worker consumes `80%` of `2Gi` (which will happen if it runs 9-10 tasks at the same time), an autoscaling event will be triggered, and a new worker will be added.
-If you have many tasks in a queue, Kubernetes will keep adding workers until `maxReplicas` is reached, in this case `16`.
-```yaml
-airflow:
-  config:
-    AIRFLOW__CELERY__WORKER_CONCURRENCY: 10
-
-workers:
-  # the initial/minimum number of workers
-  replicas: 2
-
-  resources:
-    requests:
-      memory: "2Gi"
-
-  podDisruptionBudget:
-    enabled: true
-    ## prevents losing more than 20% of current worker task slots in a voluntary disruption
-    maxUnavailable: "20%"
-
-  autoscaling:
-    enabled: true
-    maxReplicas: 16
-    metrics:
-      - type: Resource
-        resource:
-          name: memory
-          target:
-            type: Utilization
-            averageUtilization: 80
-
-  celery:
-    ## wait at most 9min for running tasks to complete before SIGTERM
-    ## WARNING:
-    ##  - some cloud cluster-autoscaler configs will not respect graceful termination
-    ##    longer than 10min, for example, Google Kubernetes Engine (GKE)
-    gracefullTermination: true
-    gracefullTerminationPeriod: 540
-
-    ## how many seconds (after the 9min) to wait before SIGKILL
-    terminationPeriod: 60
-
-  logCleanup:
-    resources:
-      requests:
-        ## IMPORTANT! for autoscaling to work with logCleanup
-        memory: "64Mi"
-
-dags:
-  gitSync:
-    resources:
-      requests:
-        ## IMPORTANT! for autoscaling to work with gitSync
-        memory: "64Mi"
-```
-
-
-</details>
- -### How to persist airflow logs? -
-<summary>
-Expand
-</summary>
-
-> 🟥 __Warning__ 🟥
->
-> You should persist logs in a production deployment using one of these methods.<br>
-> By default, logs are stored within the container's filesystem; therefore, any restart of the pod will wipe your DAG logs.
-

Option 1a - PersistentVolumeClaim (chart-managed)

- -Example values to create a PVC with the cluster-default `storageClass` and 1Gi initial `size`: -```yaml -airflow: - defaultSecurityContext: - ## sets the filesystem owner group of files/folders in mounted volumes - ## this does NOT give root permissions to Pods, only the "root" group - fsGroup: 0 - -scheduler: - logCleanup: - ## scheduler log-cleanup must be disabled if `logs.persistence.enabled` is `true` - enabled: false - -workers: - logCleanup: - ## workers log-cleanup must be disabled if `logs.persistence.enabled` is `true` - enabled: false - -logs: - persistence: - enabled: true - storageClass: "" ## empty string means cluster-default - accessMode: ReadWriteMany - size: 1Gi -``` - -

Option 1b - PersistentVolumeClaim (existing / user-managed)

- -> 🟨 __Note__ 🟨 -> -> Your `logs.persistence.existingClaim` PVC must support `ReadWriteMany` for `accessMode` - -Example values to use an existing PVC called `my-logs-pvc`: - -```yaml -airflow: - defaultSecurityContext: - ## sets the filesystem owner group of files/folders in mounted volumes - ## this does NOT give root permissions to Pods, only the "root" group - fsGroup: 0 - -scheduler: - logCleanup: - ## scheduler log-cleanup must be disabled if `logs.persistence.enabled` is `true` - enabled: false - -workers: - logCleanup: - ## workers log-cleanup must be disabled if `logs.persistence.enabled` is `true` - enabled: false - -logs: - persistence: - enabled: true - existingClaim: my-logs-pvc - accessMode: ReadWriteMany -``` - -

Option 2a - Remote S3 Bucket (recommended on AWS)

-
-Example values to use a remote S3 bucket for logging, with an `airflow.connection` called `my_aws` for authorization:
-```yaml
-airflow:
-  config:
-    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
-    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://MY_S3_BUCKET/airflow/logs"
-    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "my_aws"
-
-  connections:
-    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html
-    - id: my_aws
-      type: aws
-      description: my AWS connection
-      extra: |-
-        { "aws_access_key_id": "XXXXXXXX",
-          "aws_secret_access_key": "XXXXXXXX",
-          "region_name": "eu-central-1" }
-```
-
-Example values to use a remote S3 bucket for logging, with [EKS - IAM Roles for Service Accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) for authorization:
-```yaml
-airflow:
-  config:
-    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
-    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://MY_S3_BUCKET/airflow/logs"
-    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "aws_default"
-
-serviceAccount:
-  annotations:
-    eks.amazonaws.com/role-arn: "arn:aws:iam::XXXXXXXXXX:role/MY_ROLE_NAME"
-```
-

Option 2b - Remote GCS Bucket (recommended on GCP)

-
-Example values to use a remote GCS bucket for logging, with an `airflow.connection` called `my_gcp` for authorization:
-```yaml
-airflow:
-  config:
-    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
-    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "gs://MY_GCS_BUCKET/airflow/logs"
-    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "my_gcp"
-
-  connections:
-    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/connections/gcp.html
-    - id: my_gcp
-      type: google_cloud_platform
-      description: my GCP connection
-      extra: |-
-        { "extra__google_cloud_platform__keyfile_dict": "XXXXXXXX",
-          "extra__google_cloud_platform__num_retries": "5" }
-```
-
-Example values to use a remote GCS bucket for logging, with [GKE - Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) for authorization:
-```yaml
-airflow:
-  config:
-    AIRFLOW__LOGGING__REMOTE_LOGGING: "True"
-    AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "gs://MY_GCS_BUCKET/airflow/logs"
-    AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "google_cloud_default"
-
-serviceAccount:
-  annotations:
-    iam.gke.io/gcp-service-account: "MY_GSA_NAME@MY_PROJECT_ID.iam.gserviceaccount.com"
-```
-
-
-</details>
- -### How to configure the scheduler liveness probe? -
-<summary>
-Expand
-</summary>
- -

Scheduler "Heartbeat Check"

- -The chart includes a [Kubernetes Liveness Probe](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/) -for each airflow scheduler which regularly queries the Airflow Metadata Database to ensure the scheduler is ["healthy"](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/check-health.html). - -A scheduler is "healthy" if it has had a "heartbeat" in the last `AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD` seconds. -Each scheduler will perform a "heartbeat" every `AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC` seconds by updating the `latest_heartbeat` of its `SchedulerJob` in the Airflow Metadata `jobs` table. - -> 🟥 __Warning__ 🟥 -> -> A scheduler can have a "heartbeat" but be deadlocked such that it's unable to schedule new tasks, -> we provide the `scheduler.livenessProbe.taskCreationCheck.*` values to automatically restart the scheduler in these cases. -> -> https://github.com/apache/airflow/issues/7935 - patched in airflow `2.0.2`
-> https://github.com/apache/airflow/issues/15938 - patched in airflow `2.1.1` - -By default, the chart runs a liveness probe every __30 seconds__ (`periodSeconds`), and will restart a scheduler if __5 probe failures__ (`failureThreshold`) occur in a row. -This means a scheduler must be unhealthy for at least `30 x 5 = 150` seconds before Kubernetes will automatically restart a scheduler Pod. - -Here is an overview of the `scheduler.livenessProbe.*` values: - -```yaml -scheduler: - livenessProbe: - enabled: true - - ## number of seconds to wait after a scheduler container starts before running its first probe - ## NOTE: schedulers take a few seconds to actually start - initialDelaySeconds: 10 - - ## number of seconds to wait between each probe - periodSeconds: 30 - - ## maximum number of seconds that a probe can take before timing out - ## WARNING: if your database is very slow, you may need to increase this value to prevent invalid scheduler restarts - timeoutSeconds: 60 - - ## maximum number of consecutive probe failures, after which the scheduler will be restarted - ## NOTE: a "failure" could be any of: - ## 1. the probe takes more than `timeoutSeconds` - ## 2. the probe detects the scheduler as "unhealthy" - ## 3. the probe "task creation check" fails - failureThreshold: 5 -``` - -
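-The "heartbeat" thresholds mentioned above are regular airflow configs, so if you need to tune them, they can be set with the `airflow.config` value; the numbers below are only illustrative:
-```yaml
-airflow:
-  config:
-    ## seconds since the last heartbeat before a scheduler is considered unhealthy
-    AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD: "60"
-    ## how often each scheduler performs a heartbeat
-    AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC: "5"
-```
-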

Scheduler "Task Creation Check"

- -The liveness probe can additionally check if the Scheduler is creating new [tasks](https://airflow.apache.org/docs/apache-airflow/stable/concepts/tasks.html) as an indication of its health. -This check works by ensuring that the most recent `LocalTaskJob` had a `start_date` no more than `scheduler.livenessProbe.taskCreationCheck.thresholdSeconds` seconds ago. - -> 🟦 __Tip__ 🟦 -> -> The "Task Creation Check" is currently disabled by default, it can be enabled with `scheduler.livenessProbe.taskCreationCheck.enabled`. - -Here is an overview of the `scheduler.livenessProbe.taskCreationCheck.*` values: - -```yaml -scheduler: - livenessProbe: - enabled: true - ... - - taskCreationCheck: - ## if the task creation check is enabled - enabled: true - - ## the maximum number of seconds since the start_date of the most recent LocalTaskJob - ## WARNING: must be AT LEAST equal to your shortest DAG schedule_interval - ## WARNING: DummyOperator tasks will NOT be seen by this probe - thresholdSeconds: 300 -``` - -You might use the following `canary_dag` DAG definition to run a small task every __300 seconds__ (5 minutes): - -```python -from datetime import datetime, timedelta -from airflow import DAG - -# import using try/except to support both airflow 1 and 2 -try: - from airflow.operators.bash import BashOperator -except ModuleNotFoundError: - from airflow.operators.bash_operator import BashOperator - -dag = DAG( - dag_id="canary_dag", - default_args={ - "owner": "airflow", - }, - schedule_interval="*/5 * * * *", - start_date=datetime(2022, 1, 1), - dagrun_timeout=timedelta(minutes=5), - is_paused_upon_creation=False, - catchup=False, -) - -# WARNING: while `DummyOperator` would use less resources, the check can't see those tasks -# as they don't create LocalTaskJob instances -task = BashOperator( - task_id="canary_task", - bash_command="echo 'Hello World!'", - dag=dag, -) -``` - -
-
-</details>
- -## FAQ - Databases - -> __Frequently asked questions related to database configs__ - -### How to use the embedded Postgres? -
-<summary>
-Expand
-</summary>
-
-> 🟥 __Warning__ 🟥
->
-> The embedded Postgres is NOT SUITABLE for production; you should follow [How to use an external database?](#how-to-use-an-external-database)
-
-> 🟨 __Note__ 🟨
->
-> If `pgbouncer.enabled=true` (the default), we will deploy [PgBouncer](https://www.pgbouncer.org/) to pool connections to the database
-
-The embedded Postgres database has an insecure username/password by default; you should create secure credentials before using it.
-
-For example, to create the required Kubernetes Secrets:
-```sh
-# set postgres password
-kubectl create secret generic \
-  airflow-postgresql \
-  --from-literal=postgresql-password=$(openssl rand -base64 13) \
-  --namespace my-airflow-namespace
-
-# set redis password
-kubectl create secret generic \
-  airflow-redis \
-  --from-literal=redis-password=$(openssl rand -base64 13) \
-  --namespace my-airflow-namespace
-```
-
-Example values to use those secrets:
-```yaml
-postgresql:
-  existingSecret: airflow-postgresql
-
-redis:
-  existingSecret: airflow-redis
-```
-
-
-</details>
- -### How to use an external database? -
-<summary>
-Expand
-</summary>
- -> 🟥 __Warning__ 🟥 -> -> We __STRONGLY RECOMMEND__ that all production deployments of Airflow use an external database (not managed by this chart). - -When compared with the Postgres that is embedded in this chart, an external database comes with many benefits: - -1. The embedded Postgres version is usually very outdated, so is susceptible to critical security bugs -2. The embedded database may not scale to your performance requirements _(NOTE: every airflow task creates database connections)_ -3. An external database will likely achieve higher uptime _(NOTE: no airflow tasks will run if your database is down)_ -4. An external database can be configured with backups and disaster recovery - -Commonly, people use the managed PostgreSQL service from their cloud vendor to provision an external database: - -Cloud Platform | Service Name ---- | --- -Amazon Web Services | [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) -Microsoft Azure | [Azure Database for PostgreSQL](https://azure.microsoft.com/en-au/services/postgresql/) -Google Cloud | [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) -Alibaba Cloud | [ApsaraDB RDS for PostgreSQL](https://www.alibabacloud.com/product/apsaradb-for-rds-postgresql) -IBM Cloud | [IBM Cloud® Databases for PostgreSQL](https://cloud.ibm.com/docs/databases-for-postgresql) - -

Option 1 - Postgres

- -> 🟨 __Note__ 🟨 -> -> By default, this chart deploys [PgBouncer](https://www.pgbouncer.org/) to pool db connections and reduce the load from large numbers of airflow tasks. -> -> You may disable PgBouncer by setting `pgbouncer.enabled` to `false`. - -Example values for an external Postgres database, with an existing `airflow_cluster1` database: -```yaml -postgresql: - ## to use the external db, the embedded one must be disabled - enabled: false - -## for full list of PgBouncer configs, see values.yaml -pgbouncer: - enabled: true - - ## WARNING: you must set "scram-sha-256" if using Azure PostgreSQL (single server mode) - authType: md5 - - serverSSL: - ## WARNING: you must set "verify-ca" if using Azure PostgreSQL - mode: prefer - -externalDatabase: - type: postgres - - host: postgres.example.org - port: 5432 - - ## the schema which will contain the airflow tables - database: airflow_cluster1 - - ## (username - option 1) a plain-text helm value - user: my_airflow_user - - ## (username - option 2) a Kubernetes secret in your airflow namespace - #userSecret: "airflow-cluster1-database-credentials" - #userSecretKey: "username" - - ## (password - option 1) a plain-text helm value - password: my_airflow_password - - ## (password - option 2) a Kubernetes secret in your airflow namespace - #passwordSecret: "airflow-cluster1-database-credentials" - #passwordSecretKey: "password" - - ## use this for any extra connection-string settings, e.g. ?sslmode=disable - properties: "" -``` - -

Option 2 - MySQL

- -> 🟨 __Note__ 🟨 -> -> You must set `explicit_defaults_for_timestamp=1` in your MySQL instance, [see here](https://airflow.apache.org/docs/stable/howto/initialize-database.html) - -Example values for an external MySQL database, with an existing `airflow_cluster1` database: -```yaml -postgresql: - ## to use the external db, the embedded one must be disabled - enabled: false - -pgbouncer: - ## pgbouncer is automatically disabled if `externalDatabase.type` is `mysql` - #enabled: false - -externalDatabase: - type: mysql - - host: mysql.example.org - port: 3306 - - ## the database which will contain the airflow tables - database: airflow_cluster1 - - ## (username - option 1) a plain-text helm value - user: my_airflow_user - - ## (username - option 2) a Kubernetes secret in your airflow namespace - #userSecret: "airflow-cluster1-database-credentials" - #userSecretKey: "username" - - ## (password - option 1) a plain-text helm value - password: my_airflow_password - - ## (password - option 2) a Kubernetes secret in your airflow namespace - #passwordSecret: "airflow-cluster1-database-credentials" - #passwordSecretKey: "password" - - ## use this for any extra connection-string settings, e.g. ?useSSL=false - properties: "" -``` - -
-
-### How to use an external redis?
- -Example values for an external redis with ssl enabled: -```yaml -redis: - enabled: false - -externalRedis: - host: "example.redis.cache.windows.net" - port: 6380 - - ## the redis database-number that airflow will use - databaseNumber: 1 - - ## (option 1 - password) a plain-text helm value - password: my_airflow_password - - ## (option 2 - password) a Kubernetes secret in your airflow namespace - #passwordSecret: "airflow-cluster1-redis-credentials" - #passwordSecretKey: "password" - - ## use this for any extra connection-string settings - properties: "?ssl_cert_reqs=CERT_OPTIONAL" -``` - -
-
-## FAQ - Kubernetes
-
-> __Frequently asked questions related to kubernetes configs__
-
-### How to mount ConfigMaps/Secrets as environment variables?
- -> 🟨 __Note__ 🟨 -> -> This method can be used to pass sensitive configs to Airflow - -You can use the `airflow.extraEnv` value to mount extra environment variables with the same structure as [EnvVar in ContainerSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#envvar-v1-core). - -Example values to use the `value` key from the existing Secret `airflow-fernet-key` to define `AIRFLOW__CORE__FERNET_KEY`: -```yaml -airflow: - extraEnv: - - name: AIRFLOW__CORE__FERNET_KEY - valueFrom: - secretKeyRef: - name: airflow-fernet-key - key: value -``` - -
-
-### How to mount Secrets/Configmaps as files on workers?
- -You can use the `workers.extraVolumeMounts` and `workers.extraVolumes` values to mount Secretes as files. - -For example, if the Secret `redshift-creds` already exist, and has keys called `user` and `password`: -```yaml -workers: - extraVolumeMounts: - - name: redshift-creds - mountPath: /opt/airflow/secrets/redshift-creds - readOnly: true - - extraVolumes: - - name: redshift-creds - secret: - secretName: redshift-creds -``` - -You could then read the `/opt/airflow/secrets/redshift-creds` files from within a DAG Python function: -```python -from pathlib import Path -redis_user = Path("/opt/airflow/secrets/redshift-creds/user").read_text().strip() -redis_password = Path("/opt/airflow/secrets/redshift-creds/password").read_text().strip() -``` - -To create the `redshift-creds` Secret, you could use: -```console -kubectl create secret generic \ - redshift-creds \ - --from-literal=user=MY_REDSHIFT_USERNAME \ - --from-literal=password=MY_REDSHIFT_PASSWORD \ - --namespace my-airflow-namespace -``` - -
-
-### How to set up an Ingress?
- -The chart provides the `ingress.*` values for deploying a Kubernetes Ingress to allow access to airflow outside the cluster. - -Consider the situation where you already have something hosted at the root of your domain, you might want to place airflow under a URL-prefix: -- http://example.com/airflow/ -- http://example.com/airflow/flower - -In this example, you would set these values, assuming you have an Ingress Controller with an IngressClass named "nginx" deployed: -```yaml -airflow: - config: - AIRFLOW__WEBSERVER__BASE_URL: "http://example.com/airflow/" - AIRFLOW__CELERY__FLOWER_URL_PREFIX: "/airflow/flower" - -ingress: - enabled: true - - ## WARNING: set as "networking.k8s.io/v1beta1" for Kubernetes 1.18 and earlier - apiVersion: networking.k8s.io/v1 - - ## airflow webserver ingress configs - web: - annotations: {} - host: "example.com" - path: "/airflow" - ## WARNING: requires Kubernetes 1.18 or later, use "kubernetes.io/ingress.class" annotation for older versions - ingressClassName: "nginx" - - ## flower ingress configs - flower: - annotations: {} - host: "example.com" - path: "/airflow/flower" - ## WARNING: requires Kubernetes 1.18 or later, use "kubernetes.io/ingress.class" annotation for older versions - ingressClassName: "nginx" -``` - -We expose the `ingress.web.precedingPaths` and `ingress.web.succeedingPaths` values, which are __before__ and __after__ the default path respectively. - -> 🟦 __Tip__ 🟦 -> -> A common use-case is [enabling SSL with the aws-alb-ingress-controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/guide/tasks/ssl_redirect/), which needs a redirect path to be hit before the airflow-webserver one - -For example, setting `ingress.web.precedingPaths` for an aws-alb-ingress-controller with SSL: -```yaml -ingress: - web: - precedingPaths: - - path: "/*" - serviceName: "ssl-redirect" - servicePort: "use-annotation" -``` - -
-
-### How to use Pod affinity, nodeSelector, and tolerations?
- -If your environment needs to use Pod [affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity), [nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector), or [tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/), we provide many values that allow fine-grained control over the Pod definitions. - -To set affinity, nodeSelector, and tolerations for all airflow Pods, you can use the `airflow.{defaultNodeSelector,defaultAffinity,defaultTolerations}` values: -```yaml -airflow: - ## https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector - defaultNodeSelector: {} - # my_node_label_1: value1 - # my_node_label_2: value2 - - ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#affinity-v1-core - defaultAffinity: {} - # podAffinity: - # requiredDuringSchedulingIgnoredDuringExecution: - # - labelSelector: - # matchExpressions: - # - key: security - # operator: In - # values: - # - S1 - # topologyKey: topology.kubernetes.io/zone - # podAntiAffinity: - # preferredDuringSchedulingIgnoredDuringExecution: - # - weight: 100 - # podAffinityTerm: - # labelSelector: - # matchExpressions: - # - key: security - # operator: In - # values: - # - S2 - # topologyKey: topology.kubernetes.io/zone - - ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#toleration-v1-core - defaultTolerations: [] - # - key: "key1" - # operator: "Exists" - # effect: "NoSchedule" - # - key: "key2" - # operator: "Exists" - # effect: "NoSchedule" - -## if using the embedded postgres chart, you will also need to define these -postgresql: - master: - nodeSelector: {} - affinity: {} - tolerations: [] - -## if using the embedded redis chart, you will also need to define these -redis: - master: - nodeSelector: {} - affinity: {} - tolerations: [] -``` - -The `airflow.{defaultNodeSelector,defaultAffinity,defaultTolerations}` values are overridden by the per-resource values like `scheduler.{nodeSelector,affinity,tolerations}`: -```yaml -airflow: - ## airflow KubernetesExecutor pod_template - kubernetesPodTemplate: - nodeSelector: {} - affinity: {} - tolerations: [] - - ## sync deployments - sync: - nodeSelector: {} - affinity: {} - tolerations: [] - -## airflow schedulers -scheduler: - nodeSelector: {} - affinity: {} - tolerations: [] - -## airflow webserver -web: - nodeSelector: {} - affinity: {} - tolerations: [] - -## airflow workers -workers: - nodeSelector: {} - affinity: {} - tolerations: [] - -## airflow triggerer -triggerer: - nodeSelector: {} - affinity: {} - tolerations: [] - -## airflow workers -flower: - nodeSelector: {} - affinity: {} - tolerations: [] -``` - -
-
-### How to integrate airflow with Prometheus?
-
-To be able to expose Airflow metrics to Prometheus you will need to install a plugin; one option is [epoch8/airflow-exporter](https://github.com/epoch8/airflow-exporter), which exports DAG and task metrics from Airflow.
-
-A [ServiceMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#servicemonitor) is a resource introduced by the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator). For more information, see the `serviceMonitor` section of `values.yaml`.
-
-
-### How to add extra manifests?
- -You may use the `extraManifests` value to specify a list of extra Kubernetes manifests that will be deployed alongside the chart. - -> 🟦 __Tip__ 🟦 -> -> [Helm templates](https://helm.sh/docs/chart_template_guide/functions_and_pipelines/) within these strings will be rendered - -Example values to create a `Secret` for database credentials: _(__WARNING:__ store custom values securely if used)_ -```yaml -extraManifests: - - | - apiVersion: v1 - kind: Secret - metadata: - name: airflow-postgres-credentials - data: - postgresql-password: {{ `password1` | b64enc | quote }} -``` - -Example values to create a `Deployment` for a [busybox](https://busybox.net/) container: -```yaml -extraManifests: - - | - apiVersion: apps/v1 - kind: Deployment - metadata: - name: {{ include "airflow.fullname" . }}-busybox - labels: - app: {{ include "airflow.labels.app" . }} - component: busybox - chart: {{ include "airflow.labels.chart" . }} - release: {{ .Release.Name }} - heritage: {{ .Release.Service }} - spec: - replicas: 1 - selector: - matchLabels: - app: {{ include "airflow.labels.app" . }} - component: busybox - release: {{ .Release.Name }} - template: - metadata: - labels: - app: {{ include "airflow.labels.app" . }} - component: busybox - release: {{ .Release.Name }} - spec: - containers: - - name: busybox - image: busybox:1.35 - command: - - "/bin/sh" - - "-c" - args: - - | - ## to break the infinite loop when we receive SIGTERM - trap "exit 0" SIGTERM; - ## keep the container running (so people can `kubectl exec -it` into it) - while true; do - echo "I am alive..."; - sleep 30; - done -``` - -
-
-## Values Reference
-
-> __Values provided by this chart (for more info see [values.yaml](values.yaml))__
-
-### `airflow.*`
+airflow.*

Parameter | Description | Default
--- | --- | ---
@@ -1798,10 +167,8 @@
-### `scheduler.*`
+scheduler.*

Parameter | Description | Default
--- | --- | ---
@@ -1825,13 +192,10 @@
`scheduler.livenessProbe.*` | configs for the scheduler Pods' liveness probe | ``
`scheduler.extraInitContainers` | extra init containers to run in the scheduler Pods | `[]`

-### `web.*`
+web.*

Parameter | Description | Default
--- | --- | ---
@@ -1855,13 +219,10 @@
`web.extraVolumeMounts` | extra VolumeMounts for the web Pods | `[]`
`web.extraVolumes` | extra Volumes for the web Pods | `[]`

-### `workers.*`
+workers.*

Parameter | Description | Default
--- | --- | ---
@@ -1886,13 +247,10 @@
`workers.extraVolumeMounts` | extra VolumeMounts for the worker Pods | `[]`
`workers.extraVolumes` | extra Volumes for the worker Pods | `[]`

-### `triggerer.*`
+triggerer.*

Parameter | Description | Default
--- | --- | ---
@@ -1915,13 +273,10 @@
`triggerer.extraVolumeMounts` | extra VolumeMounts for the triggerer Pods | `[]`
`triggerer.extraVolumes` | extra Volumes for the triggerer Pods | `[]`

-### `flower.*`
+flower.*

Parameter | Description | Default
--- | --- | ---
@@ -1944,26 +299,20 @@
`flower.extraVolumeMounts` | extra VolumeMounts for the flower Pods | `[]`
`flower.extraVolumes` | extra Volumes for the flower Pods | `[]`

-### `logs.*`
+logs.*

Parameter | Description | Default
--- | --- | ---
`logs.path` | the airflow logs folder | `/opt/airflow/logs`
`logs.persistence.*` | configs for the logs PVC | ``

-### `dags.*`
+dags.*

Parameter | Description | Default
--- | --- | ---
@@ -1971,13 +320,10 @@
`dags.persistence.*` | configs for the dags PVC | ``
`dags.gitSync.*` | configs for the git-sync sidecar | ``

-### `ingress.*`
+ingress.*

Parameter | Description | Default
--- | --- | ---
@@ -1986,26 +332,20 @@
`ingress.web.*` | configs for the Ingress of the web Service | ``
`ingress.flower.*` | configs for the Ingress of the flower Service | ``

-### `rbac.*`
+rbac.*

Parameter | Description | Default
--- | --- | ---
`rbac.create` | if Kubernetes RBAC resources are created | `true`
`rbac.events` | if the created RBAC role has GET/LIST access to Event resources | `false`

-### `serviceAccount.*`
+serviceAccount.*

Parameter | Description | Default
--- | --- | ---
@@ -2013,25 +353,19 @@
`serviceAccount.name` | the name of the ServiceAccount | `""`
`serviceAccount.annotations` | annotations for the ServiceAccount | `{}`

-### `extraManifests`
+extraManifests

Parameter | Description | Default
--- | --- | ---
`extraManifests` | a list of extra Kubernetes manifests that will be deployed alongside the chart | `[]`

-### `pgbouncer.*`
+pgbouncer.*

Parameter | Description | Default
--- | --- | ---
@@ -2059,13 +393,10 @@
`pgbouncer.clientSSL.*` | ssl configs for: clients -> pgbouncer | ``
`pgbouncer.serverSSL.*` | ssl configs for: pgbouncer -> postgres | ``

-### `postgresql.*`
+postgresql.*

Parameter | Description | Default
--- | --- | ---
@@ -2078,13 +409,10 @@
`postgresql.persistence.*` | configs for the PVC of postgresql | ``
`postgresql.master.*` | configs for the postgres StatefulSet | ``

-### `externalDatabase.*`
+externalDatabase.*

Parameter | Description | Default
--- | --- | ---
@@ -2100,13 +428,10 @@
`externalDatabase.passwordSecretKey` | the key within `externalDatabase.passwordSecret` containing the password string | `postgresql-password`
`externalDatabase.properties` | extra connection-string properties for the external database | `""`

-### `redis.*`
+redis.*

Parameter | Description | Default
--- | --- | ---
@@ -2118,13 +443,10 @@
`redis.master.*` | configs for the redis master StatefulSet | ``
`redis.slave.*` | configs for the redis slave StatefulSet | ``

-### `externalRedis.*`
+externalRedis.*

Parameter | Description | Default
--- | --- | ---
@@ -2134,15 +456,12 @@
`externalRedis.password` | the password for the external redis | `""`
`externalRedis.passwordSecret` | the name of a pre-created secret containing the external redis password | `""`
`externalRedis.passwordSecretKey` | the key within `externalRedis.passwordSecret` containing the password string | `redis-password`
-`externalDatabase.properties` | extra connection-string properties for the external redis | `""`
+`externalRedis.properties` | extra connection-string properties for the external redis | `""`

-### `serviceMonitor.*`
+serviceMonitor.*

Parameter | Description | Default
--- | --- | ---
@@ -2151,13 +470,10 @@
`serviceMonitor.path` | the ServiceMonitor web endpoint path | `/admin/metrics`
`serviceMonitor.interval` | the ServiceMonitor scrape interval | `30s`

-### `prometheusRule.*`
+prometheusRule.*

Parameter | Description | Default
--- | --- | ---
@@ -2165,5 +481,4 @@
`prometheusRule.additionalLabels` | labels for PrometheusRule, so that Prometheus can select it | `{}`
`prometheusRule.groups` | alerting rules for Prometheus | `[]`
+
\ No newline at end of file
diff --git a/charts/airflow/UPGRADE.md b/charts/airflow/UPGRADE.md
deleted file mode 100644
index 569d8210..00000000
--- a/charts/airflow/UPGRADE.md
+++ /dev/null
@@ -1 +0,0 @@
-This file has been replaced by [CHANGELOG.md](CHANGELOG.md) for versions `7.0.0` and later.
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/configuration/airflow-configs.md b/charts/airflow/docs/faq/configuration/airflow-configs.md
new file mode 100644
index 00000000..44fe8ce1
--- /dev/null
+++ b/charts/airflow/docs/faq/configuration/airflow-configs.md
@@ -0,0 +1,119 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to set airflow configs?
+
+## airflow.cfg
+
+While we don't expose the `airflow.cfg` file directly, you may use [environment variables](https://airflow.apache.org/docs/stable/howto/set-config.html) to set Airflow configs.
+
+The `airflow.config` value makes this easier: each key-value pair is mounted as an environment variable on each Pod:
+
+```yaml
+airflow:
+  config:
+    ## security
+    AIRFLOW__WEBSERVER__EXPOSE_CONFIG: "False"
+
+    ## dags
+    AIRFLOW__CORE__LOAD_EXAMPLES: "False"
+    AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: "30"
+
+    ## email
+    AIRFLOW__EMAIL__EMAIL_BACKEND: "airflow.utils.email.send_email_smtp"
+    AIRFLOW__SMTP__SMTP_HOST: "smtpmail.example.com"
+    AIRFLOW__SMTP__SMTP_MAIL_FROM: "admin@example.com"
+    AIRFLOW__SMTP__SMTP_PORT: "25"
+    AIRFLOW__SMTP__SMTP_SSL: "False"
+    AIRFLOW__SMTP__SMTP_STARTTLS: "False"
+
+    ## domain used in airflow emails
+    AIRFLOW__WEBSERVER__BASE_URL: "http://airflow.example.com"
+
+    ## other environment variables
+    HTTP_PROXY: "http://proxy.example.com:8080"
+```
+
+> 🟦 __Tip__ 🟦
+>
+> To store sensitive configs in Kubernetes secrets, you may use the `airflow.extraEnv` value.
+>
+> For example, to set `AIRFLOW__CORE__FERNET_KEY` from a Secret called `airflow-fernet-key` containing a key called `value`:
+>
+> ```yaml
+> airflow:
+>   extraEnv:
+>     - name: AIRFLOW__CORE__FERNET_KEY
+>       valueFrom:
+>         secretKeyRef:
+>           name: airflow-fernet-key
+>           key: value
+> ```
+
+## webserver_config.py
+
+We expose the `web.webserverConfig.*` values to define your Flask-AppBuilder `webserver_config.py` file.
+
+For example, a minimal `webserver_config.py` file that uses [`AUTH_DB`](https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-database):
+
+```yaml
+web:
+  webserverConfig:
+    ## the full content of the `webserver_config.py` file, as a string
+    stringOverride: |
+      from airflow import configuration as conf
+      from flask_appbuilder.security.manager import AUTH_DB
+
+      # the SQLAlchemy connection string
+      SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')
+
+      # use embedded DB for auth
+      AUTH_TYPE = AUTH_DB
+
+    ## the name of an existing Secret containing a `webserver_config.py` key
+    ## NOTE: if set, takes precedence over `web.webserverConfig.stringOverride`
+    #existingSecret: "my-airflow-webserver-config"
+```
+
+> 🟦 __Tip__ 🟦
+>
+> We also provide more detailed docs on [how to integrate airflow with LDAP or OAUTH](../security/ldap-oauth.md).
+
+## airflow_local_settings.py
+
+We expose the `airflow.localSettings.*` values to define your `airflow_local_settings.py` file.
+
+For example, an `airflow_local_settings.py` file that sets a [cluster policy](https://airflow.apache.org/docs/apache-airflow/stable/concepts/cluster-policies.html) to reject dags with no tags:
+
+```yaml
+airflow:
+  localSettings:
+    ## the full content of the `airflow_local_settings.py` file, as a string
+    stringOverride: |
+      from airflow.models import DAG
+      from airflow.exceptions import AirflowClusterPolicyViolation
+
+      def dag_policy(dag: DAG):
+          """Ensure that DAG has at least one tag"""
+          if not dag.tags:
+              raise AirflowClusterPolicyViolation(
+                  f"DAG {dag.dag_id} has no tags. At least one tag required. File path: {dag.fileloc}"
+              )
+
+    ## the name of an existing Secret containing a `airflow_local_settings.py` key
+    ## NOTE: if set, takes precedence over `airflow.localSettings.stringOverride`
+    #existingSecret: "my-airflow-local-settings"
+```
+
+For example, an `airflow_local_settings.py` file that sets a custom `xcom_sidecar` image for `KubernetesPodOperator()`:
+
+```yaml
+airflow:
+  localSettings:
+    ## the full content of the `airflow_local_settings.py` file, as a string
+    stringOverride: |
+      # use a custom `xcom_sidecar` image for KubernetesPodOperator()
+      from airflow.kubernetes.pod_generator import PodDefaults
+      PodDefaults.SIDECAR_CONTAINER.image = "gcr.io/PROJECT-ID/custom-sidecar-image"
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/configuration/airflow-plugins.md b/charts/airflow/docs/faq/configuration/airflow-plugins.md
new file mode 100644
index 00000000..0f88eb78
--- /dev/null
+++ b/charts/airflow/docs/faq/configuration/airflow-plugins.md
@@ -0,0 +1,170 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to load airflow plugins?
+
+There are multiple ways to load [airflow plugins](https://airflow.apache.org/docs/apache-airflow/stable/plugins.html) when using the chart.
+
+## Option 1 - embedded into container image (recommended)
+
+This chart uses the official [apache/airflow](https://hub.docker.com/r/apache/airflow) images, you may extend the airflow container image with your airflow plugins.
+
+For example, here is a Dockerfile that extends `airflow:2.1.4-python3.8` with custom plugins:
+
+```dockerfile
+FROM apache/airflow:2.1.4-python3.8
+
+# plugin files can be copied under `/opt/airflow/plugins`, the default plugins folder
+# (where `./plugins` is relative to the docker build context)
+COPY plugins/* /opt/airflow/plugins/
+
+# plugins exposed as python packages can be installed with pip
+RUN pip install --no-cache-dir \
+    example==1.0.0
+```
+
+After building and tagging your Dockerfile as `MY_REPO:MY_TAG`, you may use it with the chart by specifying `airflow.image.*`:
+
+```yaml
+airflow:
+  image:
+    repository: MY_REPO
+    tag: MY_TAG
+
+    ## WARNING: even if set to "Always" do not reuse tag names, as containers only pull the latest image when restarting
+    pullPolicy: IfNotPresent
+```
+
+## Option 2 - git-sync dags repo
+
+> 🟥 __Warning__ 🟥
+>
+> With "Option 2", you must manually restart the webserver and scheduler pods for plugin changes to take effect.
+
+If you are using git-sync to [load your DAG definitions](../dags/load-dag-definitions.md), you may also include your plugins in this repo.
+
+For example, if your DAG git repo includes plugins under `./PATH/TO/PLUGINS`:
+
+```yaml
+airflow:
+  config:
+    ## NOTE: there is an extra `/repo/` in the path
+    AIRFLOW__CORE__PLUGINS_FOLDER: /opt/airflow/dags/repo/PATH/TO/PLUGINS
+
+dags:
+  ## NOTE: this is the default value
+  #path: /opt/airflow/dags
+
+  gitSync:
+    enabled: true
+    repo: "git@github.com:USERNAME/REPOSITORY.git"
+    branch: "master"
+    revision: "HEAD"
+    syncWait: 60
+    sshSecret: "airflow-ssh-git-secret"
+    sshSecretKey: "id_rsa"
+
+    # "known_hosts" verification can be disabled by setting to ""
+    sshKnownHosts: |-
+      github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
+```
+
+## Option 3 - persistent volume
+
+> 🟥 __Warning__ 🟥
+>
+> With "Option 3", you must manually restart the webserver and scheduler pods for plugin changes to take effect.
+
+You may load airflow plugins that are stored in a Kubernetes [Persistent Volume](https://kubernetes.io/docs/concepts/storage/persistent-volumes/) by using the `airflow.extraVolumeMounts` and `airflow.extraVolumes` values.
+
+For example, to mount a PersistentVolumeClaim called `airflow-plugins` that contains airflow plugin files at its root:
+
+```yaml
+airflow:
+  config:
+    ## NOTE: this is the default value
+    #AIRFLOW__CORE__PLUGINS_FOLDER: /opt/airflow/plugins
+
+  extraVolumeMounts:
+    - name: airflow-plugins
+      mountPath: /opt/airflow/plugins
+      ## NOTE: if plugin files are not at the root of the volume, you may set a subPath
+      #subPath: "path/to/plugins"
+      readOnly: true
+
+  extraVolumes:
+    - name: airflow-plugins
+      persistentVolumeClaim:
+        claimName: airflow-plugins
+```
+
+## Option 4 - ConfigMaps or Secrets
+
+> 🟥 __Warning__ 🟥
+>
+> With "Option 4", you must manually restart the webserver and scheduler pods for plugin changes to take effect.
+
+You may load airflow plugins that are stored in Kubernetes Secrets or ConfigMaps by using the `airflow.extraVolumeMounts` and `airflow.extraVolumes` values.
+
+For example, to mount airflow plugin files from a ConfigMap called `airflow-plugins`:
+
+```yaml
+airflow:
+  config:
+    ## NOTE: this is the default value
+    #AIRFLOW__CORE__PLUGINS_FOLDER: /opt/airflow/plugins
+
+  extraVolumeMounts:
+    - name: airflow-plugins
+      mountPath: /opt/airflow/plugins
+      readOnly: true
+
+  extraVolumes:
+    - name: airflow-plugins
+      configMap:
+        name: airflow-plugins
+```
+
+> 🟦 __Tip__ 🟦
+>
+> Your `airflow-plugins` ConfigMap might look something like this.
+>
+> ```yaml
+> apiVersion: v1
+> kind: ConfigMap
+> metadata:
+>   name: airflow-plugins
+> data:
+>   my_airflow_plugin.py: |
+>     from airflow.plugins_manager import AirflowPlugin
+>
+>     class MyAirflowPlugin(AirflowPlugin):
+>       name = "my_airflow_plugin"
+>       ...
+> ```

> 🟦 __Tip__ 🟦
>
> You may include the ConfigMap as an [extra manifest](../kubernetes/extra-manifests.md) of the chart using the `extraManifests` value.
>
> ```yaml
> extraManifests:
>   - |
>     apiVersion: v1
>     kind: ConfigMap
>     metadata:
>       name: airflow-plugins
>       labels:
>         app: {{ include "airflow.labels.app" . }}
>         chart: {{ include "airflow.labels.chart" . }}
>         release: {{ .Release.Name }}
>         heritage: {{ .Release.Service }}
>     data:
>       my_airflow_plugin.py: |
>         from airflow.plugins_manager import AirflowPlugin
>
>         class MyAirflowPlugin(AirflowPlugin):
>           name = "my_airflow_plugin"
>           ...
> ```
diff --git a/charts/airflow/docs/faq/configuration/airflow-version.md b/charts/airflow/docs/faq/configuration/airflow-version.md
new file mode 100644
index 00000000..ac85f347
--- /dev/null
+++ b/charts/airflow/docs/faq/configuration/airflow-version.md
@@ -0,0 +1,61 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to set the airflow version?
+
+> 🟦 __Tip__ 🟦
+>
+> There is a default version (`airflow.image.tag`) of airflow shipped with each version of the chart, see the default [values.yaml](../../../values.yaml) for the current one.
+
+> 🟦 __Tip__ 🟦
+>
+> Many airflow versions are supported by the chart, please see the [Airflow Version Support](../../..#airflow-version-support) matrix.
+
+## Airflow 2.X
+
+For example, to use airflow `2.1.4`, with python `3.7`:
+
+```yaml
+airflow:
+  image:
+    repository: apache/airflow
+    tag: 2.1.4-python3.7
+```
+
+## Airflow 1.10
+
+> 🟥 __Warning__ 🟥
+>
+> To use an `airflow.image.tag` with Airflow `1.10.X`, you must set `airflow.legacyCommands` to `true`.
+
+For example, to use airflow `1.10.15`, with python `3.8`:
+
+```yaml
+airflow:
+  # WARNING: this must be "true" for airflow 1.10
+  legacyCommands: true
+
+  image:
+    repository: apache/airflow
+    tag: 1.10.15-python3.8
+```
+
+## Building a Custom Image
+
+Airflow provides documentation on [building custom docker images](https://airflow.apache.org/docs/docker-stack/build.html), you may follow this process to create a custom image.
+
+For example, after building and tagging your Dockerfile as `MY_REPO:MY_TAG`, you may use it with the chart by specifying `airflow.image.*`:
+
+```yaml
+airflow:
+  # WARNING: this must be "true" for airflow 1.10
+  #legacyCommands: true
+
+  image:
+    repository: MY_REPO
+    tag: MY_TAG
+
+    ## WARNING: even if set to "Always" do not reuse tag names, as containers only pull the latest image when restarting
+    pullPolicy: IfNotPresent
+```
diff --git a/charts/airflow/docs/faq/configuration/autoscaling-celery-workers.md b/charts/airflow/docs/faq/configuration/autoscaling-celery-workers.md
new file mode 100644
index 00000000..aa18e93d
--- /dev/null
+++ b/charts/airflow/docs/faq/configuration/autoscaling-celery-workers.md
@@ -0,0 +1,76 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to configure autoscaling for celery workers?
+
+> 🟨 __Note__ 🟨
+>
+> This method of autoscaling is not ideal.
+> There is not necessarily a link between RAM usage and the number of pending tasks,
+> meaning you could have a situation where your workers don't scale up despite having pending tasks.
+>
+> We are planning to implement an airflow-task aware autoscaler in a future chart release.
+
+The Airflow Celery Workers can be scaled using the [Horizontal Pod Autoscaler](https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale/).
+To enable autoscaling, you must set `workers.autoscaling.enabled=true`, then provide `workers.autoscaling.maxReplicas`.
+
+Assume every task a worker executes consumes approximately `200Mi` of memory; that makes memory a good metric for utilisation monitoring.
+For a worker Pod, the expected memory usage is `WORKER_CONCURRENCY * 200Mi`, so for `10 tasks` a worker will consume `~2Gi` of memory.
+In the following config, if a worker consumes `80%` of `2Gi` (which will happen if it runs 9-10 tasks at the same time),
+an autoscaling event will be triggered, and a new worker will be added.
+If you have many tasks in a queue, Kubernetes will keep adding workers until `maxReplicas` is reached, in this case `16`.
+
+```yaml
+airflow:
+  config:
+    AIRFLOW__CELERY__WORKER_CONCURRENCY: 10
+
+workers:
+  # the initial/minimum number of workers
+  replicas: 2
+
+  resources:
+    requests:
+      memory: "2Gi"
+
+  podDisruptionBudget:
+    enabled: true
+    ## prevents losing more than 20% of current worker task slots in a voluntary disruption
+    maxUnavailable: "20%"
+
+  autoscaling:
+    enabled: true
+    maxReplicas: 16
+    metrics:
+      - type: Resource
+        resource:
+          name: memory
+          target:
+            type: Utilization
+            averageUtilization: 80
+
+  celery:
+    ## wait at most 9min for running tasks to complete before SIGTERM
+    ## WARNING:
+    ##  - some cloud cluster-autoscaler configs will not respect graceful termination
+    ##    longer than 10min, for example, Google Kubernetes Engine (GKE)
+    gracefullTermination: true
+    gracefullTerminationPeriod: 540
+
+    ## how many seconds (after the 9min) to wait before SIGKILL
+    terminationPeriod: 60
+
+  logCleanup:
+    resources:
+      requests:
+        ## IMPORTANT! for autoscaling to work with logCleanup
+        memory: "64Mi"
+
+dags:
+  gitSync:
+    resources:
+      requests:
+        ## IMPORTANT! for autoscaling to work with gitSync
+        memory: "64Mi"
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/configuration/extra-python-packages.md b/charts/airflow/docs/faq/configuration/extra-python-packages.md
new file mode 100644
index 00000000..a474f27e
--- /dev/null
+++ b/charts/airflow/docs/faq/configuration/extra-python-packages.md
@@ -0,0 +1,120 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to install extra python packages?
+
+## Option 1 - use init-containers
+
+> 🟥 __Warning__ 🟥
+>
+> We __strongly advise__ that you DO NOT USE "Option 1" in production, as PyPI packages may change unexpectedly between container restarts.
+
+### Install on all Airflow Pods
+
+You may use the `airflow.extraPipPackages` value to install pip packages on all airflow Pods.
+
+For example, to install the `airflow-exporter` package on all scheduler/web/worker/flower Pods:
+
+```yaml
+airflow:
+  extraPipPackages:
+    - "airflow-exporter~=1.4.1"
+```
+
+### Install on Scheduler only
+
+You may use the `scheduler.extraPipPackages` value to install pip packages on the airflow scheduler Pods.
+
+For example, to install PyTorch on the scheduler Pods only:
+
+```yaml
+scheduler:
+  extraPipPackages:
+    - "torch~=1.8.0"
+```
+
+> 🟦 __Tip__ 🟦
+>
+> If a package is defined in both `airflow.extraPipPackages` and `scheduler.extraPipPackages`, the version in the latter will take precedence.
+>
+> This is because we list packages from deployment-specific values at the end of the `pip install ...` command.
+
+### Install on Workers only
+
+You may use the `workers.extraPipPackages` value to install pip packages on the airflow worker Pods.
+
+For example, to install PyTorch on the worker Pods only:
+
+```yaml
+workers:
+  extraPipPackages:
+    - "torch~=1.8.0"
+```
+
+### Install on Flower only
+
+You may use the `flower.extraPipPackages` value to install pip packages on the flower Pods.
+
+For example, to install PyTorch on the flower Pods only:
+
+```yaml
+flower:
+  extraPipPackages:
+    - "torch~=1.8.0"
+```
+
+### Install from Private pip index
+
+Pip can install packages from a private Python Package Index using the `--index-url` argument or `PIP_INDEX_URL` environment variable.
+
+For example, to install `my-internal-package` from a private index hosted at `example.com/packages/simple/`:
+
+```yaml
+airflow:
+  config:
+    ## pip configs can be set with environment variables
+    PIP_TIMEOUT: 60
+    PIP_INDEX_URL: https://<username>:<password>@example.com/packages/simple/
+    PIP_TRUSTED_HOST: example.com
+
+  extraPipPackages:
+    - "my-internal-package==1.0.0"
+```
+
+## Option 2 - embedded into container image (recommended)
+
+This chart uses the official [apache/airflow](https://hub.docker.com/r/apache/airflow) images, you may extend the airflow container image with your pip packages.
+
+For example, here is a Dockerfile that extends `airflow:2.1.4-python3.8` with the `torch` package:
+
+```dockerfile
+FROM apache/airflow:2.1.4-python3.8
+
+# install your pip packages
+RUN pip install --no-cache-dir \
+    torch~=1.8.0
+```
+
+After building and tagging your Dockerfile as `MY_REPO:MY_TAG`, you may use it with the chart by specifying `airflow.image.*`:
+
+```yaml
+airflow:
+  image:
+    repository: MY_REPO
+    tag: MY_TAG
+
+    ## WARNING: even if set to "Always" do not reuse tag names, as containers only pull the latest image when restarting
+    pullPolicy: IfNotPresent
+```
+
+> 🟥 __Warning__ 🟥
+>
+> Ensure that you never reuse an image tag name.
+> This ensures that whenever you update `airflow.image.tag`, all airflow pods will restart with the latest pip-packages.
+>
+> For example, you may append a version or git hash corresponding to your pip-packages:
+>
+> 1. `MY_REPO:MY_TAG-v1`, `MY_REPO:MY_TAG-v2`, `MY_REPO:MY_TAG-v3`
+> 2. `MY_REPO:MY_TAG-0.1.0`, `MY_REPO:MY_TAG-0.1.1`, `MY_REPO:MY_TAG-0.1.3`
+> 3. `MY_REPO:MY_TAG-a1a1a1a`, `MY_REPO:MY_TAG-a2a2a3a`, `MY_REPO:MY_TAG-a3a3a3a`
diff --git a/charts/airflow/docs/faq/dags/airflow-connections.md b/charts/airflow/docs/faq/dags/airflow-connections.md
new file mode 100644
index 00000000..7a0efac7
--- /dev/null
+++ b/charts/airflow/docs/faq/dags/airflow-connections.md
@@ -0,0 +1,98 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to manage airflow connections?
+
+## Define with Plain-Text
+
+You may use the `airflow.connections` value to create airflow [Connections](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#connections) in a declarative way.
+
+For example, to create connections called `my_aws`, `my_gcp`, `my_postgres`, and `my_ssh`:
+
+```yaml
+airflow:
+  connections:
+    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html
+    - id: my_aws
+      type: aws
+      description: my AWS connection
+      extra: |-
+        { "aws_access_key_id": "XXXXXXXX",
+          "aws_secret_access_key": "XXXXXXXX",
+          "region_name":"eu-central-1" }
+    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/connections/gcp.html
+    - id: my_gcp
+      type: google_cloud_platform
+      description: my GCP connection
+      extra: |-
+        { "extra__google_cloud_platform__keyfile_dict": "XXXXXXXX",
+          "extra__google_cloud_platform__num_retries": "XXXXXXXX" }
+    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-postgres/stable/connections/postgres.html
+    - id: my_postgres
+      type: postgres
+      description: my Postgres connection
+      host: postgres.example.com
+      port: 5432
+      login: db_user
+      password: db_pass
+      schema: my_db
+      extra: |-
+        { "sslmode": "allow" }
+    ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-ssh/stable/connections/ssh.html
+    - id: my_ssh
+      type: ssh
+      description: my SSH connection
+      host: ssh.example.com
+      port: 22
+      login: ssh_user
+      password: ssh_pass
+      extra: |-
+        { "timeout": "15" }
+
+  ## if we create a Deployment to perpetually sync `airflow.connections`
+  connectionsUpdate: true
+```
+
+## Define with templates from Secrets or ConfigMaps
+
+You may use `airflow.connectionsTemplates` to extract string templates from keys in Secrets or ConfigMaps.
+
+For example, to use templates from `Secret/my-secret` and `ConfigMap/my-configmap` in parts of the `my_aws` connection:
+
+```yaml
+airflow:
+  connections:
+    - id: my_aws
+      type: aws
+      description: my AWS connection
+
+      ## use the AWS_ACCESS_KEY_ID and AWS_ACCESS_KEY templates that are defined in `airflow.connectionsTemplates`
+      extra: |-
+        { "aws_access_key_id": "${AWS_ACCESS_KEY_ID}",
+          "aws_secret_access_key": "${AWS_ACCESS_KEY}",
+          "region_name":"eu-central-1" }
+
+  ## bash-like templates to be used in `airflow.connections`
+  connectionsTemplates:
+
+    ## define the `AWS_ACCESS_KEY_ID` template from the `my-configmap` ConfigMap
+    AWS_ACCESS_KEY_ID:
+      kind: configmap
+      name: my-configmap
+      key: username
+
+    ## define the `AWS_ACCESS_KEY` template from the `my-secret` Secret
+    AWS_ACCESS_KEY:
+      kind: secret
+      name: my-secret
+      key: password
+
+  ## if we create a Deployment to perpetually sync `airflow.connections`
+  connectionsUpdate: true
+```
+
+> 🟨 __Note__ 🟨
+>
+> If `airflow.connectionsUpdate = true`, the connections which use `airflow.connectionsTemplates` will be updated in real-time,
+> allowing tools like [external-secrets](https://github.com/external-secrets/kubernetes-external-secrets) to be used.
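+
+> 🟦 __Tip__ 🟦
+>
+> Tasks reference these connections by their `id`. As a minimal sketch (assuming the `apache-airflow-providers-postgres` package is installed, and using the `my_postgres` connection from the example above):
+>
+> ```python
+> from airflow.providers.postgres.hooks.postgres import PostgresHook
+>
+> def count_rows():
+>     ## "my_postgres" is the `id` defined in `airflow.connections`
+>     ## NOTE: "my_table" is a hypothetical table name
+>     hook = PostgresHook(postgres_conn_id="my_postgres")
+>     return hook.get_first("SELECT COUNT(*) FROM my_table")[0]
+> ```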
diff --git a/charts/airflow/docs/faq/dags/airflow-pools.md b/charts/airflow/docs/faq/dags/airflow-pools.md
new file mode 100644
index 00000000..2ed7243a
--- /dev/null
+++ b/charts/airflow/docs/faq/dags/airflow-pools.md
@@ -0,0 +1,23 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to manage airflow pools?
+
+You may use the `airflow.pools` value to create airflow [Pools](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#pools) in a declarative way.
+
+For example, to create pools called `pool_1` and `pool_2`:
+
+```yaml
+airflow:
+  pools:
+    - name: "pool_1"
+      description: "example pool with 5 slots"
+      slots: 5
+    - name: "pool_2"
+      description: "example pool with 10 slots"
+      slots: 10
+
+  ## if we create a Deployment to perpetually sync `airflow.pools`
+  poolsUpdate: true
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/dags/airflow-variables.md b/charts/airflow/docs/faq/dags/airflow-variables.md
new file mode 100644
index 00000000..48b42c44
--- /dev/null
+++ b/charts/airflow/docs/faq/dags/airflow-variables.md
@@ -0,0 +1,62 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to manage airflow variables?
+
+## Define with Plain-Text
+
+You may use the `airflow.variables` value to create airflow [Variables](https://airflow.apache.org/docs/apache-airflow/stable/concepts.html#variables) in a declarative way.
+
+For example, to create variables called `var_1` and `var_2`:
+
+```yaml
+airflow:
+  variables:
+    - key: "var_1"
+      value: "my_value_1"
+    - key: "var_2"
+      value: "my_value_2"
+
+  ## if we create a Deployment to perpetually sync `airflow.variables`
+  variablesUpdate: true
+```
+
+## Define with templates from Secrets or ConfigMaps
+
+You may use `airflow.variablesTemplates` to extract string templates from keys in Secrets or ConfigMaps.
+
+For example, to use templates from `Secret/my-secret` and `ConfigMap/my-configmap` in the `var_1` and `var_2` variables:
+
+```yaml
+airflow:
+  ## use the MY_VALUE_1 and MY_VALUE_2 templates that are defined in `airflow.variablesTemplates`
+  variables:
+    - key: "var_1"
+      value: "${MY_VALUE_1}"
+    - key: "var_2"
+      value: "${MY_VALUE_2}"
+
+  ## bash-like templates to be used in `airflow.variables`
+  variablesTemplates:
+
+    ## define the `MY_VALUE_1` template from the `my-configmap` ConfigMap
+    MY_VALUE_1:
+      kind: configmap
+      name: my-configmap
+      key: value1
+
+    ## define the `MY_VALUE_2` template from the `my-secret` Secret
+    MY_VALUE_2:
+      kind: secret
+      name: my-secret
+      key: value2
+
+  ## if we create a Deployment to perpetually sync `airflow.variables`
+  variablesUpdate: false
+```
+
+> 🟨 __Note__ 🟨
+>
+> If `airflow.variablesUpdate = true`, the variables which use `airflow.variablesTemplates` will be updated in real-time,
+> allowing tools like [external-secrets](https://github.com/external-secrets/kubernetes-external-secrets) to be used.
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/dags/load-dag-definitions.md b/charts/airflow/docs/faq/dags/load-dag-definitions.md
new file mode 100644
index 00000000..01422b9b
--- /dev/null
+++ b/charts/airflow/docs/faq/dags/load-dag-definitions.md
@@ -0,0 +1,178 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to load DAG definitions?
+
+## Option 1 - git-sync sidecar
+
+### SSH git auth
+
+This method uses an SSH git-sync sidecar to sync your git repo into the dag folder every `dags.gitSync.syncWait` seconds.
+
+Example values defining an SSH git repo:
+
+```yaml
+airflow:
+  config:
+    ## NOTE: this is set to `dags.gitSync.syncWait` by default
+    #AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 60
+
+dags:
+  ## NOTE: this is the default value
+  #path: /opt/airflow/dags
+
+  gitSync:
+    enabled: true
+    repo: "git@github.com:USERNAME/REPOSITORY.git"
+    branch: "master"
+    revision: "HEAD"
+    syncWait: 60
+    sshSecret: "airflow-ssh-git-secret"
+    sshSecretKey: "id_rsa"
+
+    ## NOTE: "known_hosts" verification can be disabled by setting to ""
+    sshKnownHosts: |-
+      github.com ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAq2A7hRGmdnm9tUDbO9IDSwBK6TbQa+PXYPCPy6rbTrTtw7PHkccKrpp0yVhp5HdEIcKr6pLlVDBfOLX9QUsyCOV0wzfjIJNlGEYsdlLJizHhbn2mUjvSAHQqZETYP81eFzLQNnPHt4EVVUh7VfDESU84KezmD5QlWpXLmvU31/yMf+Se8xhHTvKSCZIFImWwoG6mbUoWf9nzpIoaSjB+weqqUUmpaaasXVal72J+UX2B+2RPW3RcT0eOzQgqlJL3RKrTJvdsjE3JEAvGq3lGHSZXy28G3skua2SmVi/w4yCE6gbODqnTWlg7+wC604ydGXA8VJiS5ap43JXiUFFAaQ==
+```
+
+> 🟦 __Tip__ 🟦
+>
+> You may create the `airflow-ssh-git-secret` Secret using:
+>
+> ```shell
+> kubectl create secret generic \
+>   airflow-ssh-git-secret \
+>   --from-file=id_rsa=$HOME/.ssh/id_rsa \
+>   --namespace my-airflow-namespace
+> ```
+
+### HTTP git auth
+
+This method uses an HTTP git-sync sidecar to sync your git repo into the dag folder every `dags.gitSync.syncWait` seconds.
+
+Example values defining an HTTP git repo:
+
+```yaml
+airflow:
+  config:
+    ## NOTE: this is set to `dags.gitSync.syncWait` by default
+    #AIRFLOW__SCHEDULER__DAG_DIR_LIST_INTERVAL: 60
+
+dags:
+  ## NOTE: this is the default value
+  #path: /opt/airflow/dags
+
+  gitSync:
+    enabled: true
+    repo: "https://github.com/USERNAME/REPOSITORY.git"
+    branch: "master"
+    revision: "HEAD"
+    syncWait: 60
+    httpSecret: "airflow-http-git-secret"
+    httpSecretUsernameKey: username
+    httpSecretPasswordKey: password
+```
+
+> 🟦 __Tip__ 🟦
+>
+> You may create the `airflow-http-git-secret` Secret using:
+>
+> ```shell
+> kubectl create secret generic \
+>   airflow-http-git-secret \
+>   --from-literal=username=MY_GIT_USERNAME \
+>   --from-literal=password=MY_GIT_TOKEN \
+>   --namespace my-airflow-namespace
+> ```
+
+## Option 2 - persistent volume
+
+With this method, you store your DAGs in a Kubernetes PersistentVolume, which is mounted to all scheduler/web/worker Pods.
+
+> 🟦 __Tip__ 🟦
+>
+> You must configure some external system to ensure the persistent volume has your latest DAGs.
+>
+> For example, you could use your CI/CD pipeline system to perform a sync as changes are pushed to your DAGs git repo.
+ +### Chart Managed Volume + +For example, to have the chart create a PVC with the `storageClass` called `default` and an initial `size` of `1Gi`: + +```yaml +dags: + ## NOTE: this is the default value + #path: /opt/airflow/dags + + persistence: + enabled: true + + ## configs for the chart-managed volume + storageClass: "default" # NOTE: "" means cluster-default + size: 1Gi + + accessMode: ReadOnlyMany +``` + +> 🟦 __Tip__ 🟦 +> +> The name of the chart-managed volume will be `{{ .Release.Name | trunc 63 | trimSuffix "-" | trunc 58 }}-dags`. + +### User Managed Volume + +For example, to use an existing PVC called `my-dags-pvc`: + +```yaml +dags: + ## NOTE: this is the default value + #path: /opt/airflow/dags + + persistence: + enabled: true + + ## the name of your existing volume + existingClaim: my-dags-pvc + + accessMode: ReadOnlyMany +``` + +> 🟦 __Tip__ 🟦 +> +> Your `dags.persistence.existingClaim` PVC must support `ReadOnlyMany` or `ReadWriteMany` for `accessMode` + +## Option 3 - embedded into container image + +This chart uses the official [apache/airflow](https://hub.docker.com/r/apache/airflow) images, you may extend the airflow container image with your DAG definition files. + +Example extending `airflow:2.0.1-python3.8` with some dags: + +```dockerfile +FROM apache/airflow:2.0.1-python3.8 + +# NOTE: dag path is set with the `dags.path` value +COPY ./my_dag_folder /opt/airflow/dags +``` + +Example values to use `MY_REPO:MY_TAG` container image with the chart: + +```yaml +airflow: + image: + repository: MY_REPO + tag: MY_TAG + + ## WARNING: even if set to "Always" do not reuse tag names, as containers only pull the latest image when restarting + pullPolicy: IfNotPresent +``` + +> 🟥 __Warning__ 🟥 +> +> Ensure that you never reuse an image tag name. +> This ensures that whenever you update `airflow.image.tag`, all airflow pods will restart with the latest DAGs. +> +> For example, you may append a version or git hash corresponding to your DAGs: +> +> 1. `MY_REPO:MY_TAG-v1`, `MY_REPO:MY_TAG-v2`, `MY_REPO:MY_TAG-v3` +> 2. `MY_REPO:MY_TAG-0.1.0`, `MY_REPO:MY_TAG-0.1.1`, `MY_REPO:MY_TAG-0.1.3` +> 3. `MY_REPO:MY_TAG-a1a1a1a`, `MY_REPO:MY_TAG-a2a2a3a`, `MY_REPO:MY_TAG-a3a3a3a` \ No newline at end of file diff --git a/charts/airflow/docs/faq/database/embedded-database.md b/charts/airflow/docs/faq/database/embedded-database.md new file mode 100644 index 00000000..15c76958 --- /dev/null +++ b/charts/airflow/docs/faq/database/embedded-database.md @@ -0,0 +1,33 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to configure the embedded database? + +> 🟥 __Warning__ 🟥 +> +> The embedded database is NOT SUITABLE for production, we strongly recommend using an [external database](external-database.md) instead! + +## Set a Custom Password + +The embedded PostgreSQL database has an insecure password of `airflow` by default which is set by the `postgresql.postgresqlPassword` value. +To improve database security, you should generate a custom password and store it in a Kubernetes secret using `postgresql.existingSecret`. 
+ +For example, to use a pre-created Secret called `airflow-postgresql` that contains a key called `postgresql-password`: + +```yaml +postgresql: + existingSecret: airflow-postgresql + existingSecretKey: postgresql-password +``` + +> 🟦 __Tip__ 🟦 +> +> You may use `kubectl` to create the `airflow-postgresql` Secret with a random `postgresql-password` key. +> +> ```shell +> kubectl create secret generic \ +> airflow-postgresql \ +> --from-literal=postgresql-password=$(openssl rand -base64 13) \ +> --namespace my-airflow-namespace +> ``` \ No newline at end of file diff --git a/charts/airflow/docs/faq/database/embedded-redis.md b/charts/airflow/docs/faq/database/embedded-redis.md new file mode 100644 index 00000000..0fbeb8a9 --- /dev/null +++ b/charts/airflow/docs/faq/database/embedded-redis.md @@ -0,0 +1,33 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to configure the embedded redis? + +> 🟦 __Tip__ 🟦 +> +> You may consider using an [external redis](external-redis.md) rather than the embedded one. + +## Set a Custom Password + +The embedded Redis has an insecure password of `airflow` by default which is set by the `redis.password` value. +To improve security, you should generate a custom password and store it in a Kubernetes secret using `redis.existingSecret`. + +For example, to use a pre-created Secret called `airflow-redis` that contains a key called `redis-password`: + +```yaml +redis: + existingSecret: airflow-redis + existingSecretKey: redis-password +``` + +> 🟦 __Tip__ 🟦 +> +> You may use `kubectl` to create the `airflow-redis` Secret with a random `redis-password` key. +> +> ```shell +> kubectl create secret generic \ +> airflow-redis \ +> --from-literal=redis-password=$(openssl rand -base64 13) \ +> --namespace my-airflow-namespace +> ``` \ No newline at end of file diff --git a/charts/airflow/docs/faq/database/external-database.md b/charts/airflow/docs/faq/database/external-database.md new file mode 100644 index 00000000..664b83ad --- /dev/null +++ b/charts/airflow/docs/faq/database/external-database.md @@ -0,0 +1,126 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to configure an external database? + +> 🟥 __Warning__ 🟥 +> +> We __STRONGLY RECOMMEND__ that all production deployments of Airflow use an external database, not the [embedded database](embedded-database.md). + +> 🟦 __Tip__ 🟦 +> +> When compared with the Postgres that is embedded in this chart, an __external database__ comes with many benefits: +> +> 1. The embedded Postgres version is usually very outdated, so is susceptible to critical security bugs +> 2. The embedded database may not scale to your performance requirements +> 3. An external database will likely achieve higher uptime +> 4. 
An external database can be configured with backups and disaster recovery +> +> Commonly, people use the managed PostgreSQL service from their cloud vendor to provision an external database: +> +> Cloud Platform | Service Name +> --- | --- +> Amazon Web Services | [Amazon RDS for PostgreSQL](https://aws.amazon.com/rds/postgresql/) +> Microsoft Azure | [Azure Database for PostgreSQL](https://azure.microsoft.com/en-au/services/postgresql/) +> Google Cloud | [Cloud SQL for PostgreSQL](https://cloud.google.com/sql/docs/postgres) +> Alibaba Cloud | [ApsaraDB RDS for PostgreSQL](https://www.alibabacloud.com/product/apsaradb-for-rds-postgresql) +> IBM Cloud | [IBM Cloud® Databases for PostgreSQL](https://cloud.ibm.com/docs/databases-for-postgresql) + +## Option 1 - Postgres + +> 🟨 __Note__ 🟨 +> +> By default, this chart deploys [PgBouncer](https://www.pgbouncer.org/) to pool db connections and reduce the load from large numbers of airflow tasks. +> +> You may read more about [how to configure the chart's PgBouncer](pgbouncer.md). + +For example, to use an external Postgres at `postgres.example.org`, with an existing `airflow_cluster1` database: + +```yaml +postgresql: + ## to use the external db, the embedded one must be disabled + enabled: false + +## for full list of PgBouncer configs, see values.yaml +pgbouncer: + enabled: true + + ## WARNING: for PostgreSQL with password_encryption = 'SCRAM-SHA-256', the following non-default value is needed + # authType: scram-sha-256 + + ## WARNING: for "Azure PostgreSQL", the following non-default values are needed + # authType: scram-sha-256 + # serverSSL: + # mode: verify-ca + +externalDatabase: + type: postgres + + host: postgres.example.org + port: 5432 + + ## the schema which will contain the airflow tables + database: airflow_cluster1 + + ## (username - option 1) a plain-text helm value + user: my_airflow_user + + ## (username - option 2) a Kubernetes secret in your airflow namespace + #userSecret: "airflow-cluster1-database-credentials" + #userSecretKey: "username" + + ## (password - option 1) a plain-text helm value + password: my_airflow_password + + ## (password - option 2) a Kubernetes secret in your airflow namespace + #passwordSecret: "airflow-cluster1-database-credentials" + #passwordSecretKey: "password" + + ## use this for any extra connection-string settings, e.g. 
?sslmode=disable + properties: "" +``` + +## Option 2 - MySQL + +> 🟥 __Warning__ 🟥 +> +> You must set `explicit_defaults_for_timestamp=1` in your MySQL instance, [see here](https://airflow.apache.org/docs/stable/howto/initialize-database.html) + +For example, to use an external MySQL at `mysql.example.org`, with an existing `airflow_cluster1` database: + +```yaml +postgresql: + ## to use the external db, the embedded one must be disabled + enabled: false + +pgbouncer: + ## pgbouncer is automatically disabled if `externalDatabase.type` is `mysql` + #enabled: false + +externalDatabase: + type: mysql + + host: mysql.example.org + port: 3306 + + ## the database which will contain the airflow tables + database: airflow_cluster1 + + ## (username - option 1) a plain-text helm value + user: my_airflow_user + + ## (username - option 2) a Kubernetes secret in your airflow namespace + #userSecret: "airflow-cluster1-database-credentials" + #userSecretKey: "username" + + ## (password - option 1) a plain-text helm value + password: my_airflow_password + + ## (password - option 2) a Kubernetes secret in your airflow namespace + #passwordSecret: "airflow-cluster1-database-credentials" + #passwordSecretKey: "password" + + ## use this for any extra connection-string settings, e.g. ?useSSL=false + properties: "" +``` \ No newline at end of file diff --git a/charts/airflow/docs/faq/database/external-redis.md b/charts/airflow/docs/faq/database/external-redis.md new file mode 100644 index 00000000..00b12206 --- /dev/null +++ b/charts/airflow/docs/faq/database/external-redis.md @@ -0,0 +1,29 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to configure an external redis? + +For example, to use an external redis at `example.redis.cache.windows.net` with ssl enabled: + +```yaml +redis: + enabled: false + +externalRedis: + host: "example.redis.cache.windows.net" + port: 6380 + + ## the redis database-number that airflow will use + databaseNumber: 1 + + ## (option 1 - password) a plain-text helm value + password: my_airflow_password + + ## (option 2 - password) a Kubernetes secret in your airflow namespace + #passwordSecret: "airflow-cluster1-redis-credentials" + #passwordSecretKey: "password" + + ## use this for any extra connection-string settings + properties: "?ssl_cert_reqs=CERT_OPTIONAL" +``` \ No newline at end of file diff --git a/charts/airflow/docs/faq/database/pgbouncer.md b/charts/airflow/docs/faq/database/pgbouncer.md new file mode 100644 index 00000000..12784fda --- /dev/null +++ b/charts/airflow/docs/faq/database/pgbouncer.md @@ -0,0 +1,105 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to configure pgbouncer? + +By default, this chart deploys [PgBouncer](https://www.pgbouncer.org/) to pool db connections and reduce the load from large numbers of airflow tasks. 
+ +## PgBouncer Configs + +> 🟥 __Warning__ 🟥 +> +> If using an external Postgres that has [`password_encryption = 'SCRAM-SHA-256'`](https://www.postgresql.org/docs/current/runtime-config-connection.html#GUC-PASSWORD-ENCRYPTION), you must configure PgBouncer with `auth_type = scram-sha-256`. +> +> ```yaml +> pgbouncer: +> authType: scram-sha-256 +> ``` + +> 🟥 __Warning__ 🟥 +> +> If using [Azure Database for PostgreSQL](https://azure.microsoft.com/en-au/services/postgresql/), you must configure PgBouncer with `auth_type = scram-sha-256` and `server_tls_sslmode = verify-ca`. +> +> ```yaml +> pgbouncer: +> authType: scram-sha-256 +> +> serverSSL: +> mode: verify-ca +> ``` + +We expose a number of PgBouncer's configs as values under `pgbouncer.*`: + +```yaml +pgbouncer: + ## if the pgbouncer Deployment is created + enabled: true + + ## sets pgbouncer config: `auth_type` + authType: md5 + + ## sets pgbouncer config: `max_client_conn` + maxClientConnections: 1000 + + ## sets pgbouncer config: `default_pool_size` + poolSize: 20 + + ## sets pgbouncer config: `log_disconnections` + logDisconnections: 0 + + ## sets pgbouncer config: `log_connections` + logConnections: 0 + + ## ssl configs for: clients -> pgbouncer + ## + clientSSL: + ## sets pgbouncer config: `client_tls_sslmode` + mode: prefer + + ## sets pgbouncer config: `client_tls_ciphers` + ciphers: normal + + ## sets pgbouncer config: `client_tls_ca_file` + caFile: + existingSecret: "" + existingSecretKey: root.crt + + ## sets pgbouncer config: `client_tls_key_file` + ## WARNING: a self-signed cert & key are generated if left empty + keyFile: + existingSecret: "" + existingSecretKey: client.key + + ## sets pgbouncer config: `client_tls_cert_file` + ## WARNING: a self-signed cert & key are generated if left empty + certFile: + existingSecret: "" + existingSecretKey: client.crt + + ## ssl configs for: pgbouncer -> postgres + ## + serverSSL: + ## sets pgbouncer config: `server_tls_sslmode` + mode: prefer + + ## sets pgbouncer config: `server_tls_ciphers` + ciphers: normal + + ## sets pgbouncer config: `server_tls_ca_file` + caFile: + existingSecret: "" + existingSecretKey: root.crt + + ## sets pgbouncer config: `server_tls_key_file` + keyFile: + existingSecret: "" + existingSecretKey: server.key + + ## sets pgbouncer config: `server_tls_cert_file` + certFile: + existingSecret: "" + existingSecretKey: server.crt + +``` + diff --git a/charts/airflow/docs/faq/kubernetes/affinity-node-selectors-tolerations.md b/charts/airflow/docs/faq/kubernetes/affinity-node-selectors-tolerations.md new file mode 100644 index 00000000..21689435 --- /dev/null +++ b/charts/airflow/docs/faq/kubernetes/affinity-node-selectors-tolerations.md @@ -0,0 +1,121 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to configure pod affinity, nodeSelector, and tolerations? + +If your environment needs to use Pod [affinity](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#affinity-and-anti-affinity), +[nodeSelector](https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector), +or [tolerations](https://kubernetes.io/docs/concepts/scheduling-eviction/taint-and-toleration/), +we provide many values that allow fine-grained control over the Pod definitions. 
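+
+> 🟦 __Tip__ 🟦
+>
+> A `nodeSelector` can only target labels that are present on your nodes.
+> For example (assuming a node named `my-node-1`), you might apply the `my_node_label_1` label used in the examples below with `kubectl`:
+>
+> ```shell
+> kubectl label nodes my-node-1 my_node_label_1=value1
+> ```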
+
+## Global Configs
+
+To set affinity, nodeSelector, and tolerations for all airflow Pods, you may use the `airflow.{defaultNodeSelector,defaultAffinity,defaultTolerations}` values:
+
+```yaml
+airflow:
+  ## https://kubernetes.io/docs/concepts/scheduling-eviction/assign-pod-node/#nodeselector
+  defaultNodeSelector: {}
+  # my_node_label_1: value1
+  # my_node_label_2: value2
+
+  ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#affinity-v1-core
+  defaultAffinity: {}
+  # podAffinity:
+  #   requiredDuringSchedulingIgnoredDuringExecution:
+  #     - labelSelector:
+  #         matchExpressions:
+  #           - key: security
+  #             operator: In
+  #             values:
+  #               - S1
+  #       topologyKey: topology.kubernetes.io/zone
+  # podAntiAffinity:
+  #   preferredDuringSchedulingIgnoredDuringExecution:
+  #     - weight: 100
+  #       podAffinityTerm:
+  #         labelSelector:
+  #           matchExpressions:
+  #             - key: security
+  #               operator: In
+  #               values:
+  #                 - S2
+  #         topologyKey: topology.kubernetes.io/zone
+
+  ## https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#toleration-v1-core
+  defaultTolerations: []
+  # - key: "key1"
+  #   operator: "Exists"
+  #   effect: "NoSchedule"
+  # - key: "key2"
+  #   operator: "Exists"
+  #   effect: "NoSchedule"
+
+## if using the embedded postgres chart, you will also need to define these
+postgresql:
+  master:
+    nodeSelector: {}
+    affinity: {}
+    tolerations: []
+
+## if using the embedded redis chart, you will also need to define these
+redis:
+  master:
+    nodeSelector: {}
+    affinity: {}
+    tolerations: []
+```
+
+## Per-Resource Configs
+
+To set affinity, nodeSelector, and tolerations for specific pods, you may use the following values:
+
+```yaml
+airflow:
+  ## airflow KubernetesExecutor pod_template
+  kubernetesPodTemplate:
+    nodeSelector: {}
+    affinity: {}
+    tolerations: []
+
+  ## sync deployments
+  sync:
+    nodeSelector: {}
+    affinity: {}
+    tolerations: []
+
+## airflow schedulers
+scheduler:
+  nodeSelector: {}
+  affinity: {}
+  tolerations: []
+
+## airflow webserver
+web:
+  nodeSelector: {}
+  affinity: {}
+  tolerations: []
+
+## airflow workers
+workers:
+  nodeSelector: {}
+  affinity: {}
+  tolerations: []
+
+## airflow triggerer
+triggerer:
+  nodeSelector: {}
+  affinity: {}
+  tolerations: []
+
+## airflow flower
+flower:
+  nodeSelector: {}
+  affinity: {}
+  tolerations: []
+```
+
+> 🟦 __Tip__ 🟦
+>
+> The `airflow.{defaultNodeSelector,defaultAffinity,defaultTolerations}` values are overridden by the per-resource values like `scheduler.{nodeSelector,affinity,tolerations}`.
diff --git a/charts/airflow/docs/faq/kubernetes/extra-manifests.md b/charts/airflow/docs/faq/kubernetes/extra-manifests.md
new file mode 100644
index 00000000..f0407a42
--- /dev/null
+++ b/charts/airflow/docs/faq/kubernetes/extra-manifests.md
@@ -0,0 +1,70 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to add extra kubernetes manifests?
+
+You may use the `extraManifests` value to specify a list of extra Kubernetes manifests that will be deployed alongside the chart. 
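+
+> 🟦 __Tip__ 🟦
+>
+> To preview what your `extraManifests` will render to before deploying (assuming you have added the `airflow-stable` repo from the [Quickstart Guide](../../guides/quickstart.md)), you may use `helm template`:
+>
+> ```shell
+> helm template my-airflow airflow-stable/airflow --values ./custom-values.yaml
+> ```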
+
+> 🟦 __Tip__ 🟦
+>
+> [Helm templates](https://helm.sh/docs/chart_template_guide/functions_and_pipelines/) within these strings will be rendered.
+
+Example values to create a `Secret` for database credentials: _(__WARNING:__ if you do this, store your custom values file securely)_
+
+```yaml
+extraManifests:
+  - |
+    apiVersion: v1
+    kind: Secret
+    metadata:
+      name: airflow-postgres-credentials
+    data:
+      postgresql-password: {{ `password1` | b64enc | quote }}
+```
+
+Example values to create a `Deployment` for a [busybox](https://busybox.net/) container:
+
+```yaml
+extraManifests:
+  - |
+    apiVersion: apps/v1
+    kind: Deployment
+    metadata:
+      name: {{ include "airflow.fullname" . }}-busybox
+      labels:
+        app: {{ include "airflow.labels.app" . }}
+        component: busybox
+        chart: {{ include "airflow.labels.chart" . }}
+        release: {{ .Release.Name }}
+        heritage: {{ .Release.Service }}
+    spec:
+      replicas: 1
+      selector:
+        matchLabels:
+          app: {{ include "airflow.labels.app" . }}
+          component: busybox
+          release: {{ .Release.Name }}
+      template:
+        metadata:
+          labels:
+            app: {{ include "airflow.labels.app" . }}
+            component: busybox
+            release: {{ .Release.Name }}
+        spec:
+          containers:
+            - name: busybox
+              image: busybox:1.35
+              command:
+                - "/bin/sh"
+                - "-c"
+              args:
+                - |
+                  ## to break the infinite loop when we receive SIGTERM
+                  trap "exit 0" SIGTERM;
+                  ## keep the container running (so people can `kubectl exec -it` into it)
+                  while true; do
+                    echo "I am alive...";
+                    sleep 30;
+                  done
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/kubernetes/ingress.md b/charts/airflow/docs/faq/kubernetes/ingress.md
new file mode 100644
index 00000000..56a7503a
--- /dev/null
+++ b/charts/airflow/docs/faq/kubernetes/ingress.md
@@ -0,0 +1,62 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to set up a kubernetes ingress?
+
+The chart provides the `ingress.*` values for deploying a Kubernetes Ingress to allow access to airflow outside the cluster.
+
+Consider the situation where you already have something hosted at the root of your domain; in this case, you might want to place airflow under a URL-prefix:
+- http://example.com/airflow/
+- http://example.com/airflow/flower
+
+For example, assuming you have an Ingress Controller with an IngressClass named "nginx" deployed:
+
+```yaml
+airflow:
+  config:
+    AIRFLOW__WEBSERVER__BASE_URL: "http://example.com/airflow/"
+    AIRFLOW__CELERY__FLOWER_URL_PREFIX: "/airflow/flower"
+
+ingress:
+  enabled: true
+
+  ## WARNING: set as "networking.k8s.io/v1beta1" for Kubernetes 1.18 and earlier
+  apiVersion: networking.k8s.io/v1
+
+  ## airflow webserver ingress configs
+  web:
+    annotations: {}
+    host: "example.com"
+    path: "/airflow"
+    ## WARNING: requires Kubernetes 1.18 or later, use "kubernetes.io/ingress.class" annotation for older versions
+    ingressClassName: "nginx"
+
+  ## flower ingress configs
+  flower:
+    annotations: {}
+    host: "example.com"
+    path: "/airflow/flower"
+    ## WARNING: requires Kubernetes 1.18 or later, use "kubernetes.io/ingress.class" annotation for older versions
+    ingressClassName: "nginx"
+```
+
+## Preceding and Succeeding Paths
+
+We expose the `ingress.web.precedingPaths` and `ingress.web.succeedingPaths` values, which add extra Ingress paths __before__ and __after__ the default airflow-webserver path, respectively. 
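+
+For example (assuming a hypothetical Service named `custom-service`, and that `succeedingPaths` entries take the same structure as `precedingPaths` ones), setting `ingress.web.succeedingPaths` to serve an extra path after the airflow-webserver one:
+
+```yaml
+ingress:
+  web:
+    succeedingPaths:
+      - path: "/custom"
+        serviceName: "custom-service"
+        servicePort: 8080
+```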
+ +> 🟦 __Tip__ 🟦 +> +> A common use-case is [enabling SSL with the aws-alb-ingress-controller](https://kubernetes-sigs.github.io/aws-load-balancer-controller/v2.1/guide/tasks/ssl_redirect/), +> which needs a redirect path to be hit before the airflow-webserver one. + +For example, setting `ingress.web.precedingPaths` for an aws-alb-ingress-controller with SSL: + +```yaml +ingress: + web: + precedingPaths: + - path: "/*" + serviceName: "ssl-redirect" + servicePort: "use-annotation" +``` \ No newline at end of file diff --git a/charts/airflow/docs/faq/kubernetes/mount-environment-variables.md b/charts/airflow/docs/faq/kubernetes/mount-environment-variables.md new file mode 100644 index 00000000..20c2dfcd --- /dev/null +++ b/charts/airflow/docs/faq/kubernetes/mount-environment-variables.md @@ -0,0 +1,23 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to mount ConfigMaps and Secrets as environment variables? + +You may use the `airflow.extraEnv` value to mount extra environment variables with the same structure as [EnvVar in ContainerSpec](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#envvar-v1-core). + +> 🟦 __Tip__ 🟦 +> +> This method can be used to pass sensitive [airflow configs](../configuration/airflow-configs.md). + +For example, to use the `value` key from the existing Secret `airflow-fernet-key` to define `AIRFLOW__CORE__FERNET_KEY`: + +```yaml +airflow: + extraEnv: + - name: AIRFLOW__CORE__FERNET_KEY + valueFrom: + secretKeyRef: + name: airflow-fernet-key + key: value +``` \ No newline at end of file diff --git a/charts/airflow/docs/faq/kubernetes/mount-files.md b/charts/airflow/docs/faq/kubernetes/mount-files.md new file mode 100644 index 00000000..a3523961 --- /dev/null +++ b/charts/airflow/docs/faq/kubernetes/mount-files.md @@ -0,0 +1,66 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to mount ConfigMaps and Secrets as files? + +## Mount on CeleryExecutor Workers + +You may use the `workers.extraVolumeMounts` and `workers.extraVolumes` values to mount ConfigMaps/Secrets as files on the airflow CeleryExecutor worker pods. + +For example, to mount a Secret called `redshift-creds` at the `/opt/airflow/secrets/redshift-creds` directory of all CeleryExecutor worker pods: + +```yaml +workers: + extraVolumeMounts: + - name: redshift-creds + mountPath: /opt/airflow/secrets/redshift-creds + readOnly: true + + extraVolumes: + - name: redshift-creds + secret: + secretName: redshift-creds +``` + +> 🟦 __Tip__ 🟦 +> +> You may create the `redshift-creds` Secret with `kubectl`. +> +> ```shell +> kubectl create secret generic \ +> redshift-creds \ +> --from-literal=user=MY_REDSHIFT_USERNAME \ +> --from-literal=password=MY_REDSHIFT_PASSWORD \ +> --namespace my-airflow-namespace +> ``` + +> 🟦 __Tip__ 🟦 +> +> You may read the `/opt/airflow/secrets/redshift-creds` files from within an airflow [PythonOperator](https://airflow.apache.org/docs/apache-airflow/stable/howto/operator/python.html). 
+>
+> ```python
+> from pathlib import Path
+> redshift_user = Path("/opt/airflow/secrets/redshift-creds/user").read_text().strip()
+> redshift_password = Path("/opt/airflow/secrets/redshift-creds/password").read_text().strip()
+> ```
+
+## Mount on KubernetesExecutor Pod Template
+
+You may use the `airflow.kubernetesPodTemplate.extraVolumeMounts` and `airflow.kubernetesPodTemplate.extraVolumes` values to mount ConfigMaps/Secrets as files on the airflow KubernetesExecutor pod template.
+
+For example, to mount a Secret called `redshift-creds` at the `/opt/airflow/secrets/redshift-creds` directory of all KubernetesExecutor pod templates:
+
+```yaml
+airflow:
+  kubernetesPodTemplate:
+    extraVolumeMounts:
+      - name: redshift-creds
+        mountPath: /opt/airflow/secrets/redshift-creds
+        readOnly: true
+
+    extraVolumes:
+      - name: redshift-creds
+        secret:
+          secretName: redshift-creds
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/kubernetes/mount-persistent-volumes.md b/charts/airflow/docs/faq/kubernetes/mount-persistent-volumes.md
new file mode 100644
index 00000000..ed5e7f6c
--- /dev/null
+++ b/charts/airflow/docs/faq/kubernetes/mount-persistent-volumes.md
@@ -0,0 +1,44 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to mount extra persistent volumes?
+
+## Mount on CeleryExecutor Workers
+
+You may use the `workers.extraVolumeMounts` and `workers.extraVolumes` values to mount persistent volumes on the airflow CeleryExecutor worker pods.
+
+For example, to mount a Volume called `worker-tmp` at the `/tmp` directory of all CeleryExecutor worker pods:
+
+```yaml
+workers:
+  extraVolumeMounts:
+    - name: worker-tmp
+      mountPath: /tmp
+      readOnly: false
+
+  extraVolumes:
+    - name: worker-tmp
+      persistentVolumeClaim:
+        claimName: worker-tmp
+```
+
+## Mount on KubernetesExecutor Pod Template
+
+You may use the `airflow.kubernetesPodTemplate.extraVolumeMounts` and `airflow.kubernetesPodTemplate.extraVolumes` values to mount persistent volumes on the airflow KubernetesExecutor pod template.
+
+For example, to mount a Volume called `worker-tmp` at the `/tmp` directory of all KubernetesExecutor pod templates:
+
+```yaml
+airflow:
+  kubernetesPodTemplate:
+    extraVolumeMounts:
+      - name: worker-tmp
+        mountPath: /tmp
+        readOnly: false
+
+    extraVolumes:
+      - name: worker-tmp
+        persistentVolumeClaim:
+          claimName: worker-tmp
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/monitoring/log-cleanup.md b/charts/airflow/docs/faq/monitoring/log-cleanup.md
new file mode 100644
index 00000000..2cf38a6b
--- /dev/null
+++ b/charts/airflow/docs/faq/monitoring/log-cleanup.md
@@ -0,0 +1,52 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to automatically clean up airflow logs?
+
+> 🟥 __Warning__ 🟥
+>
+> If `logs.persistence.enabled` is `true`, then `scheduler.logCleanup.enabled` and `workers.logCleanup.enabled` must be `false`.
+>
+> This is to prevent multiple log-cleanup sidecars from attempting to delete the same log files at the same time. 
+> We are planning to implement a central log-cleanup deployment in a future release that will work with log persistence.
+
+## Scheduler
+
+By default, this chart deploys each airflow scheduler Pod with a sidecar that deletes log files last-modified more than `scheduler.logCleanup.retentionMinutes` minutes ago.
+This helps prevent excessive log buildup within the Pod's filesystem.
+
+You may disable or configure the log-cleanup sidecar with the `scheduler.logCleanup.*` values:
+
+```yaml
+scheduler:
+  logCleanup:
+    ## WARNING: must be disabled if `logs.persistence.enabled` is `true`
+    enabled: true
+
+    ## the number of minutes to retain log files (by last-modified time)
+    retentionMinutes: 21600
+
+    ## the number of seconds between each check for files to delete
+    intervalSeconds: 900
+```
+
+## CeleryExecutor Workers
+
+By default, this chart deploys each airflow CeleryExecutor worker Pod with a sidecar that deletes log files last-modified more than `workers.logCleanup.retentionMinutes` minutes ago.
+This helps prevent excessive log buildup within the Pod's filesystem.
+
+You may disable or configure the log-cleanup sidecar with the `workers.logCleanup.*` values:
+
+```yaml
+workers:
+  logCleanup:
+    ## WARNING: must be disabled if `logs.persistence.enabled` is `true`
+    enabled: true
+
+    ## the number of minutes to retain log files (by last-modified time)
+    retentionMinutes: 21600
+
+    ## the number of seconds between each check for files to delete
+    intervalSeconds: 900
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/monitoring/log-persistence.md b/charts/airflow/docs/faq/monitoring/log-persistence.md
new file mode 100644
index 00000000..196f3406
--- /dev/null
+++ b/charts/airflow/docs/faq/monitoring/log-persistence.md
@@ -0,0 +1,149 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to persist airflow logs?
+
+> 🟥 __Warning__ 🟥
+>
+> By default, logs are stored within the container's filesystem; therefore, any restart of the Pod will wipe your DAG logs.
+> In production, you should persist logs using one of the following methods.
+
+## Option 1 - persistent volume
+
+### Chart Managed Volume
+
+For example, to have the chart create a PVC with the `storageClass` called `default` and an initial `size` of `1Gi`:
+
+```yaml
+scheduler:
+  logCleanup:
+    ## WARNING: scheduler log-cleanup must be disabled if `logs.persistence.enabled` is `true`
+    enabled: false
+
+workers:
+  logCleanup:
+    ## WARNING: workers log-cleanup must be disabled if `logs.persistence.enabled` is `true`
+    enabled: false
+
+logs:
+  ## NOTE: this is the default value
+  #path: /opt/airflow/logs
+
+  persistence:
+    enabled: true
+
+    ## configs for the chart-managed volume
+    storageClass: "default" # NOTE: "" means cluster-default
+    size: 1Gi
+    accessMode: ReadWriteMany
+```
+
+> 🟦 __Tip__ 🟦
+>
+> The name of the chart-managed volume will be `{{ .Release.Name | trunc 63 | trimSuffix "-" | trunc 58 }}-logs`. 
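+
+> 🟦 __Tip__ 🟦
+>
+> You may verify that the chart-managed volume was created (assuming your airflow is in the `my-airflow-namespace` namespace) with `kubectl`:
+>
+> ```shell
+> kubectl get pvc --namespace my-airflow-namespace
+> ```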
+ +### User Managed Volume + +For example, to use an existing PVC called `my-logs-pvc`: + +```yaml +scheduler: + logCleanup: + ## WARNING: scheduler log-cleanup must be disabled if `logs.persistence.enabled` is `true` + enabled: false + +workers: + logCleanup: + ## WARNING: workers log-cleanup must be disabled if `logs.persistence.enabled` is `true` + enabled: false + +logs: + ## NOTE: this is the default value + #path: /opt/airflow/logs + + persistence: + enabled: true + + ## the name of your existing volume + existingClaim: my-logs-pvc + + accessMode: ReadWriteMany +``` + +> 🟦 __Tip__ 🟦 +> +> Your `logs.persistence.existingClaim` PVC must support `ReadWriteMany` for `accessMode`. + +## Option 2 - remote cloud bucket + +### S3 Bucket (recommended on AWS) + +For example, to use a remote S3 bucket for logging (with an `airflow.connection` called `my_aws` for authorization): + +```yaml +airflow: + config: + AIRFLOW__LOGGING__REMOTE_LOGGING: "True" + AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://<>/airflow/logs" + AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "my_aws" + + connections: + ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-amazon/stable/connections/aws.html + - id: my_aws + type: aws + description: my AWS connection + extra: |- + { "aws_access_key_id": "XXXXXXXX", + "aws_secret_access_key": "XXXXXXXX", + "region_name":"eu-central-1" } +``` + +For example, to use a remote S3 bucket for logging (with [EKS - IAM Roles for Service Accounts](https://docs.aws.amazon.com/eks/latest/userguide/iam-roles-for-service-accounts.html) for authorization): + +```yaml +airflow: + config: + AIRFLOW__LOGGING__REMOTE_LOGGING: "True" + AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "s3://<>/airflow/logs" + AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "aws_default" + +serviceAccount: + annotations: + eks.amazonaws.com/role-arn: "arn:aws:iam::XXXXXXXXXX:role/<>" +``` + +### GCS Bucket (recommended on GCP) + +For example, to use a remote GCS bucket for logging (with an `airflow.connection` called `my_gcp` for authorization): + +```yaml +airflow: + config: + AIRFLOW__LOGGING__REMOTE_LOGGING: "True" + AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "gs://<>/airflow/logs" + AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "my_gcp" + + connections: + ## see docs: https://airflow.apache.org/docs/apache-airflow-providers-google/stable/connections/gcp.html + - id: my_gcp + type: google_cloud_platform + description: my GCP connection + extra: |- + { "extra__google_cloud_platform__keyfile_dict": "XXXXXXXX", + "extra__google_cloud_platform__num_retries": "5" } +``` + +For example, to use a remote GCS bucket for logging (with [GKE - Workload Identity](https://cloud.google.com/kubernetes-engine/docs/how-to/workload-identity) for authorization): + +```yaml +airflow: + config: + AIRFLOW__LOGGING__REMOTE_LOGGING: "True" + AIRFLOW__LOGGING__REMOTE_BASE_LOG_FOLDER: "gs://<>/airflow/logs" + AIRFLOW__LOGGING__REMOTE_LOG_CONN_ID: "google_cloud_default" + +serviceAccount: + annotations: + iam.gke.io/gcp-service-account: "<>@<>.iam.gserviceaccount.com" +``` \ No newline at end of file diff --git a/charts/airflow/docs/faq/monitoring/prometheus.md b/charts/airflow/docs/faq/monitoring/prometheus.md new file mode 100644 index 00000000..e4920741 --- /dev/null +++ b/charts/airflow/docs/faq/monitoring/prometheus.md @@ -0,0 +1,16 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow 
Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to integrate airflow with Prometheus?
+
+> 🟨 __Note__ 🟨
+>
+> We are planning to implement native Prometheus/StatsD support in a future chart release.
+
+To expose Airflow metrics to Prometheus, you will need to install a plugin; one option is [epoch8/airflow-exporter](https://github.com/epoch8/airflow-exporter), which exports DAG and task metrics from Airflow.
+
+A [ServiceMonitor](https://github.com/prometheus-operator/prometheus-operator/blob/master/Documentation/api.md#servicemonitor)
+is a resource introduced by the [Prometheus Operator](https://github.com/prometheus-operator/prometheus-operator);
+for more information, see the `serviceMonitor` section of `values.yaml`.
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/monitoring/scheduler-liveness-probe.md b/charts/airflow/docs/faq/monitoring/scheduler-liveness-probe.md
new file mode 100644
index 00000000..a52a2821
--- /dev/null
+++ b/charts/airflow/docs/faq/monitoring/scheduler-liveness-probe.md
@@ -0,0 +1,111 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to configure the scheduler liveness probe?
+
+## Scheduler "Heartbeat Check"
+
+The chart includes a [Kubernetes Liveness Probe](https://kubernetes.io/docs/tasks/configure-pod-container/configure-liveness-readiness-startup-probes/)
+for each airflow scheduler, which regularly queries the Airflow Metadata Database to ensure the scheduler is ["healthy"](https://airflow.apache.org/docs/apache-airflow/stable/logging-monitoring/check-health.html).
+
+A scheduler is "healthy" if it has had a "heartbeat" in the last `AIRFLOW__SCHEDULER__SCHEDULER_HEALTH_CHECK_THRESHOLD` seconds.
+Each scheduler will perform a "heartbeat" every `AIRFLOW__SCHEDULER__SCHEDULER_HEARTBEAT_SEC` seconds by updating the `latest_heartbeat` of its `SchedulerJob` in the Airflow Metadata `jobs` table.
+
+> 🟥 __Warning__ 🟥
+>
+> A scheduler can have a "heartbeat" but be deadlocked such that it's unable to schedule new tasks;
+> we provide the [`scheduler.livenessProbe.taskCreationCheck.*`](#scheduler-task-creation-check) values to automatically restart the scheduler in these cases.
+>
+> - https://github.com/apache/airflow/issues/7935 - patched in airflow `2.0.2`
+> - https://github.com/apache/airflow/issues/15938 - patched in airflow `2.1.1`
+
+By default, the chart runs a liveness probe every __30 seconds__ (`periodSeconds`), and will restart a scheduler if __5 probe failures__ (`failureThreshold`) occur in a row.
+This means a scheduler must be unhealthy for at least `30 x 5 = 150` seconds before Kubernetes will automatically restart a scheduler Pod. 
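+
+> 🟦 __Tip__ 🟦
+>
+> You may run a similar "heartbeat check" by hand (assuming Airflow `2.1.0+`, and a scheduler Deployment named `airflow-cluster-scheduler` in the `airflow-cluster` namespace):
+>
+> ```shell
+> ## exits non-zero if no alive SchedulerJob is found in the metadata database
+> kubectl exec deploy/airflow-cluster-scheduler \
+>   --namespace airflow-cluster \
+>   -- airflow jobs check --job-type SchedulerJob
+> ```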
+ +Here is an overview of the `scheduler.livenessProbe.*` values: + +```yaml +scheduler: + livenessProbe: + enabled: true + + ## number of seconds to wait after a scheduler container starts before running its first probe + ## NOTE: schedulers take a few seconds to actually start + initialDelaySeconds: 10 + + ## number of seconds to wait between each probe + periodSeconds: 30 + + ## maximum number of seconds that a probe can take before timing out + ## WARNING: if your database is very slow, you may need to increase this value to prevent invalid scheduler restarts + timeoutSeconds: 60 + + ## maximum number of consecutive probe failures, after which the scheduler will be restarted + ## NOTE: a "failure" could be any of: + ## 1. the probe takes more than `timeoutSeconds` + ## 2. the probe detects the scheduler as "unhealthy" + ## 3. the probe "task creation check" fails + failureThreshold: 5 +``` + +## Scheduler "Task Creation Check" + +The liveness probe can additionally check if the Scheduler is creating new [tasks](https://airflow.apache.org/docs/apache-airflow/stable/concepts/tasks.html) as an indication of its health. +This check works by ensuring that the most recent `LocalTaskJob` had a `start_date` no more than `scheduler.livenessProbe.taskCreationCheck.thresholdSeconds` seconds ago. + +> 🟦 __Tip__ 🟦 +> +> The "Task Creation Check" is currently disabled by default, it can be enabled with `scheduler.livenessProbe.taskCreationCheck.enabled`. + +Here is an overview of the `scheduler.livenessProbe.taskCreationCheck.*` values: + +```yaml +scheduler: + livenessProbe: + enabled: true + + taskCreationCheck: + ## if the task creation check is enabled + enabled: true + + ## the maximum number of seconds since the start_date of the most recent LocalTaskJob + ## WARNING: must be AT LEAST equal to your shortest DAG schedule_interval + ## WARNING: DummyOperator tasks will NOT be seen by this probe + thresholdSeconds: 300 +``` + +> 🟦 __Tip__ 🟦 +> +> You might use the following `canary_dag` DAG definition to run a small task every __300 seconds__ (5 minutes). +> +> ```python +> from datetime import datetime, timedelta +> from airflow import DAG +> +> # import using try/except to support both airflow 1 and 2 +> try: +> from airflow.operators.bash import BashOperator +> except ModuleNotFoundError: +> from airflow.operators.bash_operator import BashOperator +> +> dag = DAG( +> dag_id="canary_dag", +> default_args={ +> "owner": "airflow", +> }, +> schedule_interval="*/5 * * * *", +> start_date=datetime(2022, 1, 1), +> dagrun_timeout=timedelta(minutes=5), +> is_paused_upon_creation=False, +> catchup=False, +> ) +> +> # WARNING: while `DummyOperator` would use less resources, the check can't see those tasks +> # as they don't create LocalTaskJob instances +> task = BashOperator( +> task_id="canary_task", +> bash_command="echo 'Hello World!'", +> dag=dag, +> ) +> ``` \ No newline at end of file diff --git a/charts/airflow/docs/faq/security/airflow-users.md b/charts/airflow/docs/faq/security/airflow-users.md new file mode 100644 index 00000000..ff965848 --- /dev/null +++ b/charts/airflow/docs/faq/security/airflow-users.md @@ -0,0 +1,85 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to manage airflow users? 
+ +## Define with Plain-Text + +You may use the `airflow.users` value to create airflow users in a declarative way. + +For example, to create `admin` (with "Admin" RBAC role) and `user` (with "User" RBAC role): + +```yaml +airflow: + users: + ## define the user called "admin" + - username: admin + password: admin + role: Admin + email: admin@example.com + firstName: admin + lastName: admin + + ## define the user called "user" + - username: user + password: user123 + ## TIP: `role` can be a single role or a list of roles + role: + - User + - Viewer + email: user@example.com + firstName: user + lastName: user + + ## if we create a Deployment to perpetually sync `airflow.users` + usersUpdate: true +``` + +## Define with templates from Secrets or ConfigMaps + +You may use `airflow.usersTemplates` to extract string templates from keys in Secrets or Configmaps. + +For example, to use templates from `Secret/my-secret` and `ConfigMap/my-configmap` in parts of the `admin` user: + +```yaml +airflow: + users: + ## define the user called "admin" + - username: admin + role: Admin + firstName: admin + lastName: admin + + ## use the ADMIN_PASSWORD template defined in `airflow.usersTemplates` + password: ${ADMIN_PASSWORD} + + ## use the ADMIN_EMAIL template defined in `airflow.usersTemplates` + email: ${ADMIN_EMAIL} + + ## bash-like templates to be used in `airflow.users` + usersTemplates: + + ## define the `ADMIN_PASSWORD` template from the `my-secret` Secret + ADMIN_PASSWORD: + kind: secret + name: my-secret + key: password + + ## define the `ADMIN_EMAIL` template from the `my-configmap` ConfigMap + ADMIN_EMAIL: + kind: configmap + name: my-configmap + key: email + + ## if we create a Deployment to perpetually sync `airflow.users` + usersUpdate: true +``` + +> 🟨 __Note__ 🟨 +> +> If `airflow.usersUpdate = true`, the users which use `airflow.usersTemplates` will be updated in real-time, allowing tools like [external-secrets](https://github.com/external-secrets/kubernetes-external-secrets) to be used. + +## Integrate with LDAP or OAUTH + +For more information, please refer to the [How to integrate airflow with LDAP or OAUTH?](ldap-oauth.md) page. \ No newline at end of file diff --git a/charts/airflow/docs/faq/security/ldap-oauth.md b/charts/airflow/docs/faq/security/ldap-oauth.md new file mode 100644 index 00000000..3df8869d --- /dev/null +++ b/charts/airflow/docs/faq/security/ldap-oauth.md @@ -0,0 +1,146 @@ +[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# How to integrate airflow with LDAP or OAUTH? + +> 🟦 __Tip__ 🟦 +> +> After integrating with LDAP or OAUTH, you should: +> +> 1. Set the `airflow.users` value to `[]` +> 2. Manually delete any previously created users (with the airflow WebUI) + +> 🟦 __Tip__ 🟦 +> +> If you see a __blank screen__ after logging in as an LDAP or OAUTH user, it is probably because that user has not received at least the [`Viewer` FAB role](https://airflow.apache.org/docs/apache-airflow/stable/security/access-control.html#viewer). +> In both following examples, we set `AUTH_USER_REGISTRATION_ROLE = "Public"`, which does not provide access to the WebUI. +> Therefore, unless a binding from `AUTH_ROLES_MAPPING` gives the user the `Viewer`, `User`, `Op`, or `Admin` FAB role, they will be unable to see the WebUI. 
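+
+For example, to apply the first tip above and stop managing chart-defined users:
+
+```yaml
+airflow:
+  ## stop creating chart-managed airflow users
+  users: []
+```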
+ +## Integrate with LDAP + +Airflow uses [Flask-Appbuilder](https://github.com/dpgaspar/Flask-AppBuilder) for its WebUI. + +> 🟦 __Tip__ 🟦 +> +> Learn more about [integrating Flask-Appbuilder with LDAP](https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-ldap) in their docs. + +We provide `web.webserverConfig.*` to define the Flask-AppBuilder [`webserver_config.py`](../configuration/airflow-configs.md#webserver_configpy) file. + +For example, to integrate with a typical Microsoft Active Directory using Flask-AppBuilder's `AUTH_LDAP` `AUTH_TYPE`: + +```yaml +web: + # WARNING: for production usage, create your own image with these packages installed rather than using `extraPipPackages` + extraPipPackages: + ## the following configs require Flask-AppBuilder 3.2.0 (or later) + - "Flask-AppBuilder~=3.4.0" + ## the following configs require python-ldap + - "python-ldap~=3.4.0" + + webserverConfig: + stringOverride: |- + from airflow import configuration as conf + from flask_appbuilder.security.manager import AUTH_LDAP + + SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN') + + AUTH_TYPE = AUTH_LDAP + AUTH_LDAP_SERVER = "ldap://ldap.example.com" + AUTH_LDAP_USE_TLS = False + + # registration configs + AUTH_USER_REGISTRATION = True # allow users who are not already in the FAB DB + AUTH_USER_REGISTRATION_ROLE = "Public" # this role will be given in addition to any AUTH_ROLES_MAPPING + AUTH_LDAP_FIRSTNAME_FIELD = "givenName" + AUTH_LDAP_LASTNAME_FIELD = "sn" + AUTH_LDAP_EMAIL_FIELD = "mail" # if null in LDAP, email is set to: "{username}@email.notfound" + + # bind username (for password validation) + AUTH_LDAP_USERNAME_FORMAT = "uid=%s,ou=users,dc=example,dc=com" # %s is replaced with the provided username + # AUTH_LDAP_APPEND_DOMAIN = "example.com" # bind usernames will look like: {USERNAME}@example.com + + # search configs + AUTH_LDAP_SEARCH = "ou=users,dc=example,dc=com" # the LDAP search base (if non-empty, a search will ALWAYS happen) + AUTH_LDAP_UID_FIELD = "uid" # the username field + + # a mapping from LDAP DN to a list of FAB roles + AUTH_ROLES_MAPPING = { + "cn=airflow_users,ou=groups,dc=example,dc=com": ["User"], + "cn=airflow_admins,ou=groups,dc=example,dc=com": ["Admin"], + } + + # the LDAP user attribute which has their role DNs + AUTH_LDAP_GROUP_FIELD = "memberOf" + + # if we should replace ALL the user's roles each login, or only on registration + AUTH_ROLES_SYNC_AT_LOGIN = True + + # force users to re-auth after 30min of inactivity (to keep roles in sync) + PERMANENT_SESSION_LIFETIME = 1800 +``` + +## Integrate with OAUTH + +Airflow uses [Flask-Appbuilder](https://github.com/dpgaspar/Flask-AppBuilder) for its WebUI. + +> 🟦 __Tip__ 🟦 +> +> Learn more about [integrating Flask-Appbuilder with OAUTH](https://flask-appbuilder.readthedocs.io/en/latest/security.html#authentication-oauth) in their docs. + +We provide `web.webserverConfig.*` to define the Flask-AppBuilder [`webserver_config.py`](../configuration/airflow-configs.md#webserver_configpy) file. 
+
+For example, to integrate with Okta using Flask-AppBuilder's `AUTH_OAUTH` `AUTH_TYPE`:
+
+```yaml
+web:
+  extraPipPackages:
+    ## the following configs require Flask-AppBuilder 3.2.0 (or later)
+    - "Flask-AppBuilder~=3.4.0"
+    ## the following configs require Authlib
+    - "Authlib~=0.15.5"
+
+  webserverConfig:
+    stringOverride: |-
+      from airflow import configuration as conf
+      from flask_appbuilder.security.manager import AUTH_OAUTH
+
+      SQLALCHEMY_DATABASE_URI = conf.get('core', 'SQL_ALCHEMY_CONN')
+
+      AUTH_TYPE = AUTH_OAUTH
+
+      # registration configs
+      AUTH_USER_REGISTRATION = True  # allow users who are not already in the FAB DB
+      AUTH_USER_REGISTRATION_ROLE = "Public"  # this role will be given in addition to any AUTH_ROLES_MAPPING
+
+      # the list of providers which the user can choose from
+      OAUTH_PROVIDERS = [
+          {
+              'name': 'okta',
+              'icon': 'fa-circle-o',
+              'token_key': 'access_token',
+              'remote_app': {
+                  'client_id': 'OKTA_KEY',
+                  'client_secret': 'OKTA_SECRET',
+                  'api_base_url': 'https://OKTA_DOMAIN.okta.com/oauth2/v1/',
+                  'client_kwargs': {
+                      'scope': 'openid profile email groups'
+                  },
+                  'access_token_url': 'https://OKTA_DOMAIN.okta.com/oauth2/v1/token',
+                  'authorize_url': 'https://OKTA_DOMAIN.okta.com/oauth2/v1/authorize',
+              }
+          }
+      ]
+
+      # a mapping from the values of `userinfo["role_keys"]` to a list of FAB roles
+      AUTH_ROLES_MAPPING = {
+          "FAB_USERS": ["User"],
+          "FAB_ADMINS": ["Admin"],
+      }
+
+      # if we should replace ALL the user's roles each login, or only on registration
+      AUTH_ROLES_SYNC_AT_LOGIN = True
+
+      # force users to re-auth after 30min of inactivity (to keep roles in sync)
+      PERMANENT_SESSION_LIFETIME = 1800
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/security/set-fernet-key.md b/charts/airflow/docs/faq/security/set-fernet-key.md
new file mode 100644
index 00000000..f0f84aad
--- /dev/null
+++ b/charts/airflow/docs/faq/security/set-fernet-key.md
@@ -0,0 +1,63 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to set the fernet encryption key?
+
+## Option 1 - using the value
+
+> 🟥 __Warning__ 🟥
+>
+> We strongly recommend that you DO NOT USE the default `airflow.fernetKey` in production.
+
+You may set the fernet encryption key using the `airflow.fernetKey` value, which sets the `AIRFLOW__CORE__FERNET_KEY` environment variable.
+
+For example, to define the fernet key with `airflow.fernetKey`:
+
+```yaml
+airflow:
+  fernetKey: "7T512UXSSmBOkpWimFHIVb8jK6lfmSAvx4mO6Arehnc="
+```
+
+## Option 2 - using a secret (recommended)
+
+You may set the fernet encryption key from a Kubernetes Secret by referencing it with the `airflow.extraEnv` value.
+
+For example, to use the `value` key from the existing Secret called `airflow-fernet-key`:
+
+```yaml
+airflow:
+  extraEnv:
+    - name: AIRFLOW__CORE__FERNET_KEY
+      valueFrom:
+        secretKeyRef:
+          name: airflow-fernet-key
+          key: value
+```
+
+## Option 3 - using `_CMD` or `_SECRET` configs
+
+You may also set the fernet key by specifying either the `AIRFLOW__CORE__FERNET_KEY_CMD` or `AIRFLOW__CORE__FERNET_KEY_SECRET` environment variables.
+Read about how the `_CMD` or `_SECRET` configs work in the ["Setting Configuration Options"](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) section of the Airflow documentation. 
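+
+> 🟦 __Tip__ 🟦
+>
+> You may generate a new fernet key with the [cryptography](https://pypi.org/project/cryptography/) package (assuming it is installed locally), and store it under the `value` key of the `airflow-fernet-key` Secret mounted in the following example:
+>
+> ```shell
+> ## generate a random fernet key
+> FERNET_KEY=$(python3 -c "from cryptography.fernet import Fernet; print(Fernet.generate_key().decode())")
+>
+> ## store it in a Kubernetes secret (assuming the `my-airflow-namespace` namespace)
+> kubectl create secret generic airflow-fernet-key \
+>   --from-literal=value="$FERNET_KEY" \
+>   --namespace my-airflow-namespace
+> ```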
+
+For example, to use `AIRFLOW__CORE__FERNET_KEY_CMD`:
+
+```yaml
+airflow:
+  ## WARNING: you must set `fernetKey` to "", otherwise it will take precedence
+  fernetKey: ""
+
+  ## NOTE: this is only an example, if your value lives in a Secret, you probably want to use "Option 2" above
+  config:
+    AIRFLOW__CORE__FERNET_KEY_CMD: "cat /opt/airflow/fernet-key/value"
+
+  extraVolumeMounts:
+    - name: fernet-key
+      mountPath: /opt/airflow/fernet-key
+      readOnly: true
+
+  extraVolumes:
+    - name: fernet-key
+      secret:
+        secretName: airflow-fernet-key
+```
\ No newline at end of file
diff --git a/charts/airflow/docs/faq/security/set-webserver-secret-key.md b/charts/airflow/docs/faq/security/set-webserver-secret-key.md
new file mode 100644
index 00000000..4d9380a2
--- /dev/null
+++ b/charts/airflow/docs/faq/security/set-webserver-secret-key.md
@@ -0,0 +1,63 @@
+[🔗 Return to `Table of Contents` for more FAQ topics 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#frequently-asked-questions)
+
+> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow)
+
+# How to set the webserver secret key?
+
+## Option 1 - using the value
+
+> 🟥 __Warning__ 🟥
+>
+> We strongly recommend that you DO NOT USE the default `airflow.webserverSecretKey` in production.
+
+You may set the webserver secret_key using the `airflow.webserverSecretKey` value, which sets the `AIRFLOW__WEBSERVER__SECRET_KEY` environment variable.
+
+For example, to define the secret_key with `airflow.webserverSecretKey`:
+
+```yaml
+airflow:
+  webserverSecretKey: "THIS IS UNSAFE!"
+```
+
+## Option 2 - using a secret (recommended)
+
+You may set the webserver secret_key from a Kubernetes Secret by referencing it with the `airflow.extraEnv` value.
+
+For example, to use the `value` key from the existing Secret called `airflow-webserver-secret-key`:
+
+```yaml
+airflow:
+  extraEnv:
+    - name: AIRFLOW__WEBSERVER__SECRET_KEY
+      valueFrom:
+        secretKeyRef:
+          name: airflow-webserver-secret-key
+          key: value
+```
+
+## Option 3 - using `_CMD` or `_SECRET` configs
+
+You may also set the webserver secret key by specifying either the `AIRFLOW__WEBSERVER__SECRET_KEY_CMD` or `AIRFLOW__WEBSERVER__SECRET_KEY_SECRET` environment variables.
+Read about how the `_CMD` or `_SECRET` configs work in the ["Setting Configuration Options"](https://airflow.apache.org/docs/apache-airflow/stable/howto/set-config.html) section of the Airflow documentation. 
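+
+> 🟦 __Tip__ 🟦
+>
+> You may generate a random secret_key with the Python standard library, and store it under the `value` key of the `airflow-webserver-secret-key` Secret mounted in the following example:
+>
+> ```shell
+> ## generate a random secret_key
+> SECRET_KEY=$(python3 -c "import secrets; print(secrets.token_hex(16))")
+>
+> ## store it in a Kubernetes secret (assuming the `my-airflow-namespace` namespace)
+> kubectl create secret generic airflow-webserver-secret-key \
+>   --from-literal=value="$SECRET_KEY" \
+>   --namespace my-airflow-namespace
+> ```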
+ +For example, to use `AIRFLOW__WEBSERVER__SECRET_KEY_CMD`: + +```yaml +airflow: + ## WARNING: you must set `webserverSecretKey` to "", otherwise it will take precedence + webserverSecretKey: "" + + ## NOTE: this is only an example, if your value lives in a Secret, you probably want to use "Option 2" above + config: + AIRFLOW__WEBSERVER__SECRET_KEY_CMD: "cat /opt/airflow/webserver-secret-key/value" + + extraVolumeMounts: + - name: webserver-secret-key + mountPath: /opt/airflow/webserver-secret-key + readOnly: true + + extraVolumes: + - name: webserver-secret-key + secret: + secretName: airflow-webserver-secret-key +``` diff --git a/charts/airflow/docs/guides/quickstart.md b/charts/airflow/docs/guides/quickstart.md new file mode 100644 index 00000000..99c94630 --- /dev/null +++ b/charts/airflow/docs/guides/quickstart.md @@ -0,0 +1,82 @@ +[🔗 Return to `Table of Contents` for more guides 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#guides) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# Quickstart Guide + +> 🟦 __Tip__ 🟦 +> +> To deploy the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) you will need a Kubernetes cluster. +> +> The following table lists some popular Kubernetes distributions by platform. +> +> Platform | Kubernetes Distribution +> --- | --- +> Local Machine | [k3d](https://k3d.io/) +> Local Machine | [kind](https://kind.sigs.k8s.io/) +> Local Machine | [minikube](https://minikube.sigs.k8s.io/) +> Amazon Web Services | [Amazon Elastic Kubernetes Service (EKS)](https://aws.amazon.com/eks/) +> Microsoft Azure | [Azure Kubernetes Service (AKS)](https://azure.microsoft.com/en-au/services/kubernetes-service/) +> Google Cloud | [Google Kubernetes Engine (GKE)](https://cloud.google.com/kubernetes-engine) +> Alibaba Cloud | [Alibaba Cloud Container Service for Kubernetes (ACK)](https://www.alibabacloud.com/product/kubernetes) +> IBM Cloud | [IBM Cloud Kubernetes Service (IKS)](https://www.ibm.com/cloud/kubernetes-service) + +## STEP 1 - Prepare Your Environment + +- Kubernetes `1.18+` +- Helm `3.0+` ([installing helm](https://helm.sh/docs/intro/install/)) +- (Optional) configure a Git repo with your DAG files ([loading dag definitions](../faq/dags/load-dag-definitions.md)) +- (Optional) an external `PostgreSQL` or `MySQL` database ([connecting your database](../faq/database/external-database.md)) +- (Optional) an external `Redis` database for `CeleryExecutor` ([connecting your redis](../faq/database/external-redis.md)) + +## STEP 2 - Add the Helm Repository + +```shell +## add this helm repository & pull updates from it +helm repo add airflow-stable https://airflow-helm.github.io/charts +helm repo update +``` + +## STEP 3 - Install the Airflow Chart + +```shell +## set the release-name & namespace +export AIRFLOW_NAME="airflow-cluster" +export AIRFLOW_NAMESPACE="airflow-cluster" + +## create the namespace +kubectl create ns "$AIRFLOW_NAMESPACE" + +## install using helm 3 +helm install \ + "$AIRFLOW_NAME" \ + airflow-stable/airflow \ + --namespace "$AIRFLOW_NAMESPACE" \ + --version "8.X.X" \ + --values ./custom-values.yaml + +## wait until the above command returns (may take a while) +``` + +> 🟦 __Tip__ 🟦 +> +> To create your `./custom-values.yaml`, refer to our other documentation. 
+> +> - [Frequently Asked Questions](../..#frequently-asked-questions) +> - [Examples](../..#examples) +> - [Helm Values](../..#helm-values) + +## STEP 4 - Access the Airflow UI + +```shell +## port-forward the airflow webserver +kubectl port-forward svc/${AIRFLOW_NAME}-web 8080:8080 --namespace $AIRFLOW_NAMESPACE + +## open your browser to: http://localhost:8080 +``` + +> 🟦 __Tip__ 🟦 +> +> The default Airflow UI login is `admin`/`admin`. +> +> You may also [define your own users](../faq/security/airflow-users.md) or [integrate with your LDAP/OAUTH](../faq/security/ldap-oauth.md). diff --git a/charts/airflow/docs/guides/uninstall.md b/charts/airflow/docs/guides/uninstall.md new file mode 100644 index 00000000..5da862c0 --- /dev/null +++ b/charts/airflow/docs/guides/uninstall.md @@ -0,0 +1,16 @@ +[🔗 Return to `Table of Contents` for more guides 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#guides) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# Uninstall Guide + +```shell +## set the release-name & namespace (must be same as previously installed) +export AIRFLOW_NAME="airflow-cluster" +export AIRFLOW_NAMESPACE="airflow-cluster" + +## uninstall the chart +helm uninstall \ + "$AIRFLOW_NAME" \ + --namespace "$AIRFLOW_NAMESPACE" +``` \ No newline at end of file diff --git a/charts/airflow/docs/guides/upgrade.md b/charts/airflow/docs/guides/upgrade.md new file mode 100644 index 00000000..9520b83a --- /dev/null +++ b/charts/airflow/docs/guides/upgrade.md @@ -0,0 +1,30 @@ +[🔗 Return to `Table of Contents` for more guides 🔗](https://github.com/airflow-helm/charts/tree/main/charts/airflow#guides) + +> Note, this page was written for the [`User-Community Airflow Helm Chart`](https://github.com/airflow-helm/charts/tree/main/charts/airflow) + +# Upgrade Guide + +> 🟦 __Tip__ 🟦 +> +> Always consult the [CHANGELOG](../../CHANGELOG.md) before upgrading chart versions. + +> 🟦 __Tip__ 🟦 +> +> Always pin a specific `--version X.X.X` rather than installing the latest version. + +```shell +## pull updates from the helm repository +helm repo update + +## set the release-name & namespace (must be same as previously installed) +export AIRFLOW_NAME="airflow-cluster" +export AIRFLOW_NAMESPACE="airflow-cluster" + +## apply any changed `custom-values.yaml` AND upgrade the chart to version `8.X.X` +helm upgrade \ + "$AIRFLOW_NAME" \ + airflow-stable/airflow \ + --namespace "$AIRFLOW_NAMESPACE" \ + --version "8.X.X" \ + --values ./custom-values.yaml +``` \ No newline at end of file