Automate Kubernetes DAG Phase-2 #1160

Open
pankajastro opened this issue Aug 15, 2024 · 0 comments
Labels
area:config Related to configuration, like YAML files, environment variables, or executer configuration area:dependencies Related to dependencies, like Python packages, library versions, etc area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc enhancement New feature or request execution:kubernetes Related to Kubernetes execution environment

Comments

@pankajastro
Contributor

We automated running the Kubernetes example in PR #1127. However, there are a few workarounds in place that we should address in the future.

  • Use the hatch target to run the test. I have introduced a hatch target to run the Kubernetes example, but it is currently not working due to a mismatch between the local and container dbt project paths. This requires a bit more work.
  • Remove the virtual environment step (Install packages and dependencies) in the CI configuration for Run-Kubernetes-Tests and use hatch instead.
  • Update the profile YAML to use an environment variable for the port, as it is currently hardcoded.
  • Remove the host from the Kubernetes secret, replace it with the username, and make the corresponding change in the DAG.
  • Currently, we need to export both POSTGRES_DATABASE and POSTGRES_DB in the Dockerfile because both are used in the project. Avoid exporting both, and instead use one consistent environment variable name across the repository.
  • Not a big deal in this context, but we have some hardcoded values for secrets. It would be better to parameterize them.
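
For the POSTGRES_DATABASE / POSTGRES_DB item, one possible transition path is a small helper that prefers a single canonical name and tolerates the other until it is removed. This is only a sketch: which name becomes canonical is an assumption (POSTGRES_DB is picked here arbitrarily), and `resolve_database_name` is a hypothetical helper, not something that exists in the repository.

```python
import os
from typing import Mapping, Optional

# Assumption: POSTGRES_DB is kept as the single canonical name and
# POSTGRES_DATABASE is the one being phased out. The issue does not decide this.
CANONICAL_VAR = "POSTGRES_DB"
LEGACY_VAR = "POSTGRES_DATABASE"


def resolve_database_name(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the database name, preferring the canonical variable.

    Raises ValueError if both variables are set to different values, and
    KeyError if neither is set, so inconsistencies fail loudly.
    """
    env = os.environ if env is None else env
    canonical = env.get(CANONICAL_VAR)
    legacy = env.get(LEGACY_VAR)

    if canonical and legacy and canonical != legacy:
        raise ValueError(
            f"{CANONICAL_VAR}={canonical!r} conflicts with {LEGACY_VAR}={legacy!r}"
        )

    value = canonical or legacy
    if not value:
        raise KeyError(f"Neither {CANONICAL_VAR} nor {LEGACY_VAR} is set")
    return value
```

Once every reference in the repository uses the canonical name, the fallback branch and the second export in the Dockerfile could both be dropped.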
@pankajastro pankajastro added the enhancement New feature or request label Aug 15, 2024
@dosubot dosubot bot added area:config Related to configuration, like YAML files, environment variables, or executer configuration area:dependencies Related to dependencies, like Python packages, library versions, etc execution:kubernetes Related to Kubernetes execution environment area:execution Related to the execution environment/mode, like Docker, Kubernetes, Local, VirtualEnv, etc labels Aug 15, 2024
pankajastro added a commit that referenced this issue Aug 15, 2024
## Description

### Migrate example from [cosmos-example](https://github.com/astronomer/cosmos-example/)

The [cosmos-example](https://github.com/astronomer/cosmos-example/)
repository currently contains several examples, including those that run
in Kubernetes execution mode. This setup has made it challenging to test
local changes in Kubernetes execution mode and to keep the documentation
up to date. Therefore, it makes sense to migrate the Kubernetes examples
from [cosmos-example](https://github.com/astronomer/cosmos-example/) to
this repository. This PR does the following:
- Migrates the
[jaffle_shop_kubernetes](https://github.com/astronomer/cosmos-example/blob/main/dags/jaffle_shop_kubernetes.py)
example DAG to this repository.
- Moves the Dockerfile from
[cosmos-example](https://github.com/astronomer/cosmos-example/blob/main/Dockerfile.postgres_profile_docker_k8s)
to this repository to build the image with the necessary DAGs and dbt
projects.

I also adjusted both the example DAG and Dockerfile to work within this
repository.

### Automate running locally

I introduced some scripts to make running the Kubernetes DAG easy.

**postgres-deployment.yaml:** Kubernetes resource file for spinning up
PostgreSQL and creating Kubernetes secrets.

**integration-kubernetes.sh:** Runs the Kubernetes DAG using pytest.

**kubernetes-setup.sh:**

- Builds the Docker image with the Jaffle Shop dbt project and DAG, and
loads the Docker image into the local registry.
- Creates Kubernetes resources such as PostgreSQL deployment, service,
and secret.

**Run DAG locally**
Prerequisites:

- Docker Desktop
- KinD (Kubernetes in Docker)
- kubectl

Steps:
1. Create the cluster: `kind create cluster`
2. Create the resources: `scripts/test/kubernetes-setup.sh` (this sets up
PostgreSQL and loads the Docker image containing the dbt project into the
local registry)
3. Run the DAG: `cd dev && scripts/test/integration-kubernetes.sh`. This
executes the DAG with pytest. You can also run it directly with the
`airflow` command, provided the project is installed in your virtual
environment:
```
time AIRFLOW__COSMOS__PROPAGATE_LOGS=0 AIRFLOW__COSMOS__ENABLE_CACHE=1 AIRFLOW__COSMOS__CACHE_DIR=/tmp/ AIRFLOW_CONN_EXAMPLE_CONN="postgres://postgres:[email protected]:5432/postgres" PYTHONPATH=`pwd` AIRFLOW_HOME=`pwd` AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT=20000 AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT=20000 airflow dags test jaffle_shop_kubernetes  `date -Iseconds`
```
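
The one-liner above is long enough that a small wrapper may be easier to maintain. The following is a rough sketch of how the same environment and command could be assembled in Python. The function names and the `conn_uri` parameter are hypothetical; the variable names and values are copied from the command above, and the connection URI is deliberately passed in by the caller rather than hardcoded.

```python
import os
import subprocess
from datetime import datetime
from typing import Dict, List, Tuple


def build_dag_test_invocation(
    conn_uri: str,
    project_root: str,
    dag_id: str = "jaffle_shop_kubernetes",
) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra_env, command) mirroring the shell one-liner."""
    extra_env = {
        "AIRFLOW__COSMOS__PROPAGATE_LOGS": "0",
        "AIRFLOW__COSMOS__ENABLE_CACHE": "1",
        "AIRFLOW__COSMOS__CACHE_DIR": "/tmp/",
        "AIRFLOW_CONN_EXAMPLE_CONN": conn_uri,
        "PYTHONPATH": project_root,
        "AIRFLOW_HOME": project_root,
        "AIRFLOW__CORE__DAGBAG_IMPORT_TIMEOUT": "20000",
        "AIRFLOW__CORE__DAG_FILE_PROCESSOR_TIMEOUT": "20000",
    }
    # Timezone-aware ISO 8601 timestamp, similar to `date -Iseconds`.
    logical_date = datetime.now().astimezone().isoformat(timespec="seconds")
    command = ["airflow", "dags", "test", dag_id, logical_date]
    return extra_env, command


def run_dag_test(conn_uri: str, project_root: str) -> None:
    """Run the DAG test; requires the `airflow` CLI on PATH."""
    extra_env, command = build_dag_test_invocation(conn_uri, project_root)
    # Merge with the current environment so PATH and friends are preserved.
    subprocess.run(command, env={**os.environ, **extra_env}, check=True)
```

Calling `run_dag_test(conn_uri, os.getcwd())` from the `dev` directory would then correspond to the shell command, minus the `time` prefix.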
### Run jaffle_shop_kubernetes in CI
To avoid regressions, running jaffle_shop_kubernetes is now automated in
CI:

- Set up the GitHub Actions infrastructure to run DAGs in Kubernetes
execution mode.
- Used
[container-tools/kind-action@v1](https://github.com/container-tools/kind-action)
to create a KinD cluster.
- Used the bash scripts to streamline creating the Kubernetes resources,
building and loading the image into a local registry, and executing the
tests.
- At the moment, pytest runs from a virtual environment.


### Documentation changes
Given that the DAG
[jaffle_shop_kubernetes](https://github.com/astronomer/cosmos-example/blob/main/dags/jaffle_shop_kubernetes.py)
is now part of this repository, I have automated the example rendering
for Kubernetes execution mode. This ensures that we avoid displaying
outdated example code.


https://astronomer.github.io/astronomer-cosmos/getting_started/execution-modes.html#kubernetes
<img width="822" alt="Screenshot 2024-08-15 at 8 03 59 PM"
src="https://github.com/user-attachments/assets/1eadad09-9b7c-43e1-bcd8-b08dd21e3878">


https://astronomer.github.io/astronomer-cosmos/getting_started/kubernetes.html#kubernetes

<img width="812" alt="Screenshot 2024-08-15 at 8 04 22 PM"
src="https://github.com/user-attachments/assets/7161fa9b-e5c1-44d8-8702-b2c583dee236">

### Future work

- Use the hatch target to run the test. I have introduced a hatch target
to run the Kubernetes example, but it is currently not working due to a
mismatch between the local and container dbt project paths. This
requires a bit more work.
- Remove the virtual environment step (Install packages and
dependencies) in the CI configuration for Run-Kubernetes-Tests and use
hatch instead.
- Update the profile YAML to use an environment variable for the port,
as it is currently hardcoded.
- Remove the host from the Kubernetes secret, replace it with the
username, and make the corresponding change in the DAG.
- Currently, we need to export both POSTGRES_DATABASE and POSTGRES_DB in
the Dockerfile because both are used in the project. Avoid exporting
both, and instead use one consistent environment variable name across
the repository.
- Not a big deal in this context, but we have some hardcoded values for
secrets. It would be better to parameterize them.
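
As an illustration of the last item, the secret values could be read from the environment and rendered into a Kubernetes Secret manifest at setup time instead of being hardcoded in the resource file. This is a minimal sketch, not the repository's implementation: the secret name `postgres-secrets`, the key names, and both helper functions are assumptions made up for the example.

```python
import base64
import json
from typing import Dict, Mapping, Sequence


def secret_values_from_env(
    env: Mapping[str, str],
    required: Sequence[str] = ("POSTGRES_USER", "POSTGRES_PASSWORD"),  # hypothetical keys
) -> Dict[str, str]:
    """Collect the required secret values, failing if any is missing or empty."""
    missing = [key for key in required if not env.get(key)]
    if missing:
        raise KeyError(f"Missing required environment variables: {missing}")
    return {key: env[key] for key in required}


def build_secret_manifest(name: str, values: Mapping[str, str]) -> Dict[str, object]:
    """Build an Opaque Secret manifest; `data` values must be base64-encoded."""
    data = {
        key: base64.b64encode(value.encode("utf-8")).decode("ascii")
        for key, value in values.items()
    }
    return {
        "apiVersion": "v1",
        "kind": "Secret",
        "metadata": {"name": name},
        "type": "Opaque",
        "data": data,
    }


def to_json(manifest: Mapping[str, object]) -> str:
    """Serialize the manifest; kubectl accepts JSON as well as YAML."""
    return json.dumps(manifest, indent=2)
```

The JSON produced by `to_json(build_secret_manifest("postgres-secrets", secret_values_from_env(os.environ)))` could be piped to `kubectl apply -f -`, which keeps the actual values out of the checked-in files.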

GH issue for future improvement:
#1160

### Example CI Run

- https://github.com/astronomer/astronomer-cosmos/actions/runs/10405590862

## Related Issue(s)

closes: #535

## Breaking Change?

No

<!-- If this introduces a breaking change, specify that here. -->

## Checklist

- [x] I have made corresponding changes to the documentation (if
required)
- [x] I have added tests that prove my fix is effective or that my
feature works