Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor dbt project parsing and Airflow DAG generation #360

Merged
merged 56 commits into from
Jul 18, 2023
Merged

Conversation

tatiana
Copy link
Collaborator

@tatiana tatiana commented Jul 7, 2023

This is a major refactor related to how we render dbt projects in Airflow.

Some of the changes:

  • Create a common class to define the share interface between DbtDag and DbtTaskGroup
  • Define data classes to exchange information between the different steps of the Cosmos (parse dbt project, validate arguments, filter, create Airflow DAG)
  • Split the logic of parsing dbt projects from the actual Airflow DAG generation
  • Add support to parse dbt projects using dbt ls
  • Add support to parse dbt projects using their manifest.json

Breaking changes:

  • from cosmos.dag import DbtDag is now from cosmos.airflow.dag import DbtDag or from cosmos import DbtDag
  • from cosmos.dag import DbtTaskGroup is now from cosmos.airflow.dag import DbtTaskGroup or from cosmos import DbtTaskGroup
  • select and exclude arguments, passed to DbtDag or DbtTaskGroup used to be dictionaries, now they are lists - so we are compatible with the representation in dbt

DISCLAIMER: This PR is in draft mode because I'm improving the following:

(features)

  • expose automatic load mode and manifest in the DbtToAirflowConverter class
  • decide if we want to use Enum or Literals on the interface with end-users
  • the dbt ls expects the user to have a valid profiles.yml - we can extract the logic from the local executor. It feels we could do this in a follow-up PR

(test features we lack integration tests)

  • validate with K8s operators
  • validate with docker operators

(good practices)

  • test coverage
  • docstrings
  • type hints
  • improve times for running tests (because automatic is the default load mode, it seems the parsing time increased - both for unit and integration)

Closes: #308

@netlify
Copy link

netlify bot commented Jul 7, 2023

👷 Deploy Preview for amazing-pothos-a3bca0 processing.

Name Link
🔨 Latest commit 3f171db
🔍 Latest deploy log https://app.netlify.com/sites/amazing-pothos-a3bca0/deploys/64b5d6801080bf0008362290

@codecov
Copy link

codecov bot commented Jul 7, 2023

Codecov Report

Patch coverage: 100.00% and project coverage change: +1.75 🎉

Comparison is base (387615f) 88.94% compared to head (3f171db) 90.70%.

Additional details and impacted files
@@            Coverage Diff             @@
##             main     #360      +/-   ##
==========================================
+ Coverage   88.94%   90.70%   +1.75%     
==========================================
  Files          39       44       +5     
  Lines        1303     1484     +181     
==========================================
+ Hits         1159     1346     +187     
+ Misses        144      138       -6     
Impacted Files Coverage Δ
cosmos/operators/local.py 81.17% <ø> (ø)
cosmos/__init__.py 100.00% <100.00%> (ø)
cosmos/airflow/dag.py 100.00% <100.00%> (ø)
cosmos/airflow/graph.py 100.00% <100.00%> (ø)
cosmos/airflow/task_group.py 100.00% <100.00%> (ø)
cosmos/converter.py 100.00% <100.00%> (ø)
cosmos/core/airflow.py 93.75% <100.00%> (-0.13%) ⬇️
cosmos/dbt/executable.py 100.00% <100.00%> (ø)
cosmos/dbt/graph.py 100.00% <100.00%> (ø)
cosmos/dbt/project.py 100.00% <100.00%> (ø)
... and 1 more

... and 1 file with indirect coverage changes

☔ View full report in Codecov by Sentry.
📢 Do you have feedback about the report comment? Let us know in this issue.

cosmos/airflow/group.py Outdated Show resolved Hide resolved
cosmos/dbt/graph.py Outdated Show resolved Hide resolved
cosmos/dbt/graph.py Outdated Show resolved Hide resolved
@tatiana tatiana changed the title Refactor dbt parsing from Airflow DAG generation Refactor dbt project parsing and Airflow DAG generation Jul 12, 2023
@tatiana tatiana marked this pull request as ready for review July 17, 2023 11:19
@tatiana tatiana requested a review from a team as a code owner July 17, 2023 11:19
@tatiana tatiana requested a review from a team July 17, 2023 11:19
Copy link
Collaborator

@jlaneve jlaneve left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

approving conditional on unit tests passing

@tatiana tatiana merged commit 1be16c1 into main Jul 18, 2023
38 checks passed
@tatiana tatiana deleted the issue-308-dbt-ls branch July 18, 2023 00:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Respect config from dbt_project.yml
3 participants