From 737de907e7ff9315ad60c5adb0c6f25569905a72 Mon Sep 17 00:00:00 2001
From: Tatiana Al-Chueyr
Date: Fri, 8 Sep 2023 14:49:13 +0100
Subject: [PATCH] Add docs comparing Airflow and dbt concepts

Also fix Sphinx duplicated reference
---
 docs/configuration/lineage.rst                |  2 +-
 docs/getting_started/dbt-airflow-concepts.rst | 34 +++++++++++++++++++
 docs/getting_started/index.rst                |  7 ++++
 3 files changed, 42 insertions(+), 1 deletion(-)
 create mode 100644 docs/getting_started/dbt-airflow-concepts.rst

diff --git a/docs/configuration/lineage.rst b/docs/configuration/lineage.rst
index 54f9ad46d..bf099f344 100644
--- a/docs/configuration/lineage.rst
+++ b/docs/configuration/lineage.rst
@@ -26,7 +26,7 @@ Otherwise, install Cosmos using ``astronomer-cosmos[openlineage]``.
 Configuration
 -------------
 
-If using Airflow 2.7, follow `these instructions `_ on how to configure OpenLineage.
+If using Airflow 2.7, follow `the instructions `_ on how to configure OpenLineage.
 
 Otherwise, follow `these instructions `_.
 
diff --git a/docs/getting_started/dbt-airflow-concepts.rst b/docs/getting_started/dbt-airflow-concepts.rst
new file mode 100644
index 000000000..6afc1b1fb
--- /dev/null
+++ b/docs/getting_started/dbt-airflow-concepts.rst
@@ -0,0 +1,34 @@
+.. _dbt-airflow-concepts:
+
+Similar dbt & Airflow concepts
+==============================
+
+While dbt is an open source tool for data transformations and analysis, using SQL, Airflow focuses on being a platform
+for the development, scheduling and monitoring of batch-oriented workflows, using Python. Although both tools have many
+differences, they also share similar concepts.
+
+This page aims to list some of these concepts and help those
+who may be new to Airflow or dbt and are considering using Cosmos.
+
+
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Airflow naming | dbt naming   | Description                                                                     | Differences                                                                 | References                                                                           |
++================+==============+=================================================================================+=============================================================================+======================================================================================+
+| DAG            | Workflow     | Pipeline (Directed Acyclic Graph) that contains a group of steps                | Airflow expects upstream tasks to have passed to run downstream tasks.      | https://airflow.apache.org/docs/apache-airflow/2.7.1/core-concepts/dags.html         |
+|                |              |                                                                                 | dbt can run a subset of tasks assuming upstream tasks were run.             | https://docs.getdbt.com/docs/introduction                                            |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Task           | Node         | Step within a pipeline (DAG or workflow)                                        | In dbt, users write mostly SQL and YML to define the steps of a pipeline.   | https://docs.getdbt.com/reference/node-selection/syntax                              |
+|                |              |                                                                                 | Airflow expects steps to be written in Python.                              | https://airflow.apache.org/docs/apache-airflow/2.7.1/core-concepts/tasks.html        |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Variables      | Variables    | Key-value configuration that can be used in steps and avoids hard-coded values  |                                                                             | https://docs.getdbt.com/docs/build/project-variables                                 |
+|                |              |                                                                                 |                                                                             | https://airflow.apache.org/docs/apache-airflow/2.7.1/core-concepts/variables.html    |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Templating     | Macros       | Jinja templating used to access variables, configuration and reference steps    | dbt encourages using Jinja templating for control structures (if and for).  | https://docs.getdbt.com/docs/build/jinja-macros                                      |
+|                |              |                                                                                 | Airflow usage is limited to variables, macros and filters.                  | https://airflow.apache.org/docs/apache-airflow/stable/templates-ref.html             |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Connection     | Profile      | Configuration to connect to databases or other services                         |                                                                             | https://airflow.apache.org/docs/apache-airflow/stable/howto/connection.html          |
+|                |              |                                                                                 |                                                                             | https://docs.getdbt.com/docs/core/connect-data-platform/connection-profiles          |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
+| Providers      | Adapter      | Additional Python libraries that support specific databases or services         |                                                                             | https://airflow.apache.org/docs/apache-airflow-providers/                            |
+|                |              |                                                                                 |                                                                             | https://docs.getdbt.com/guides/dbt-ecosystem/adapter-development/1-what-are-adapters |
++----------------+--------------+---------------------------------------------------------------------------------+-----------------------------------------------------------------------------+--------------------------------------------------------------------------------------+
diff --git a/docs/getting_started/index.rst b/docs/getting_started/index.rst
index 6f0325128..c71589ec2 100644
--- a/docs/getting_started/index.rst
+++ b/docs/getting_started/index.rst
@@ -11,6 +11,7 @@
    Execution Modes
    Docker Execution Mode
    Kubernetes Execution Mode
+   dbt and Airflow Similar Concepts
 
 
 Getting Started
@@ -38,3 +39,9 @@ For specific guides, see the following:
 
 - `Executing dbt DAGs with Docker Operators `__
 - `Executing dbt DAGs with KubernetesPodOperators `__
+
+
+Concepts Overview
+-----------------
+
+How do dbt and Airflow concepts map to each other? Learn more `in this link `__.
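
Review note on the DAG / Workflow row: the "Differences" column says that Airflow expects upstream tasks to have passed before running downstream tasks, while dbt can run a subset of steps on the assumption that upstream steps were already run. The sketch below is one rough way to picture that contrast. It uses neither the Airflow nor the dbt API; the pipeline, the step names, and both helper functions are hypothetical, and it assumes Python 3.9 or later for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline, not taken from the patch: each step maps to the set
# of steps it depends on.
PIPELINE = {
    "stg_orders": set(),
    "stg_customers": set(),
    "orders": {"stg_orders", "stg_customers"},
    "report": {"orders"},
}


def airflow_style_run(pipeline, failing=frozenset()):
    """Visit every step in dependency order; a step can succeed only if all of
    its upstream steps succeeded in this same run."""
    states = {}
    for step in TopologicalSorter(pipeline).static_order():
        if any(states[up] != "success" for up in pipeline[step]):
            states[step] = "upstream_failed"
        elif step in failing:
            states[step] = "failed"
        else:
            states[step] = "success"
    return states


def dbt_style_run(pipeline, selected):
    """Visit only the selected steps, in dependency order, assuming that any
    unselected upstream step was already built by an earlier run."""
    order = TopologicalSorter(pipeline).static_order()
    return [step for step in order if step in selected]
```

For example, `airflow_style_run(PIPELINE, failing={"stg_orders"})` marks `orders` and `report` as `upstream_failed`, whereas `dbt_style_run(PIPELINE, {"orders", "report"})` visits just those two steps without checking the staging steps at all.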