A comprehensive data pipeline leveraging Airflow, DBT, Google Cloud Platform (GCP), and Docker to extract, transform, and load data seamlessly from a staging layer to a data warehouse and data mart.
This project demonstrates an end-to-end implementation of a modern data stack:
- Airflow: Orchestrates the data pipeline with DAGs.
- DBT (Data Build Tool): Handles the transformation of data from the staging layer to the data warehouse and data mart.
- GCP: Serves as the cloud platform for data storage and warehouse management.
- Docker: Ensures all tools and dependencies are containerized for consistent development and deployment.
- Automated Orchestration: Airflow DAGs schedule and automate tasks for data extraction, transformation, and loading.
- Data Transformation with DBT:
  - Staging Layer: raw data is cleaned and standardized.
  - Data Warehouse: normalized data structure.
  - Data Mart: denormalized data for easy reporting and analytics.
- Cloud Integration: leverages GCP's scalable infrastructure for efficient storage and querying.
- Dockerized Environment: simplifies setup and deployment across any environment.
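The three DBT layers can be illustrated with a minimal sketch in plain Python. This is not taken from the project's models; the table and column names are hypothetical, and dict operations stand in for the SQL that DBT actually runs:

```python
# Hypothetical illustration of the three DBT layers using plain Python
# dicts in place of SQL models. Table/column names are made up.

# Staging layer: raw rows are cleaned and standardized.
raw_orders = [
    {"order_id": "1", "customer": " Alice ", "amount": "42.50"},
    {"order_id": "2", "customer": "bob", "amount": "10.00"},
]
staged = [
    {
        "order_id": int(r["order_id"]),
        "customer": r["customer"].strip().title(),  # trim + normalize case
        "amount": float(r["amount"]),               # cast to numeric
    }
    for r in raw_orders
]

# Warehouse layer: normalized structure (customers split into their own table).
customers = {name: i for i, name in enumerate(sorted({r["customer"] for r in staged}), 1)}
orders = [
    {"order_id": r["order_id"], "customer_id": customers[r["customer"]], "amount": r["amount"]}
    for r in staged
]

# Mart layer: denormalized back into one wide table for reporting.
customer_names = {v: k for k, v in customers.items()}
mart = [{**o, "customer_name": customer_names[o["customer_id"]]} for o in orders]
```

In the real pipeline each of these steps is a DBT model materialized in BigQuery; the sketch only shows why the mart is convenient for analytics (no joins needed at query time).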
- Clone the repository:
  `git clone https://github.com/lixx21/airflow-dbt-gcp.git`
- Set up your Google Cloud Platform:
  - Create a project in GCP and a bucket in GCS, and make sure the bucket's location is US.
  - Get your credential key from GCP IAM (I suggest storing it inside the `dags` folder).
- Fill in the `.env` file with your environment values:

```
# .env
BUCKET_NAME=
CREDENTIAL_KEY=
GCP_CONN_ID=
PROJECT_ID=
```
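As a rough sketch of how the pipeline code might consume these `.env` values, here is a simplified stand-in for a loader such as python-dotenv. The parsing helper and the sample values are illustrative, not part of this project:

```python
# Minimal, hypothetical .env parser; real projects typically use
# python-dotenv or docker-compose's built-in env_file handling.

def load_env(text: str) -> dict:
    """Parse KEY=VALUE lines, skipping blanks and # comments."""
    env = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        env[key.strip()] = value.strip()
    return env

# Sample values for illustration only.
example = """
# .env
BUCKET_NAME=my-staging-bucket
CREDENTIAL_KEY=dags/service-account.json
GCP_CONN_ID=google_cloud_default
PROJECT_ID=my-gcp-project
"""

config = load_env(example)
```

Inside the containers the same variables would normally be read with `os.environ["PROJECT_ID"]` and friends.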
- Fill in `profiles.yml` with the local path to your credential keyfile and your project ID:
```yaml
dbt_transform:
  outputs:
    dev:
      dataset: shopping_data
      job_execution_timeout_seconds: 300
      job_retries: 1
      keyfile: {your keyfile location}
      location: US
      method: service-account
      priority: interactive
      project: {GCP project id}
      threads: 1
      type: bigquery
  target: dev
```
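For DBT to pick up the profile above, the project's `dbt_project.yml` must reference it by name (standard DBT behavior; this excerpt is illustrative and the surrounding keys may differ in this repo):

```yaml
# dbt_project.yml (excerpt)
name: dbt_transform
profile: dbt_transform  # must match the top-level key in profiles.yml
```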
- Create a dataset named `shopping_data` in BigQuery and make sure it is in the US location (because DBT only supports US for now).
- Run the project using Docker:
  `docker-compose up --build -d`
Setup DBT BigQuery references:
- https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup
- https://medium.com/@perkasaid.rio/easiest-way-installing-dbt-for-bigquery-54d1c05f6dfe