
Data Warehouse Project: Airflow + DBT + GCP

A comprehensive data pipeline leveraging Airflow, DBT, Google Cloud Platform (GCP), and Docker to extract, transform, and load data seamlessly from a staging layer to a data warehouse and data mart.

Project Structure

airflow-dbt (image)

airflow-graph (image)

🛠️ Project Overview

This project demonstrates an end-to-end implementation of a modern data stack:

  1. Airflow: Orchestrates the data pipeline with DAGs (a minimal DAG sketch follows this list).
  2. DBT (Data Build Tool): Handles the transformation of data from the staging layer to the data warehouse and data mart.
  3. GCP: Serves as the cloud platform for data storage and warehouse management.
  4. Docker: Ensures all tools and dependencies are containerized for consistent development and deployment.
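
Below is a minimal, hypothetical sketch of how such a DAG could be wired up: stage a raw file in GCS, load it into a BigQuery staging table, then run the DBT models. The file paths, task IDs, table names, and operator choices here are illustrative assumptions, not the actual contents of this repository's dags folder.

import os
from datetime import datetime

from airflow import DAG
from airflow.operators.bash import BashOperator
from airflow.providers.google.cloud.transfers.gcs_to_bigquery import GCSToBigQueryOperator
from airflow.providers.google.cloud.transfers.local_to_gcs import LocalFilesystemToGCSOperator

with DAG(
    dag_id="shopping_data_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule_interval="@daily",
    catchup=False,
) as dag:
    # Stage the raw file in the GCS bucket configured in .env (placeholder source path)
    upload_raw = LocalFilesystemToGCSOperator(
        task_id="upload_raw_to_gcs",
        src="/opt/airflow/dags/data/raw_shopping.csv",
        dst="staging/raw_shopping.csv",
        bucket=os.environ["BUCKET_NAME"],
        gcp_conn_id=os.environ["GCP_CONN_ID"],
    )

    # Load the staged file into a BigQuery staging table (placeholder table name)
    load_staging = GCSToBigQueryOperator(
        task_id="load_staging_table",
        bucket=os.environ["BUCKET_NAME"],
        source_objects=["staging/raw_shopping.csv"],
        destination_project_dataset_table="shopping_data.stg_shopping",
        source_format="CSV",
        skip_leading_rows=1,
        autodetect=True,
        write_disposition="WRITE_TRUNCATE",
        gcp_conn_id=os.environ["GCP_CONN_ID"],
    )

    # Run the DBT models (staging -> warehouse -> data mart); placeholder project path
    dbt_run = BashOperator(
        task_id="dbt_run",
        bash_command="cd /opt/airflow/dbt_transform && dbt run --profiles-dir .",
    )

    upload_raw >> load_staging >> dbt_run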

✨ Features

  1. Automated Orchestration: Airflow DAGs schedule and automate tasks for data extraction, transformation, and loading.

  2. Data Transformation with DBT:

Staging Layer: Raw data is cleaned and standardized.

Data Warehouse: Normalized data structure.

Data Mart: Denormalized data for easy reporting and analytics.

  3. Cloud Integration: Leverages GCP's scalable infrastructure for efficient storage and querying (a query sketch follows this list).

  4. Dockerized Environment: Simplifies setup and deployment across any environment.
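
As a sketch of the kind of reporting the data mart enables, here is a hypothetical query against a denormalized mart table using the BigQuery Python client. The table name mart_sales_summary and the keyfile path are placeholders, not objects defined in this repository.

from google.cloud import bigquery

# Authenticate with the service-account key downloaded from GCP IAM (placeholder path)
client = bigquery.Client.from_service_account_json("dags/service-account.json")

# Query a denormalized data-mart table for reporting (placeholder table name)
query = "SELECT * FROM `shopping_data.mart_sales_summary` LIMIT 10"
for row in client.query(query).result():
    print(dict(row))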

Setup

  1. Clone the repository with git clone https://github.com/lixx21/airflow-dbt-gcp.git
  2. Set up your Google Cloud Platform account
  3. Create a project in GCP and a bucket in GCS, and make sure your bucket location is US
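If you would rather create the bucket from code than from the console, a minimal sketch with the google-cloud-storage client could look like this (the bucket name and keyfile path are placeholders, and it assumes you already have the service-account key from the next step):

from google.cloud import storage

# Authenticate with the service-account key (placeholder path)
client = storage.Client.from_service_account_json("dags/service-account.json")

# Create the bucket in the US multi-region, as required by this project
client.create_bucket("my-shopping-bucket", location="US")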
  4. Get your credential key (a service-account JSON) from GCP IAM (I suggest storing it inside the dags folder)
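To confirm the key file is readable before wiring it into Airflow, a quick hypothetical check with google-auth (the path is a placeholder):

from google.oauth2 import service_account

# Load the service-account key and print the account it belongs to (placeholder path)
creds = service_account.Credentials.from_service_account_file("dags/service-account.json")
print(creds.service_account_email)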
  5. Fill the .env file with your environment values:
#.env

# Name of your GCS bucket
BUCKET_NAME=
# Credential key from GCP IAM (typically the path to your service-account JSON)
CREDENTIAL_KEY=
# Airflow connection ID for GCP
GCP_CONN_ID=
# Your GCP project ID
PROJECT_ID=
  6. Fill profiles.yml with your keyfile's location on your local machine and your GCP project ID:
dbt_transform:
  outputs:
    dev:
      dataset: shopping_data
      job_execution_timeout_seconds: 300
      job_retries: 1
      keyfile: {your keyfile location}
      location: US
      method: service-account
      priority: interactive
      project: {GCP project id}
      threads: 1
      type: bigquery
  target: dev
  7. Create a dataset named shopping_data in BigQuery and make sure the dataset is in the US location (it must match the location configured in profiles.yml)
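The dataset can also be created programmatically; a minimal sketch with the BigQuery client (the keyfile path is a placeholder):

from google.cloud import bigquery

# Authenticate with the service-account key (placeholder path)
client = bigquery.Client.from_service_account_json("dags/service-account.json")

# Create the shopping_data dataset in the US location, matching profiles.yml
dataset = bigquery.Dataset(f"{client.project}.shopping_data")
dataset.location = "US"
client.create_dataset(dataset, exists_ok=True)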
  8. Run the project using Docker:
docker-compose up --build -d
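
Once the containers are up, the Airflow web UI should be reachable at http://localhost:8080 (assuming the standard Airflow docker-compose port mapping), where you can enable and trigger the pipeline's DAG.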

Reference

Setting up DBT with BigQuery:

https://docs.getdbt.com/docs/core/connect-data-platform/bigquery-setup

https://medium.com/@perkasaid.rio/easiest-way-installing-dbt-for-bigquery-54d1c05f6dfe
