Python Web Scraping Pipeline Orchestrated With Airflow

This project creates a data pipeline that scrapes podcast data into a PostgreSQL database managed by Google Cloud SQL. The pipeline is orchestrated with Airflow and also uploads the audio file of each podcast episode to a Google Cloud Storage bucket.
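
To make the data flow concrete, here is a minimal sketch of the two outputs the pipeline produces per episode: a database row and a storage object name. It is not the project's actual code. It assumes the podcast data comes from an RSS-style XML feed, and the names `SAMPLE_FEED`, `Episode`, `extract_episodes`, `to_db_row`, `to_object_name`, and the `episodes/` prefix are all hypothetical. The real pipeline would run steps like these as Airflow tasks and write to Cloud SQL and Cloud Storage; the sketch only uses the standard library and performs no network calls.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical feed content; the real source of the podcast data is not stated in this README.
SAMPLE_FEED = """<rss><channel>
<item><title>Episode 1</title><guid>ep-1</guid>
<enclosure url="https://example.com/audio/ep1.mp3" type="audio/mpeg"/></item>
<item><title>Episode 2</title><guid>ep-2</guid>
<enclosure url="https://example.com/audio/ep2.mp3" type="audio/mpeg"/></item>
</channel></rss>"""


@dataclass
class Episode:
    guid: str
    title: str
    audio_url: str


def extract_episodes(feed_xml: str) -> list[Episode]:
    """Scrape step: parse the feed and keep items that carry an audio enclosure."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # no audio file, nothing to upload
        episodes.append(
            Episode(
                guid=item.findtext("guid", default=""),
                title=item.findtext("title", default=""),
                audio_url=enclosure.get("url", ""),
            )
        )
    return episodes


def to_db_row(episode: Episode) -> tuple[str, str, str]:
    """Load step (database): the tuple that would be inserted into PostgreSQL."""
    return (episode.guid, episode.title, episode.audio_url)


def to_object_name(episode: Episode) -> str:
    """Load step (storage): the object name the audio file would get in the bucket."""
    return f"episodes/{episode.guid}.mp3"


def run_pipeline(feed_xml: str) -> tuple[list[tuple[str, str, str]], list[str]]:
    """Chain the steps in the order an orchestrator would schedule them."""
    episodes = extract_episodes(feed_xml)
    rows = [to_db_row(e) for e in episodes]
    object_names = [to_object_name(e) for e in episodes]
    return rows, object_names


if __name__ == "__main__":
    rows, object_names = run_pipeline(SAMPLE_FEED)
    print(rows)
    print(object_names)
```

Keeping each step a plain function with no side effects is one possible design choice: each function could then be wrapped in its own Airflow task and tested without any GCP credentials.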

GCP resources are provisioned using Terraform.