uche-madu/podcast_scraping_pipeline

Python Web Scraping Pipeline Orchestrated With Airflow

This project builds a data pipeline that scrapes podcast data and loads it into a PostgreSQL database managed by Google Cloud SQL. The pipeline, orchestrated with Airflow, also uploads the audio file of each podcast episode to a Google Cloud Storage bucket.

GCP resources are provisioned using Terraform.
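
The scraping step described above can be illustrated with a minimal sketch. It is not the project's actual code: it assumes the podcast data comes from an RSS feed, and the `Episode` fields, the `parse_episodes` and `audio_object_name` helpers, the sample feed, and the `episodes/` object prefix are all hypothetical names chosen for illustration. It uses only the Python standard library, so it omits the Airflow, Cloud SQL, and Cloud Storage calls that the real pipeline would need.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical sample feed (assumption: the source is an RSS feed).
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Example Podcast</title>
    <item>
      <title>Episode 1</title>
      <guid>ep-001</guid>
      <pubDate>Mon, 01 Jan 2024 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/audio/ep-001.mp3" type="audio/mpeg" length="123"/>
    </item>
    <item>
      <title>Episode 2</title>
      <guid>ep-002</guid>
      <pubDate>Mon, 08 Jan 2024 00:00:00 GMT</pubDate>
      <enclosure url="https://example.com/audio/ep-002.mp3" type="audio/mpeg" length="456"/>
    </item>
    <item>
      <title>Trailer without audio</title>
      <guid>trailer</guid>
    </item>
  </channel>
</rss>"""


@dataclass(frozen=True)
class Episode:
    """One row of episode metadata (hypothetical schema)."""
    guid: str
    title: str
    published: str
    audio_url: str


def parse_episodes(feed_xml: str) -> list:
    """Extract episode metadata from RSS XML, skipping items with no audio."""
    root = ET.fromstring(feed_xml)
    episodes = []
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # nothing to upload for this item
        episodes.append(
            Episode(
                guid=item.findtext("guid", default="").strip(),
                title=item.findtext("title", default="").strip(),
                published=item.findtext("pubDate", default="").strip(),
                audio_url=enclosure.get("url", ""),
            )
        )
    return episodes


def audio_object_name(episode: Episode, prefix: str = "episodes") -> str:
    """Build the bucket object name for an episode's audio (hypothetical layout)."""
    return f"{prefix}/{episode.guid}.mp3"


if __name__ == "__main__":
    for ep in parse_episodes(SAMPLE_FEED):
        print(ep.title, "->", audio_object_name(ep))
```

In a pipeline like this one, each function would plausibly become its own Airflow task, with the parsed records written to PostgreSQL and the audio uploaded under the derived object name. Keeping the parsing free of network and database calls makes it easy to test in isolation.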
