Skip to content

This data pipeline automates the process of extracting **JSON** data from Spotify API , transforming/flattening it, and loading it into a Snowflake data warehouse in tabular formate.

Notifications You must be signed in to change notification settings

isahillohiya/spotify_ETL_pipeline

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

4 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

Data Pipeline : Spotify API To Snowflake Data Warehouse

This data pipeline automates the process of extracting JSON data from Spotify API , transforming/flattening it, and loading it into a Snowflake data warehouse in tabular formate.

"Data Pipeline Architecture"

Description:

The pipeline leverages AWS services to achieve the following:

  • Schedules a weekly CloudWatch event to trigger the pipeline.
  • Extracts data from Spotify API using a Lambda function.
  • Stores the extracted data in an S3 bucket.
  • Transforms the data using an AWS Glue job.
  • Loads the transformed data into Snowflake using Snowpipe.

Running the Pipeline:

  • The CloudWatch event triggers the Lambda function, initiating the data extraction and transformation process.
  • Snowpipe automatically loads the transformed data into Snowflake upon detection.

Data Model:

The Snowflake data warehouse uses a relational data model with the following tables:

"ER Diagram"

songs:

Column Name Data Type Description
SONG_ID varchar (primary key) Unique identifier for each song
SONG_NAME varchar Name of the song
DURATION_MS number(38,0) Duration of the song in milliseconds
URL varchar Web address of the song
ALBUM_ID varchar Foreign key referencing the albums table
ARTIST_ID varchar Foreign key referencing the artists table

artists:

Column Name Data Type Description
ARTIST_ID varchar (primary key) Unique identifier for each artist
ARTIST_NAME varchar Name of the artist
EXTERNAL_URL varchar Web address associated with the artist

albums:

Column Name Data Type Description
ALBUM_ID varchar (primary key) Unique identifier for each album
ALBUM_NAME varchar Name of the album
RELEASE_DATE DATE Date the album was released
TOTAL_TRACKS integer Number of tracks on the album
URL varchar Web address of the album

weekly_hits:

Column Name Data Type Description
id integer (primary key) Unique identifier for each entry
SONG_ID varchar Foreign key referencing the songs table
WEEKLY_RANK number Rank of the song for that week
POPULARITY number Song popularity score
SONG_ADDED date date when the song was added to the weekly list

Future Enhancement

☐ Adding CI/CD

☐ Creating presentation layer

☐ Visualizing data using BI tool like Looker or Power BI

About

This data pipeline automates the process of extracting **JSON** data from Spotify API , transforming/flattening it, and loading it into a Snowflake data warehouse in tabular formate.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published