marinaolina/SparkBatch

Create a simple Spark batch ETL job that satisfies the following points.
Assume that the actual data volume will be several GBs per day.

  1. Reads and parses YouTube trending video data from the provided JSON files
  2. Extracts the most viewed video per category ID and per trending date
  3. Formats the data to have the columns required for analytics: video_id, trending_date, category_id, title, views, likes, dislikes
  4. Saves the data to a partitioned table for further analysis
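The four steps above can be illustrated with a minimal sketch in plain Python, so that the logic runs without a Spark cluster. It assumes the input is JSON Lines (one video per line) with keys matching the column names in step 3. It also assumes that ties on views are broken by the smaller video_id and that the table is partitioned by trending_date. The function names and the `part-0000.jsonl` file name are hypothetical. In an actual Spark job, the same steps would typically map to `spark.read.json`, a `row_number()` window partitioned by `category_id` and `trending_date` and ordered by `views` descending, `select(...)`, and `DataFrame.write.partitionBy(...)`, which keep the work distributed at several GBs per day.

```python
import json
import os
from typing import Dict, Iterable, List, Tuple

# Required analytics columns, as listed in step 3.
COLUMNS = ["video_id", "trending_date", "category_id", "title", "views", "likes", "dislikes"]


def parse_records(lines: Iterable[str]) -> List[dict]:
    """Step 1: parse JSON Lines input (assumed format: one video record per line)."""
    records = []
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines
            records.append(json.loads(line))
    return records


def most_viewed_per_group(records: Iterable[dict]) -> List[dict]:
    """Step 2: keep the record with the most views per (category_id, trending_date).

    Assumption: ties on views are broken by the smaller video_id, for determinism.
    """
    best: Dict[Tuple[object, str], dict] = {}
    for rec in records:
        key = (rec["category_id"], rec["trending_date"])
        current = best.get(key)
        views = int(rec["views"])
        if (current is None
                or views > int(current["views"])
                or (views == int(current["views"]) and rec["video_id"] < current["video_id"])):
            best[key] = rec
    return list(best.values())


def select_columns(records: Iterable[dict]) -> List[dict]:
    """Step 3: project each record onto the required analytics columns (missing -> None)."""
    return [{col: rec.get(col) for col in COLUMNS} for rec in records]


def write_partitioned(records: Iterable[dict], root: str, partition_col: str = "trending_date") -> None:
    """Step 4: write Hive-style partitions, e.g. root/trending_date=<value>/part-0000.jsonl.

    Assumption: partitioning by trending_date; JSON Lines stands in for Parquet here.
    """
    groups: Dict[str, List[dict]] = {}
    for rec in records:
        groups.setdefault(str(rec[partition_col]), []).append(rec)
    for value, rows in groups.items():
        part_dir = os.path.join(root, f"{partition_col}={value}")
        os.makedirs(part_dir, exist_ok=True)
        with open(os.path.join(part_dir, "part-0000.jsonl"), "w", encoding="utf-8") as fh:
            for row in rows:
                # The partition value is encoded in the directory name, not repeated in the file.
                fh.write(json.dumps({k: v for k, v in row.items() if k != partition_col}) + "\n")
```

Usage: `write_partitioned(select_columns(most_viewed_per_group(parse_records(open(path)))), "output/")`. Because each run touches only its own `trending_date=...` directories, a daily batch can be added or rewritten without touching earlier days.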

About

Reads JSON, persists locally, and runs some analytics.