Guided project from Coursera using PySpark for data analysis.

AndreBluhm/Project_Data-Analysis-PySpark

Project: Cleaning and Exploring Big Data using PySpark

  • Task 1 - Install Spark on Google Colab and load datasets in PySpark
  • Task 2 - Change column datatypes, remove whitespace and drop duplicates
  • Task 3 - Remove columns whose share of null values is above a threshold
  • Task 4 - Group, aggregate and create pivot tables
  • Task 5 - Rename categories and impute missing numeric values
  • Task 6 - Create visualizations to gather insights
