- Task 1 - Install Spark on Google Colab and load datasets in PySpark
- Task 2 - Change column datatype, remove whitespaces and drop duplicates
- Task 3 - Remove columns with Null values higher than a threshold
- Task 4 - Group, aggregate and create pivot tables
- Task 5 - Rename categories and impute missing numeric values
- Task 6 - Create visualizations to gather insights
AndreBluhm/Project_Data-Analysis-PySpark
Guided project from Coursera on data analysis with PySpark.