Capstone project for the HarvardX Data Science program (completed in 2019). All data analysis was done in R.
This project explores global progress towards the United Nations Sustainable Development Goals (UN SDGs). The main dataset is the UN SDG Indicators from the UN Statistics Division. It is complemented by data from the Gapminder Foundation and the World Bank. The maps and mapdata R packages are used to visualize each country's reported progress on a world map.
- Programmatically download data
- Clean and merge data into unified datasets
- Communicate results graphically
- Use trend analysis to predict future results
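As a rough illustration of the "clean and merge" goal above, the sketch below joins three small data frames on a shared country code using base R's `merge`. All names and values here (`indicators`, `geography`, `population`, the `iso3` column, the placeholder codes `AAA` to `DDD`) are hypothetical stand-ins, not the project's actual data or code.

```r
# Hypothetical indicator values (stand-in for a cleaned SDG indicator extract)
indicators <- data.frame(
  iso3  = c("AAA", "BBB", "CCC", "DDD"),
  year  = c(2015, 2015, 2015, 2015),
  value = c(10, 20, 30, 40),
  stringsAsFactors = FALSE
)

# Hypothetical country metadata (stand-in for geography data); "DDD" is missing
geography <- data.frame(
  iso3   = c("AAA", "BBB", "CCC"),
  region = c("Region 1", "Region 2", "Region 1"),
  stringsAsFactors = FALSE
)

# Hypothetical population counts (stand-in for population data)
population <- data.frame(
  iso3       = c("AAA", "BBB", "CCC", "DDD"),
  year       = c(2015, 2015, 2015, 2015),
  population = c(1000, 2000, 3000, 4000),
  stringsAsFactors = FALSE
)

# Left joins (all.x = TRUE) keep every indicator row, even without a match
unified <- merge(indicators, geography, by = "iso3", all.x = TRUE)
unified <- merge(unified, population, by = c("iso3", "year"), all.x = TRUE)

# Rows with no geography match surface as NA and can be reviewed during cleaning
unmatched <- unified[is.na(unified$region), ]
print(unified)
```

Using left joins rather than inner joins is a design choice in this sketch: countries that fail to match stay visible as `NA` rows instead of silently disappearing from the unified dataset.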
Data | Accessed via (R package) |
---|---|
UN Sustainable Development Goals Indicators | bigrquery (R interface to BigQuery) |
Gapminder geography data | googlesheets |
World Bank population data | wbstats (R interface to World Bank API) |
Spatial map data | maps and mapdata |
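To give a sense of how the maps package listed in the table can colour countries by an indicator value, here is a minimal sketch. It assumes the maps package is installed; the `progress` data frame, its `score` values, the class breaks, and the colours are all made up for illustration and do not come from the report.

```r
library(maps)

# Made-up scores keyed by region names used in the maps "world" database
progress <- data.frame(
  region = c("Norway", "Kenya", "Brazil"),
  score  = c(0.9, 0.5, 0.7),
  stringsAsFactors = FALSE
)

# Bin scores into three classes and assign one fill colour per class
breaks <- c(0, 0.6, 0.8, 1)
fills  <- c("#fee8c8", "#fdbb84", "#e34a33")
class_index <- cut(progress$score, breaks = breaks,
                   include.lowest = TRUE, labels = FALSE)
progress$colour <- fills[class_index]

# Draw a grey base map, then overlay each listed country in its class colour
map("world", fill = TRUE, col = "grey90")
for (i in seq_len(nrow(progress))) {
  map("world", regions = progress$region[i], fill = TRUE,
      col = progress$colour[i], add = TRUE)
}
```

Country names in other datasets often differ from the region names the maps database expects, so in practice a name or ISO-code lookup step is usually needed before plotting.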
The analysis was carried out in four steps:
- Downloading data
- Cleaning and merging data
- Exploring and visualizing data
- Forecasting
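The forecasting step can be pictured with a simple linear trend. The sketch below fits `lm` to a hypothetical yearly series and extrapolates it to 2030; the data are invented, and the report may well use a different model, so treat this only as one plausible form of trend analysis.

```r
# Invented yearly values of one indicator for one country
series <- data.frame(
  year  = 2010:2017,
  value = c(50, 48, 47, 45, 44, 42, 41, 39)
)

# Fit a straight-line trend: value as a linear function of year
fit <- lm(value ~ year, data = series)

# Extrapolate to future years, with prediction intervals (columns fit, lwr, upr)
future   <- data.frame(year = 2018:2030)
forecast <- cbind(future,
                  predict(fit, newdata = future, interval = "prediction"))

print(forecast)
```

The prediction interval widens as the forecast moves further from the observed years, which is a useful reminder that long-range extrapolation of a short series is uncertain.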
See the PDF report for details and results.