Add files via upload
gititgrrrl authored Jun 18, 2024
1 parent 056965c commit 8bb6a96
Showing 1 changed file with 289 additions and 0 deletions.
289 changes: 289 additions & 0 deletions rmc_final_project_instructions.qmd
@@ -0,0 +1,289 @@
---
title: "R MasterClass Final Project: Global Statistics Dashboard"
format:
  html:
    theme: cosmo
    toc: true
    toc-depth: 3
---

```{r echo = F, message = F}
# Load packages
if(!require(pacman)) install.packages("pacman")
pacman::p_load(tidyverse,
               reactable)
```

# Introduction

This project is the culmination of the skills and knowledge you've gained throughout the course. Your task is to create an interactive dashboard, published to GitHub Pages, that showcases global statistics in a visually engaging way. The dashboard will use data from sources like the Gapminder Foundation and incorporate interactive visualizations built with `{plotly}` and `{reactable}`.

See an example of a well-executed project [here](link-to-example-dashboard).

# Data Requirements and Acquisition

## Data Sources

Choose a dataset on global indicators from [Gapminder's data repository](https://gapminder.org/data). Gapminder provides a vast repository of country-level statistics on a wide range of topics, including but not limited to:

- **Health** (e.g., disease incidence, maternal health, mortality rates)
- **Education** (e.g., literacy rates, school enrollment rates)
- **Economy** (e.g., income inequality, poverty rates, employment rates)
- **Environment** (e.g., CO2 emissions, energy consumption)
- **Gender** (e.g., gender equality, female labor force participation)

The data is available to download as CSV files, which you can then import, clean, and analyze.

## Exploring Gapminder datasets

1. Visit [Gapminder's data repository](https://gapminder.org/data) and browse the indicators from the dropdown menu. You can use the search box to look for topics of interest.

![](images/gapminder_data_menu-01.png)

2. Preview the spreadsheet and read more about the indicator. Ensure the dataset is relatively complete with minimal missing entries, especially in recent years. You can click on the icons in "VIEW AS: 🎈〽️" to visualize the data as a bubble plot or line chart.

![](images/gapminder_data_preview.png)

3. You can also create exploratory visualizations of any indicator with [Gapminder tools](https://www.gapminder.org/tools/). By default, it shows a bubble plot of Life expectancy vs. GDP per capita, sized by Population, but you can customize the plot by choosing different indicators or different types of visualizations.

![](images/gapminder_tools_menu.png)

You can choose different plot types from the dropdown menu at the top left of the page. Maps, Trends, and Ranks are particularly useful visualizations.

![](images/gapminder_tools_chart_types.png)

## Selecting your Data

Here are a few steps and considerations for selecting your data:

- **Comprehensiveness**: Check for data completeness. Ideally, the dataset should cover at least 10 years with minimal missing entries.

- **Relevance**: Choose data that is up-to-date. Avoid datasets with outdated statistics (e.g., malaria case data is only recorded until 2006).

- **Relationships**: While we require you to choose only one indicator, consider analyzing relationships between two or more indicators. For example, you could compare trends in sanitation levels with child mortality, or how TB incidence correlates with HIV incidence.

## Downloading the Data

Choose indicators that appeal to you and download the data in CSV format using the "DOWNLOAD AS: ⏬CSV" option.

Gapminder also provides country metadata [here](https://docs.google.com/spreadsheets/d/1qHalit8sXC0R8oVXibc2wa2gY7bkwGzOybEMTWp-08o/edit). This contains useful variables you may want to join with your indicator dataset. Key variables of interest might be:

- **Country codes**: Standardized 3-letter country codes, same as ISO. Useful for joining with other datasets.

- **Regions**: Geographic divisions to group and summarize by. Useful for comparing indicators across continents.

- **Income groups**: Can be converted to an ordered factor variable for visualizing relationships between income level and your indicator.
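
As a minimal sketch of the income-group idea above, the snippet below converts a character column into an ordered factor. The data frame, the column names, and the four level labels are all hypothetical; check the actual labels in the metadata sheet before reusing them.

```r
# Hypothetical metadata rows; the real sheet may use different column names and labels
country_meta <- data.frame(
  country = c("Country A", "Country B", "Country C", "Country D"),
  income_group = c("High income", "Low income",
                   "Upper middle income", "Lower middle income")
)

# Assumed ordering of the labels, from lowest to highest income
income_levels <- c("Low income", "Lower middle income",
                   "Upper middle income", "High income")

country_meta$income_group <- factor(country_meta$income_group,
                                    levels = income_levels,
                                    ordered = TRUE)

# Ordered factors sort and compare by level rather than alphabetically
country_meta[order(country_meta$income_group), ]
```

With an ordered factor, plot legends and axes follow the income ordering instead of alphabetical order.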

## Additional Data Sources

While Gapminder is recommended for its ease of access and breadth of indicators, you are welcome to explore other datasets that might offer richer details. For example, the [WHO World Malaria Report](https://www.who.int/teams/global-malaria-programme/reports/world-malaria-report-2023) provides data on malaria cases and deaths from 2010-2022, available for download as Excel files.

Other websites to explore are:

- [World Bank Open Data](http://data.worldbank.org/)

- [IHME \| GHDx](https://ghdx.healthdata.org/series/global-burden-disease-gbd)

These sources may require more complex data cleaning processes.

# Data Cleaning and Preparation

## Pivoting Data

Gapminder indicator datasets are provided in a wide format with one row per country and columns representing years. You will need to pivot this data from wide to long format for easier filtering, grouping, and plotting.
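
The pivot described above might look like the following sketch. The `wide` tibble is a made-up stand-in for a downloaded indicator file, and the output column names `year` and `value` are arbitrary choices, not names Gapminder prescribes.

```r
library(tidyr)
library(dplyr)

# Hypothetical wide-format extract: one row per country, one column per year
wide <- tibble::tibble(
  country = c("Country A", "Country B"),
  `2000` = c(10, 20),
  `2001` = c(11, 21)
)

long <- wide %>%
  pivot_longer(
    cols = -country,                           # every column except the country name
    names_to = "year",
    values_to = "value",
    names_transform = list(year = as.integer)  # year column names arrive as text
  )

long
```

Each country-year pair becomes its own row, which is the shape that `filter()`, `group_by()`, and `ggplot()` expect.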

## Numeric Conversion

Convert string representations of numbers (like "20k" or "2M") into actual numeric values using the `{stringr}` package.

```{r}
# Demo: convert values with string suffixes ("k", "M", "B") to numeric
x <- c("12k", "11k", "8900", "8400", "11k", "10k")
x %>%
  str_replace_all("k", "e3") %>%  # thousands
  str_replace_all("M", "e6") %>%  # millions
  str_replace_all("B", "e9") %>%  # billions, if present in your data
  as.numeric()
```

## Country Name Standardization

Use the `{countrycode}` package to align country names with their ISO codes, ensuring accurate merges with geospatial data.

Alternatively, you can join your Gapminder indicator data with the Gapminder geographic metadata by country name, and use the `geo` column as the ISO code. The country names in these two datasets will match exactly, since they are both compiled by Gapminder.
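
Both approaches can be sketched as follows. The `indicator` and `geo_meta` tibbles are invented for illustration, and the metadata column names (`name`, `geo`) and the lowercase codes are assumptions about the Gapminder sheet that you should verify against your own download.

```r
library(dplyr)
library(countrycode)

# Hypothetical indicator data in long format
indicator <- tibble::tibble(
  country = c("Kenya", "Brazil", "Vietnam"),
  year = c(2020L, 2020L, 2020L),
  value = c(1.2, 3.4, 5.6)
)

# Option 1: derive ISO 3-letter codes from English country names
indicator_iso <- indicator %>%
  mutate(iso3 = countrycode(country,
                            origin = "country.name",
                            destination = "iso3c"))

# Option 2: join to the metadata by country name.
# `geo_meta` stands in for the Gapminder metadata sheet; its columns are assumed.
geo_meta <- tibble::tibble(
  name = c("Kenya", "Brazil", "Vietnam"),
  geo = c("ken", "bra", "vnm")
)

indicator_joined <- indicator %>%
  left_join(geo_meta, by = c("country" = "name")) %>%
  mutate(iso3 = toupper(geo))

# Rows without a match point to names that still need standardizing
unmatched <- anti_join(indicator, geo_meta, by = c("country" = "name"))
unmatched
```

Checking `unmatched` (or `NA` values in `iso3`) before any later merge helps catch countries that would otherwise drop out silently.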

## Adding Country Polygons

Use the `{rnaturalearth}` package to download the country polygons. This data provides the geographical shapes necessary for plotting your world map.

After aligning the country names or ISO codes between your datasets, merge the Gapminder data with the country polygons.
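
A minimal sketch of this merge is shown below. The `indicator_2020` tibble is hypothetical, and joining on the `adm0_a3` column is one possible choice of key rather than the required one; inspect the columns of the polygon data to decide which code column matches your data best.

```r
library(dplyr)
library(sf)
library(rnaturalearth)  # ne_countries() also relies on the {rnaturalearthdata} package

# Country polygons as an sf object
world <- ne_countries(scale = "medium", returnclass = "sf")

# Hypothetical indicator values for one year, keyed by ISO 3-letter code
indicator_2020 <- tibble::tibble(
  iso3 = c("KEN", "BRA", "VNM"),
  value = c(1.2, 3.4, 5.6)
)

# Keep every polygon and attach values where available.
# `adm0_a3` is used as the key here; `iso_a3` is an alternative, but it
# may hold placeholder codes for a few countries, so inspect both.
world_joined <- world %>%
  left_join(indicator_2020, by = c("adm0_a3" = "iso3"))

# Codes in the indicator data that have no polygon (check these by hand)
setdiff(indicator_2020$iso3, world$adm0_a3)
```

Putting the polygons on the left side of the join keeps the result an `sf` object, so countries without data still appear on the map and can be shaded as missing.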

# Dashboard Creation Instructions

## Project Repository Structure

Organize your project repository as follows:

- `_.Rproj`: RStudio project file.

- `_.qmd`: Main project dashboard.

- `_.html`: Rendered HTML dashboard.

- `/data`: Data folder

- `/data/*.csv`: Your dataset(s) in CSV format

- `/data/README.md`: Metadata about your dataset including information on provenance, codebook, variable definitions, etc.

- `/images`: Images folder

## Quarto Setup

Create your Quarto project and choose appropriate document options, defining the `title` and `author` for the navigation bar as well as specifying the use of the `dashboard` format.

Optionally, you can also include a `logo` and one or more `nav-buttons`.

```
---
title: "DASHBOARD TITLE"
author: "YOUR NAME"
format:
  dashboard:
    logo: images/LOGO_IMAGE.png
    nav-buttons: [github]
    github: https://github.com/YOUR_URL
    theme: lux
execute:
  echo: false
  warning: false
  message: false
---
```

Set up your environment with the required libraries:

```{r}
# Load packages
if(!require(pacman)) install.packages("pacman")
pacman::p_load(tidyverse,
               here,
               sf,
               bslib,
               bsicons,
               rnaturalearth,
               plotly,
               countrycode,
               htmltools,
               reactable,
               janitor
               )
```

## Dashboard Layout

Create a multi-page dashboard using Quarto. Divide each page into sections using headings, and organize your content into rows and/or columns to create a visually appealing layout.

**Requirements:**

- At least 2 pages
- At least 8 elements (value boxes, tables, plots, or other interactive visualizations) in total.

## Dashboard Features

Your dashboard should not only be informative but also engaging and easy to navigate. Incorporate various Quarto features to enhance user experience, such as:

- **Statistical Highlights**: Use value boxes to display key statistics, such as the highest and lowest values for the selected indicator, or significant year-on-year changes. Highlight interesting geographical trends, such as a country that deviates significantly from regional norms.

- **Professional Aesthetics**: Employ custom themes and color palettes to make the visual presentation as professional as possible.
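
One way to build a value box is with `value_box()` from `{bslib}`, as in the sketch below. The summary values, the icon name, and the background color are placeholders; in your dashboard they would come from your own indicator data and chosen palette.

```r
library(bslib)
library(bsicons)
library(htmltools)

# Hypothetical summary values computed from your indicator data
top_country <- "Country A"
top_value <- 83.2

vb <- value_box(
  title = "Highest value (latest year)",
  value = top_value,
  showcase = bs_icon("graph-up"),           # icon shown beside the number
  theme = value_box_theme(bg = "#518fd6"),  # placeholder background color
  p(paste0("(", top_country, ")"))          # extra text below the value
)

vb
```

Computing the displayed values in code, rather than typing them in, keeps the boxes in sync with the data when the dataset is updated.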

## Visualization Requirements

- **World Map**: Use `{ggplot2}` and the `ggplotly()` function from `{plotly}` to create an interactive choropleth or dot map that allows users to explore your chosen indicator by country and year.

- **Additional Visualizations**: At least two other interactive charts or tables, such as scatter plots of GDP vs. health indicators or box plots segmented by continent.

- Customize visual aesthetics significantly beyond the defaults to ensure a professional appearance. Each visualization should be accompanied by a descriptive title, clear legends, and annotated axes.

- **Yearly Data Interaction**: Consider implementing a slider to allow viewers to see changes over time on the map. You can do this by adding a `frame` aesthetic to your ggplot; `ggplotly()` then turns it into interactive, linked views of a series of frames over time.

```{r}
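# Requires the {gapminder} package, which is not in the p_load() call above
gg <- gapminder::gapminder %>%
  ggplot(aes(x = gdpPercap, y = lifeExp, color = continent,
             frame = year)) +  # `frame` is picked up by ggplotly() for the slider
  geom_point() +
  scale_x_log10() +
  theme_minimal()

ggplotly(gg)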
```
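
For the world map requirement, a single-year choropleth could be sketched as follows. The indicator values, the `adm0_a3` join key, the plot title, and the color scale are illustrative assumptions, not a prescribed design.

```r
library(dplyr)
library(ggplot2)
library(sf)
library(rnaturalearth)
library(plotly)

world <- ne_countries(scale = "medium", returnclass = "sf")

# Hypothetical values for a single year, keyed by ISO 3-letter code
indicator_2020 <- tibble::tibble(
  iso3 = c("KEN", "BRA", "VNM"),
  value = c(1.2, 3.4, 5.6)
)

map_data <- world %>%
  left_join(indicator_2020, by = c("adm0_a3" = "iso3")) %>%
  mutate(tooltip = paste0(name, ": ", value))  # text shown on hover

# ggplot2 warns about the unknown `text` aesthetic; ggplotly() uses it for tooltips
gg_map <- ggplot(map_data) +
  geom_sf(aes(fill = value, text = tooltip), color = "white", linewidth = 0.1) +
  scale_fill_viridis_c(na.value = "grey90") +  # countries without data in grey
  labs(title = "Indicator by country, 2020", fill = "Value") +
  theme_void()

map_plot <- ggplotly(gg_map, tooltip = "text")
map_plot
```

Restricting the tooltip to the custom `text` aesthetic keeps the hover label short and readable.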

# Deploying to GitHub Pages

You should deploy your dashboard to GitHub Pages for easy access and sharing. Consult the lesson on [Deploying Dashboards with Quarto](https://thegraphcourses.org/courses/rmc-q2-2024/topics/dashboards-with-quarto/) for instructions on how to set up your GitHub repository and deploy your dashboard.

# Grading Rubric

## Data Acquisition and Preparation (20 points)

- **Selection of Relevant and Comprehensive Dataset**:
    - Is the chosen dataset relevant and comprehensive for the project's objectives?
- **Data Cleaning and Formatting**:
    - Is the data properly cleaned and formatted where necessary?
- **Joining with Additional Data Sources**:
    - Are additional data sources integrated effectively, if necessary?
- **Country Name Standardization**:
    - Where relevant, is country name standardization done correctly to avoid data loss during joins?

## Dashboard Design and Layout (15 points)

- **Overall Aesthetics and Professional Appearance**:
    - Does the dashboard have a professional appearance, with a consistent theme and color palette?
- **Effective Use of Dashboard Features**:
    - Are a variety of dashboard features like value boxes and interactive elements used effectively?
- **Appropriate Use of Section Headings and Layout**:
    - Is the dashboard organized with appropriate section headings and a logical layout?

## Visualization Quality and Complexity (25 points)

- **Requirement Fulfillment**:
    - Are the requirements of at least 8 elements (value boxes, plots, and tables) met?
- **Clarity of Visualizations**:
    - Are the visualizations/tables clear and easy to understand?
    - Do they effectively communicate the desired insights/statistics?
- **Customization**:
    - Are the visualizations/tables well-customized, with clear titles and labels where relevant?
- **Advanced Features**:
    - Does the dashboard use at least one advanced feature, such as hover-over text, sliders, or other `{plotly}` or `{highcharter}` features?

## Documentation and Deployment (10 points)

- **Well-Structured Repository**:
    - Is the repository well-structured with appropriate folders?
    - Are the data files and code files titled clearly and easy to understand?
- **Commented Code**:
    - Is the code appropriately commented?
- **Data Source Documentation**:
    - Is there appropriate documentation of data sources and data processing in the dashboard or README file?
- **Successful Deployment**:
    - Is the dashboard deployed correctly to GitHub Pages?

## Creativity and Insightfulness (10 points)

- **Unique and Creative Approaches**:
    - Does the project demonstrate originality and creativity in data visualization techniques and storytelling?
- **Insightful Visualization of Statistics and Patterns**:
    - Does the work reveal insightful and interesting statistics and patterns?

# Timeline and Deadlines

- **Data Workshop**: June 21, 2024.
- **Preliminary Peer Review**: June 28, 2024.
- **Final Submission**: July 12, 2024.

# Submission Instructions

Submit a ZIP file of your repository and a link to the deployed dashboard on GitHub Pages through the course webpage by July 12, 2024.
