CIA World Factbook from {usadatasets}. #761

Merged
merged 5 commits into from
Oct 20, 2024
Changes from 4 commits
1 change: 1 addition & 0 deletions README.md
@@ -74,6 +74,7 @@ If you are using TidyTuesday to teach data-related skills, [please let us know](
| 40 | `2024-10-01` | [Chess Game Dataset (Lichess)](data/2024/2024-10-01/readme.md) | [Chess Game Dataset (Lichess)](https://www.kaggle.com/datasets/datasnaek/chess/data) | [Beginner's Guide: Data Visualization with Python](https://www.kaggle.com/code/batibayburak/beginner-s-guide-data-visualization-with-python) |
| 41 | `2024-10-08` | [National Park Species](data/2024/2024-10-08/readme.md) | [NPSpecies - The National Park Service biodiversity database](https://irma.nps.gov/npspecies/) | [NPSpecies with Julia & Tidier](https://github.com/frankiethull/NPSpecies/inst/examples/julia/NPSpecies_TidierOrg.md) |
| 42 | `2024-10-15` | [Southern Resident Killer Whale Encounters](data/2024/2024-10-15/readme.md) | [Center for Whale Research](https://www.whaleresearch.com/) | [Web Scraping & Mapping {orcas} Encounters](https://jadeynryan.github.io/orcas/) |
| 43 | `2024-10-22` | [The CIA World Factbook](data/2024/2024-10-22/readme.md) | [usdatasets R package](https://cran.r-project.org/package=usdatasets) | [The World Factbook](https://www.cia.gov/the-world-factbook/) |

***

260 changes: 260 additions & 0 deletions data/2024/2024-10-22/cia_factbook.csv

Large diffs are not rendered by default.

16 changes: 16 additions & 0 deletions data/2024/2024-10-22/meta.yaml
@@ -0,0 +1,16 @@
title: "The CIA World Factbook"
article:
title: "The World Factbook"
url: "https://www.cia.gov/the-world-factbook/"
data_source:
title: "usdatasets R package"
url: "https://cran.r-project.org/package=usdatasets"
images:
# Please include at least one image, and up to three images
- file: "world_factbook.png"
alt: >
The logo of the World Factbook, a blue globe with an eagle's head in white.
The eagle is wearing a monocle.
credit:
# We want to thank you for curating this dataset! If you do not want a
# particular type of credit, please delete the related line.
91 changes: 91 additions & 0 deletions data/2024/2024-10-22/readme.md
@@ -0,0 +1,91 @@
# The CIA World Factbook

This week we're exploring the [CIA World Factbook](https://www.cia.gov/the-world-factbook/)!
The dataset comes from the [{usdatasets}](https://cran.r-project.org/package=usdatasets) R package via [this post on LinkedIn](https://www.linkedin.com/posts/andrescaceresrossi_rstats-rstudio-opensource-activity-7249513444830318592-r395).

> The *World Factbook* provides basic intelligence on the history, people, government,
> economy, energy, geography, environment, communications, transportation, military,
> terrorism, and transnational issues for 265 world entities.

Which countries have the highest number of internet users per square kilometer?
Which countries have the highest percentage of internet users?
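
As a starting point for the two questions above, here is a minimal sketch of the derived columns they call for. The helper name `add_internet_metrics()` and the toy rows are made up for illustration (the values are not real Factbook figures). The sketch assumes `internet_users` and `population` describe comparable time periods, so treat the percentage as approximate.

```r
library(dplyr)

# Hypothetical helper (not part of any package): adds internet-user density
# and internet-user percentage to a data frame shaped like cia_factbook.
add_internet_metrics <- function(factbook) {
  factbook |>
    dplyr::mutate(
      internet_users_per_km2 = internet_users / area,
      internet_users_pct = 100 * internet_users / population
    )
}

# Made-up rows for illustration only; these are not real Factbook values.
toy_factbook <- dplyr::tibble(
  country = c("A", "B", "C"),
  area = c(100L, 50L, 80L),
  internet_users = c(500L, 400L, NA),
  population = c(1000L, 500L, 200L)
)

# Sort from highest to lowest density; rows with missing values sort last.
toy_metrics <- add_internet_metrics(toy_factbook) |>
  dplyr::arrange(dplyr::desc(internet_users_per_km2))
```

With the real data, the same call would be `add_internet_metrics(cia_factbook)`, followed by `dplyr::slice_max()` or `dplyr::arrange()` on whichever metric you are interested in.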

You might want to join this dataset with past TidyTuesday datasets that featured country information!

```r
# pak::pak("r4ds/ttmeta")
library(tidyverse)
library(ttmeta)

country_datasets <- ttmeta::tt_datasets_metadata |>
  dplyr::mutate(
    has_country = purrr::map_lgl(
      .data$variable_details,
      \(var_dets) {
        !is.null(var_dets) &&
          any(stringr::str_detect(tolower(var_dets$variable), "country"))
      }
    )
  ) |>
  dplyr::filter(has_country)
```
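
Once you have picked a past dataset, the join itself might look like the sketch below. Both tibbles are made-up stand-ins (the `score` column and all values are hypothetical), and the sketch assumes each dataset has a `country` column. Country names are often spelled differently across sources, so it can help to check for unmatched rows with `dplyr::anti_join()` before trusting the joined result.

```r
library(dplyr)

# Made-up stand-ins for cia_factbook and a past dataset; values are placeholders.
toy_factbook <- dplyr::tibble(
  country = c("Canada", "Mexico", "United States"),
  population = c(1L, 2L, 3L)
)
toy_other <- dplyr::tibble(
  country = c("Canada", "Mexico", "USA"),
  score = c(10, 20, 30)
)

# Keep every row of the other dataset and attach Factbook columns where names match.
joined <- dplyr::left_join(toy_other, toy_factbook, by = "country")

# Rows whose country name has no match in the Factbook data.
unmatched <- dplyr::anti_join(toy_other, toy_factbook, by = "country")
```

Here `unmatched` contains the `"USA"` row, which signals that the names would need to be harmonized (for example with `dplyr::case_match()` or a lookup table) before joining.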

## The Data

```r
# Option 1: tidytuesdayR package
## install.packages("tidytuesdayR")

tuesdata <- tidytuesdayR::tt_load('2024-10-22')
## OR
tuesdata <- tidytuesdayR::tt_load(2024, week = 43)

cia_factbook <- tuesdata$cia_factbook

# Option 2: Read directly from GitHub

cia_factbook <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-10-22/cia_factbook.csv')
```

## How to Participate

- [Explore the data](https://r4ds.hadley.nz/), watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about **causation** in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
- Create a visualization, a model, a [shiny app](https://shiny.posit.co/), or some other piece of data-science-related output, using R or another programming language.
- [Share your output and the code used to generate it](../../../sharing.md) on social media with the #TidyTuesday hashtag.
- [Submit your own dataset!](../../../.github/pr_instructions.md)

### Data Dictionary

# `cia_factbook.csv`

|variable |class |description |
|:-----------------------|:-------|:-------------------------------------|
|country                 |character |Name of the country (stored as a factor with 259 levels in {usdatasets}). |
|area |integer |Total area of the country (in square kilometers). |
|birth_rate |double |Birth rate (number of live births per 1,000 people). |
|death_rate |double |Death rate (number of deaths per 1,000 people). |
|infant_mortality_rate |double |Infant mortality rate (number of deaths of infants under one year old per 1,000 live births). |
|internet_users |integer |Number of internet users. |
|life_exp_at_birth |double |Life expectancy at birth (in years). |
|maternal_mortality_rate |integer |Maternal mortality rate (number of maternal deaths per 100,000 live births). |
|net_migration_rate |double |Net migration rate (number of migrants per 1,000 people). |
|population |integer |Total population of the country. |
|population_growth_rate |double |Population growth rate (multiplier). |
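
If you would like the column types to match this dictionary regardless of what `readr` guesses, one option is an explicit column specification. The sketch below is an illustration rather than part of the official loading instructions: the name `factbook_col_types` and the toy CSV text are made up, and it assumes `country` is stored as plain text in the CSV.

```r
library(readr)

# Column specification mirroring the data dictionary.
# Assumption: `country` is read as character from the CSV.
factbook_col_types <- readr::cols(
  country = readr::col_character(),
  area = readr::col_integer(),
  birth_rate = readr::col_double(),
  death_rate = readr::col_double(),
  infant_mortality_rate = readr::col_double(),
  internet_users = readr::col_integer(),
  life_exp_at_birth = readr::col_double(),
  maternal_mortality_rate = readr::col_integer(),
  net_migration_rate = readr::col_double(),
  population = readr::col_integer(),
  population_growth_rate = readr::col_double()
)

# Made-up rows for illustration only; these are not real Factbook values.
toy_csv <- paste(
  "country,area,birth_rate,death_rate,infant_mortality_rate,internet_users,life_exp_at_birth,maternal_mortality_rate,net_migration_rate,population,population_growth_rate",
  "Examplia,1000,12.5,8.1,4.2,500,79.3,10,1.5,2000,0.8",
  "Samplestan,NA,20.1,6.3,15.7,NA,70.2,NA,-0.4,3000,1.9",
  sep = "\n"
)

# I() tells readr to treat the string as literal data rather than a file path.
toy_factbook <- readr::read_csv(I(toy_csv), col_types = factbook_col_types)
```

To apply the same specification to the real file, pass the GitHub URL from "The Data" section in place of `I(toy_csv)`. If any value does not fit its declared type, `readr::problems()` will list the affected rows.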

### Cleaning Script

```r
# Mostly clean data provided by the {usdatasets} R package
# (https://cran.r-project.org/package=usdatasets). The only cleaning step
# was converting `area` and `internet_users` to integers.

# pak::pak("usdatasets")
library(dplyr)
library(usdatasets)
cia_factbook <- usdatasets::cia_factbook_tbl_df |>
  dplyr::mutate(
    dplyr::across(
      c("area", "internet_users"),
      as.integer
    )
  )
```
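
Because `as.integer()` truncates fractional values and returns `NA` (with a warning) for values beyond R's integer range, it can be worth confirming that a column converts without loss before casting it. The helper below is a hypothetical sketch in base R, not part of the cleaning script or of {usdatasets}.

```r
# Hypothetical helper: TRUE when every non-missing value is a whole number
# that fits in R's 32-bit integer range, so as.integer() would not change it.
converts_losslessly <- function(x) {
  x <- x[!is.na(x)]
  all(x == trunc(x)) && all(abs(x) <= .Machine$integer.max)
}

whole_ok <- converts_losslessly(c(10, 2500, NA)) # whole numbers, NA ignored
fractional <- converts_losslessly(c(10.5, 3))    # 10.5 would be truncated
too_large <- converts_losslessly(3e9)            # exceeds .Machine$integer.max
```

A check such as `converts_losslessly(usdatasets::cia_factbook_tbl_df$area)` could be run before the `dplyr::across()` call.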
Binary file added data/2024/2024-10-22/world_factbook.png
1 change: 1 addition & 0 deletions data/2024/readme.md
@@ -46,3 +46,4 @@ Archive of datasets and articles from the 2024 series of `#TidyTuesday` events.
| 40 | `2024-10-01` | [Chess Game Dataset (Lichess)](2024-10-01/readme.md) | [Chess Game Dataset (Lichess)](https://www.kaggle.com/datasets/datasnaek/chess/data) | [Beginner's Guide: Data Visualization with Python](https://www.kaggle.com/code/batibayburak/beginner-s-guide-data-visualization-with-python) |
| 41 | `2024-10-08` | [National Park Species](2024-10-08/readme.md) | [NPSpecies - The National Park Service biodiversity database](https://irma.nps.gov/npspecies/) | [NPSpecies with Julia & Tidier](https://github.com/frankiethull/NPSpecies/inst/examples/julia/NPSpecies_TidierOrg.md) |
| 42 | `2024-10-15` | [Southern Resident Killer Whale Encounters](2024-10-15/readme.md) | [Center for Whale Research](https://www.whaleresearch.com/) | [Web Scraping & Mapping {orcas} Encounters](https://jadeynryan.github.io/orcas/) |
| 43 | `2024-10-22` | [The CIA World Factbook](2024-10-22/readme.md) | [usdatasets R package](https://cran.r-project.org/package=usdatasets) | [The World Factbook](https://www.cia.gov/the-world-factbook/) |
1 change: 1 addition & 0 deletions static/tt_data_type.csv
@@ -1,4 +1,5 @@
Week,Date,year,data_files,data_type,delim
43,2024-10-22,2024,cia_factbook.csv,csv,","
42,2024-10-15,2024,orcas.csv,csv,","
41,2024-10-08,2024,most_visited_nps_species_data.csv,csv,","
40,2024-10-01,2024,chess.csv,csv,","