rfordatascience · jonthegeek · Oct 20, 2024
diff --git a/README.md b/README.md
@@ -74,6 +74,7 @@ If you are using TidyTuesday to teach data-related skills, [please let us know](
 | 40 | `2024-10-01` | [Chess Game Dataset (Lichess)](data/2024/2024-10-01/readme.md) | [Chess Game Dataset (Lichess)](https://www.kaggle.com/datasets/datasnaek/chess/data) | [Beginner's Guide: Data Visualization with Python](https://www.kaggle.com/code/batibayburak/beginner-s-guide-data-visualization-with-python) | 
 | 41 | `2024-10-08` | [National Park Species](data/2024/2024-10-08/readme.md) | [NPSpecies - The National Park Service biodiversity database](https://irma.nps.gov/npspecies/) | [NPSpecies with Julia & Tidier](https://github.com/frankiethull/NPSpecies/inst/examples/julia/NPSpecies_TidierOrg.md) | 
 | 42 | `2024-10-15` | [Southern Resident Killer Whale Encounters](data/2024/2024-10-15/readme.md) | [Center for Whale Research](https://www.whaleresearch.com/) | [Web Scraping & Mapping {orcas} Encounters](https://jadeynryan.github.io/orcas/) | 
+| 43 | `2024-10-22` | [The CIA World Factbook](data/2024/2024-10-22/readme.md) | [usdatasets R package](https://cran.r-project.org/package=usdatasets) | [The World Factbook](https://www.cia.gov/the-world-factbook/) | 
 
 ***  
 

diff --git a/data/2024/2024-10-22/cia_factbook.csv b/data/2024/2024-10-22/cia_factbook.csv
diff --git a/data/2024/2024-10-22/meta.yaml b/data/2024/2024-10-22/meta.yaml
@@ -0,0 +1,16 @@
+title: "The CIA World Factbook"
+article:
+  title: "The World Factbook"
+  url: "https://www.cia.gov/the-world-factbook/"
+data_source:
+  title: "usdatasets R package"
+  url: "https://cran.r-project.org/package=usdatasets"
+images:
+# Please include at least one image, and up to three images
+- file: "world_factbook.png"
+  alt: >
+    The logo of the World Factbook, a blue globe with an eagle's head in white. 
+    The eagle is wearing a monocle.
+credit:
+# We want to thank you for curating this dataset! If you do not want a 
+# particular type of credit, please delete the related line.
diff --git a/data/2024/2024-10-22/readme.md b/data/2024/2024-10-22/readme.md
@@ -0,0 +1,91 @@
+# The CIA World Factbook
+
+This week we're exploring the [CIA World Factbook](https://www.cia.gov/the-world-factbook/)! 
+The dataset comes from the [{usdatasets}](https://cran.r-project.org/package=usdatasets) R package via [this post on LinkedIn](https://www.linkedin.com/posts/andrescaceresrossi_rstats-rstudio-opensource-activity-7249513444830318592-r395).
+
+> The *World Factbook* provides basic intelligence on the history, people, government, 
+> economy, energy, geography, environment, communications, transportation, military, 
+> terrorism, and transnational issues for 265 world entities.
+
+Which countries have the highest number of internet users per square kilometer?
+Which countries have the highest percentage of internet users?
+
+You might want to join this dataset with past TidyTuesday datasets that featured country information!
+
+```r
+# pak::pak("r4ds/ttmeta")
+library(tidyverse)
+library(ttmeta)
+
+country_datasets <- ttmeta::tt_datasets_metadata |> 
+  dplyr::mutate(
+    has_country = purrr::map_lgl(
+      .data$variable_details,
+      \(var_dets) {
+        !is.null(var_dets) && 
+          any(stringr::str_detect(tolower(var_dets$variable), "country"))
+      }
+    )
+  ) |> 
+  dplyr::filter(has_country)
+```
+
+## The Data
+
+```r
+# Option 1: tidytuesdayR package 
+## install.packages("tidytuesdayR")
+
+tuesdata <- tidytuesdayR::tt_load('2024-10-22')
+## OR
+tuesdata <- tidytuesdayR::tt_load(2024, week = 43)
+
+cia_factbook <- tuesdata$cia_factbook
+
+# Option 2: Read directly from GitHub
+
+cia_factbook <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2024/2024-10-22/cia_factbook.csv')
+```
+
+## How to Participate
+
+- [Explore the data](https://r4ds.hadley.nz/), watching out for interesting relationships. We would like to emphasize that you should not draw conclusions about **causation** in the data. There are various moderating variables that affect all data, many of which might not have been captured in these datasets. As such, our suggestion is to use the data provided to practice your data tidying and plotting techniques, and to consider for yourself what nuances might underlie these relationships.
+- Create a visualization, a model, a [shiny app](https://shiny.posit.co/), or some other piece of data-science-related output, using R or another programming language.
+- [Share your output and the code used to generate it](../../../sharing.md) on social media with the #TidyTuesday hashtag.
+- [Submit your own dataset!](../../../.github/pr_instructions.md)
+
+### Data Dictionary
+
+# `cia_factbook.csv`
+
+|variable                |class   |description                           |
+|:-----------------------|:-------|:-------------------------------------|
+|country                 |character |Name of the country (stored as a factor with 259 levels in {usdatasets}). |
+|area                    |integer |Total area of the country (in square kilometers). |
+|birth_rate              |double  |Birth rate (number of live births per 1,000 people). |
+|death_rate              |double  |Death rate (number of deaths per 1,000 people). |
+|infant_mortality_rate   |double  |Infant mortality rate (number of deaths of infants under one year old per 1,000 live births). |
+|internet_users          |integer |Number of internet users. |
+|life_exp_at_birth       |double  |Life expectancy at birth (in years). |
+|maternal_mortality_rate |integer |Maternal mortality rate (number of maternal deaths per 100,000 live births). |
+|net_migration_rate      |double  |Net migration rate (number of migrants per 1,000 people). |
+|population              |integer |Total population of the country. |
+|population_growth_rate  |double  |Population growth rate (multiplier). |
+
+### Cleaning Script
+
+```r
+# Mostly clean data provided by the {usdatasets} R package (https://cran.r-project.org/package=usdatasets).
+# The only cleaning step was coercing `area` and `internet_users` to integer.
+
+# pak::pak("usdatasets")
+library(dplyr)
+library(usdatasets)
+cia_factbook <- usdatasets::cia_factbook_tbl_df |> 
+  dplyr::mutate(
+    dplyr::across(
+      c("area", "internet_users"),
+      as.integer
+    )
+  )
+```
diff --git a/data/2024/2024-10-22/world_factbook.png b/data/2024/2024-10-22/world_factbook.png
diff --git a/data/2024/readme.md b/data/2024/readme.md
@@ -46,3 +46,4 @@ Archive of datasets and articles from the 2024 series of `#TidyTuesday` events.
 | 40 | `2024-10-01` | [Chess Game Dataset (Lichess)](2024-10-01/readme.md) | [Chess Game Dataset (Lichess)](https://www.kaggle.com/datasets/datasnaek/chess/data) | [Beginner's Guide: Data Visualization with Python](https://www.kaggle.com/code/batibayburak/beginner-s-guide-data-visualization-with-python) |  
 | 41 | `2024-10-08` | [National Park Species](2024-10-08/readme.md) | [NPSpecies - The National Park Service biodiversity database](https://irma.nps.gov/npspecies/) | [NPSpecies with Julia & Tidier](https://github.com/frankiethull/NPSpecies/inst/examples/julia/NPSpecies_TidierOrg.md) |  
 | 42 | `2024-10-15` | [Southern Resident Killer Whale Encounters](2024-10-15/readme.md) | [Center for Whale Research](https://www.whaleresearch.com/) | [Web Scraping & Mapping {orcas} Encounters](https://jadeynryan.github.io/orcas/) |  
+| 43 | `2024-10-22` | [The CIA World Factbook](2024-10-22/readme.md) | [usdatasets R package](https://cran.r-project.org/package=usdatasets) | [The World Factbook](https://www.cia.gov/the-world-factbook/) |  
diff --git a/static/tt_data_type.csv b/static/tt_data_type.csv
@@ -1,4 +1,5 @@
 Week,Date,year,data_files,data_type,delim
+43,2024-10-22,2024,cia_factbook.csv,csv,","
 42,2024-10-15,2024,orcas.csv,csv,","
 41,2024-10-08,2024,most_visited_nps_species_data.csv,csv,","
 40,2024-10-01,2024,chess.csv,csv,","