fhdsl · ehumph · May 15, 2024
diff --git a/check_reports/spell_check_results.tsv b/check_reports/spell_check_results.tsv
@@ -0,0 +1,9 @@
+word	file	lines
+CalEnviroScreen	Data_Summarization_Lab_Key.Rmd	16
+CalEnviroScreen	Data_Summarization_Lab.Rmd	16
+ces	Data_Summarization_Lab_Key.Rmd	134
+ces	Data_Summarization_Lab.Rmd	114
+CES	Data_Summarization_Lab_Key.Rmd	160
+CES	Data_Summarization_Lab.Rmd	137
+daseh	index.Rmd	65
+fredhutch	index.Rmd	65
diff --git a/check_reports/url_checks.tsv b/check_reports/url_checks.tsv
@@ -0,0 +1,5 @@
+urls	urls_status	file
+https://daseh.org/data/CalEnvironmentalScreen_data.csv	failed	/github/workspace/modules/Data_Summarization/lab/Data_Summarization_Lab_Key.Rmd
+https://daseh.org/data/CalEnvironmentalScreen_data.csv	failed	/github/workspace/modules/Data_Summarization/lab/Data_Summarization_Lab_Key.Rmd
+https://daseh.org/data/CalEnvironmentalScreen_data.csv	failed	/github/workspace/modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd
+https://daseh.org/data/CalEnvironmentalScreen_data.csv	failed	/github/workspace/modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd
diff --git a/modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd b/modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd
@@ -13,46 +13,48 @@ knitr::opts_chunk$set(echo = TRUE)
 
 Data used
 
-Bike Lanes Dataset: BikeBaltimore is the Department of Transportation's bike program. 
-The data is from http://data.baltimorecity.gov/Transportation/Bike-Lanes/xzfj-gyms
+CalEnviroScreen Dataset: CalEnviroScreen is a project that ranks census tracts in California based on potential exposures to pollutants, adverse environmental conditions, socioeconomic factors and the prevalence of certain health conditions. Data used in the CalEnviroScreen model come from national and state sources.
+
+The data is from https://calenviroscreen-oehha.hub.arcgis.com/#Data
+
+You can download it as a CSV file into your current working directory. Note it's also available at: https://daseh.org/data/CalEnvironmentalScreen_data.csv
 
-You can Download as a CSV in your current working directory.  Note its also available at: 	https://daseh.org/data/Bike_Lanes.csv 
 
 ```{r, echo = TRUE, message=FALSE, error = FALSE}
 library(readr)
 library(dplyr)
 library(tidyverse)
-library(jhur)
+library(dasehr)
 
-bike <- read_csv(file = "https://daseh.org/data/Bike_Lanes.csv")
+ces <- read_csv(file = "https://daseh.org/data/CalEnvironmentalScreen_data.csv")
 ```
 
 or use 
 
 ```{r}
-library(jhur)
-bike <- read_bike()
+library(dasehr)
+ces <- read_ces()
 ```
 
 ### 1.1 
 
-How many bike "lanes" are currently in Baltimore?  You can assume each observation/row is a different bike "lane".  (hint: how do you get the number of rows of a data set? You can use `dim()` or `nrow()` or another function).
+How many census tracts are in California?  You can assume each observation/row is a different census tract.  (hint: how do you get the number of rows of a data set? You can use `dim()` or `nrow()` or another function).
 
 ```{r 1.1response}
 
 ```
 
 ### 1.2
 
-How many feet of bike "lanes" are currently in Baltimore, based on the `length` column? (use `sum()`)
+What was the population of California in the 2010 census, based on the `TotalPop` column? (use `sum()`)
 
 ```{r 1.2response}
 
 ```
 
 ### 1.3
 
-Summarize the data to get the `max` of `length` using the `summarize` function.
+Summarize the data to get the `max` of `TotalPop` using the `summarize` function.
 
 ```
 # General format 
@@ -66,7 +68,7 @@ DATA_TIBBLE %>%
 
 ### 1.4
 
-Modify your code from 1.3 to add the `min` of `length` using the `summarize` function.
+Modify your code from 1.3 to add the `min` of `TotalPop` using the `summarize` function.
 
 ```
 # General format 
@@ -85,7 +87,7 @@ DATA_TIBBLE %>%
 
 ### P.1
 
-Summarize the `bike` data to get the mean of `length` and `dateInstalled`. Make sure to remove `NA`s.
+Summarize the `ces` data to get the mean of `TotalPop` and `Pesticides`. Make sure to remove `NA`s.
 
 ```
 # General format 
@@ -101,47 +103,58 @@ DATA_TIBBLE %>%
 
 ### P.2
 
-You should have gotten a mean date sometime in the 1800s - that doesn't make much sense! Hypothesize why the average date is a date from before bike lanes were being built in Baltimore.
+Given that parts of California are heavily agricultural and the max value for the `Pesticides` variable is 80811, why might the average value be so low?
 
 ```{r P.2response}
 
 ```
 
 ### P.3
 
-Filter any zeros out of `bike` `dateInstalled`. Use `filter()`. Assign this "cleaned" dataset object the name `bike_2`.
+Filter out any rows of `ces` where `Pesticides` is zero. Use `filter()`. Assign this "cleaned" dataset object the name `exurban_ces`.
+
+(We are making the admittedly shaky assumption that places with no reported pesticide use are within cities.)
 
 ```
 # General format 
 DATA_TIBBLE %>% filter(LOGICAL_COMPARISON)
 ```
 
-```{r P.3response}
+```{r P.3response_part1_part2}
 
 ```
 
+How many census tracts have pesticide values greater than 0?
+
+```{r P.3response}
+
+```
 
-# Part 2
+# Part 2
 
 ### 2.1
 
-How many bike lanes are there in each type of lane? Use `count()` on the column named `type`. Use `bike` instead of `bike_2`.
+The variable `CES4.0PercRange` places each census tract's calculated CES 4.0 score (a combined measure of pollution burden and population characteristics in a particular region) into a percentile range, in 5% increments.
+
+How many census tracts are there in each percentile range? Use `count()` on the column named `CES4.0PercRange`. Use `ces` instead of `exurban_ces`.
 
 ```{r 2.1response}
 
 ```
 
 ### 2.2
 
-Modify your code from question 2.1 to break down each lane type by number of lanes. Use `count()` on the columns named `type` and `numLanes`.
+Modify your code from question 2.1 to break down each percentile range by California county. Use `count()` on the columns named `CES4.0PercRange` and `CaliforniaCounty`.
 
 ```{r 2.2response}
 
 ```
 
+Hmm. This isn't the easiest table to read. Let's try a different approach.
+
 ### 2.3
 
-How many bike lanes are there in each type of lane? Use `group_by()`, `summarize()`, and `n()` on the column named `type`.
+How many census tracts are there in each county? Use `group_by()`, `summarize()`, and `n()` on the column named `CaliforniaCounty`.
 
 ```
 # General format 
@@ -156,7 +169,7 @@ DATA_TIBBLE %>%
 
 ### 2.4
 
-Modify your code from 2.3 to also group by `numLanes`.
+Modify your code from 2.3 to also group by `CES4.0PercRange`.
 
 ```{r 2.4response}
 
@@ -167,7 +180,7 @@ Modify your code from 2.3 to also group by `numLanes`.
 
 ### P.4
 
-Modify code from 2.3 to also summarize by longest average bike lane length? In your summarized output, make sure you call the new summarized average bike lane length variable (column name) "mean". In other words, the head of your output should look like:
+Modify your code from 2.3 to also summarize the average total population, sorted from largest to smallest. In your summarized output, make sure you call the new summarized average total population variable (column name) "mean". In other words, the head of your output should look like:
 
 ```
 # A tibble: