Preview 27 #28

Closed
wants to merge 6 commits into from
9 changes: 9 additions & 0 deletions check_reports/spell_check_results.tsv
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
word file lines
CalEnviroScreen Data_Summarization_Lab_Key.Rmd 16
CalEnviroScreen Data_Summarization_Lab.Rmd 16
ces Data_Summarization_Lab_Key.Rmd 134
ces Data_Summarization_Lab.Rmd 114
CES Data_Summarization_Lab_Key.Rmd 160
CES Data_Summarization_Lab.Rmd 137
daseh index.Rmd 65
fredhutch index.Rmd 65
5 changes: 5 additions & 0 deletions check_reports/url_checks.tsv
@@ -0,0 +1,5 @@
urls urls_status file
https://daseh.org/data/CalEnvironmentalScreen_data.csv failed /github/workspace/modules/Data_Summarization/lab/Data_Summarization_Lab_Key.Rmd
https://daseh.org/data/CalEnvironmentalScreen_data.csv failed /github/workspace/modules/Data_Summarization/lab/Data_Summarization_Lab_Key.Rmd
https://daseh.org/data/CalEnvironmentalScreen_data.csv failed /github/workspace/modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd
https://daseh.org/data/CalEnvironmentalScreen_data.csv failed /github/workspace/modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd
55 changes: 34 additions & 21 deletions modules/Data_Summarization/lab/Data_Summarization_Lab.Rmd
Expand Up @@ -13,46 +13,48 @@ knitr::opts_chunk$set(echo = TRUE)

Data used

Bike Lanes Dataset: BikeBaltimore is the Department of Transportation's bike program.
The data is from http://data.baltimorecity.gov/Transportation/Bike-Lanes/xzfj-gyms
CalEnviroScreen Dataset: CalEnviroScreen is a project that ranks census tracts in California based on potential exposures to pollutants, adverse environmental conditions, socioeconomic factors and the prevalence of certain health conditions. Data used in the CalEnviroScreen model come from national and state sources.

The data is from https://calenviroscreen-oehha.hub.arcgis.com/#Data

You can download it as a CSV to your current working directory. Note that it is also available at: https://daseh.org/data/CalEnvironmentalScreen_data.csv

You can Download as a CSV in your current working directory. Note its also available at: https://daseh.org/data/Bike_Lanes.csv

```{r, echo = TRUE, message=FALSE, error = FALSE}
library(readr)
library(dplyr)
library(tidyverse)
library(jhur)
library(dasehr)

bike <- read_csv(file = "https://daseh.org/data/Bike_Lanes.csv")
ces <- read_csv(file = "https://daseh.org/data/CalEnvironmentalScreen_data.csv")
```

or use

```{r}
library(jhur)
bike <- read_bike()
library(dasehr)
ces <- read_ces()
```

### 1.1

How many bike "lanes" are currently in Baltimore? You can assume each observation/row is a different bike "lane". (hint: how do you get the number of rows of a data set? You can use `dim()` or `nrow()` or another function).
How many census tracts are in California? You can assume each observation/row is a different census tract. (hint: how do you get the number of rows of a data set? You can use `dim()` or `nrow()` or another function).

```{r 1.1response}

```
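
As a minimal sketch of the hint above, the following uses a small made-up data frame (`toy` and its values are hypothetical, not the CalEnviroScreen data) to show how `nrow()` and `dim()` report the number of rows:

```r
# Hypothetical toy data, invented for illustration only
toy <- data.frame(
  tract = c("A", "B", "C"),
  TotalPop = c(1200, 850, 430)
)

nrow(toy) # number of rows (observations)
dim(toy)  # number of rows, then number of columns
```

The same calls should work on a tibble read in with `read_csv()`.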

### 1.2

How many feet of bike "lanes" are currently in Baltimore, based on the `length` column? (use `sum()`)
What was the population of California in the 2010 census, based on the `TotalPop` column? (use `sum()`)

```{r 1.2response}

```
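
A short sketch of how `sum()` behaves, assuming a made-up numeric vector (`toy_pop` is hypothetical and not taken from the lab data). It also shows why the `na.rm` argument matters when a column contains missing values:

```r
# Hypothetical values, invented for illustration only
toy_pop <- c(1200, 850, NA, 430)

sum(toy_pop)               # NA, because one value is missing
sum(toy_pop, na.rm = TRUE) # 2480, missing value dropped before summing
```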

### 1.3

Summarize the data to get the `max` of `length` using the `summarize` function.
Summarize the data to get the `max` of `TotalPop` using the `summarize` function.

```
# General format
Expand All @@ -66,7 +68,7 @@ DATA_TIBBLE %>%

### 1.4

Modify your code from 1.3 to add the `min` of `length` using the `summarize` function.
Modify your code from 1.3 to add the `min` of `TotalPop` using the `summarize` function.

```
# General format
Expand All @@ -85,7 +87,7 @@ DATA_TIBBLE %>%

### P.1

Summarize the `bike` data to get the mean of `length` and `dateInstalled`. Make sure to remove `NA`s.
Summarize the `ces` data to get the mean of `TotalPop` and `Pesticides`. Make sure to remove `NA`s.

```
# General format
Expand All @@ -101,47 +103,58 @@ DATA_TIBBLE %>%

### P.2

You should have gotten a mean date sometime in the 1800s - that doesn't make much sense! Hypothesize why the average date is a date from before bike lanes were being built in Baltimore.
Given that parts of California are heavily agricultural, and the max value for the `Pesticides` variable is 80811, why might the average value be so low?

```{r P.2response}

```

### P.3

Filter any zeros out of `bike` `dateInstalled`. Use `filter()`. Assign this "cleaned" dataset object the name `bike_2`.
Filter any zeros out of `ces` `Pesticides`. Use `filter()`. Assign this "cleaned" dataset object the name `exurban_ces`.

(We are making the admittedly shaky assumption that places with no reported pesticide use are within cities.)

```
# General format
DATA_TIBBLE %>% filter(LOGICAL_COMPARISON)
```

```{r P.3response}
```{r P.3response_part1_part2}

```

How many census tracts have pesticide values greater than 0?

```{r P.3response}

```

# Part 2
# Part 2

### 2.1

How many bike lanes are there in each type of lane? Use `count()` on the column named `type`. Use `bike` instead of `bike_2`.
The variable `CES4.0PercRange` categorizes the calculated CES4.0 value (a measure of the pollution burden in a particular region) into percentile ranges, grouped by 5% increments.

How many census tracts are there in each percentile range? Use `count()` on the column named `CES4.0PercRange`. Use `ces` instead of `exurban_ces`.

```{r 2.1response}

```

### 2.2

Modify your code from question 2.1 to break down each lane type by number of lanes. Use `count()` on the columns named `type` and `numLanes`.
Modify your code from question 2.1 to break down each percentile range by California county. Use `count()` on the columns named `CES4.0PercRange` and `CaliforniaCounty`.

```{r 2.2response}

```

Hmm. This isn't the easiest table to read. Let's try a different approach.

### 2.3

How many bike lanes are there in each type of lane? Use `group_by()`, `summarize()`, and `n()` on the column named `type`.
How many census tracts are there in each county? Use `group_by()`, `summarize()`, and `n()` on the column named `CaliforniaCounty`.

```
# General format
Expand All @@ -156,7 +169,7 @@ DATA_TIBBLE %>%

### 2.4

Modify your code from 2.3 to also group by `numLanes`.
Modify your code from 2.3 to also group by `CES4.0PercRange`.

```{r 2.4response}

Expand All @@ -167,7 +180,7 @@ Modify your code from 2.3 to also group by `numLanes`.

### P.4

Modify code from 2.3 to also summarize by longest average bike lane length? In your summarized output, make sure you call the new summarized average bike lane length variable (column name) "mean". In other words, the head of your output should look like:
Modify code from 2.3 to also summarize by largest average total population. In your summarized output, make sure you call the new summarized average total population variable (column name) "mean". In other words, the head of your output should look like:

```
# A tibble:
Expand Down