forked from dwillis/data_journalism_2025_spring
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathmy_class_reference.rmd
106 lines (82 loc) · 3.17 KB
/
my_class_reference.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
---
title: "Class Reference"
author: "Student name"
output:
html_document:
theme: cerulean
highlight: pygments
toc: true
toc_float:
collapsed: true
smooth_scroll: false
---
```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = TRUE)
```
## Introduction
Consider this a personal guide to the commands and functions you will learn. In general, when you come across an R command or function that you want to remember, put it in here along with a description of what it does and when you'd use it.
## Things I Learned on Jan. 27
The command to set a working directory is setwd(). For example:
```{r}
setwd("~/Documents/GitHub/data_journalism_2022_spring")
```
```{r}
install.packages("tidyverse")
library(tidyverse)
```
### Summarizing
I need to use group_by and summariz. Here's an example of grouping by county and calculating counts, sum and other descriptive statistics.
```{r}
ppp_maryland_loans %>%
group_by(project_county_name) %>%
summarise(
count_loans = n(),
total_loans_amount = sum(amount),
mean_loan_amount = mean(amount),
median_loan_amount = median(amount),
min_loan_amount = min(amount),
max_loan_amount = max(amount)
) %>%
arrange(desc(max_loan_amount))
```
Store of a new variable object
```{r}
"fatima" -> wv_winred_contribs %>%
```
### Lubridate - this package makes it easier to do the things R does with date-times and possible to do the things R does not.
### head is the title, summary is summary of data, colnames is the brief version of each column name, and glimpse is to get a brief sense of data.
```{r}
head(primary_18)
summary(primary_18)
colnames(primary_18)
glimpse(primary_18)
```
### How to add a new column based on an existing column. Run the following code to create a new column called `percent_election_day` based on a calculation using two existing columns.
```{r}
primary_18_with_election_day_percent <- primary_18 %>%
select(office, district, name_raw, party, jurisdiction, election_day, votes) %>%
mutate(
percent_election_day = election_day/votes
)
```
### The following code organize data by the new column.
```{r}
# better ordering?
primary_18 %>%
select(office, district, name_raw, party, jurisdiction, election_day, votes) %>%
mutate(
percent_election_day = (election_day/votes)*100
) %>% arrange(desc(percent_election_day))
```
### Mutate is also useful for standardizing data - for example, making different spellings of, say, cities into a single one.
You'll notice that there's a mix of styles: "Baltimore" and "BALTIMORE" for example. R will think those are two different cities, and that will mean that any aggregates we create based on city won't be accurate.
Mutate - it's not just for math! And a function called `str_to_upper` that will convert a character column into all uppercase. Now we can say exactly how many donations came from Baltimore (I mean, of course, BALTIMORE).
```{r}
standardized_maryland_cities <- maryland_cities %>%
mutate(
upper_city = str_to_upper(city)
)
```
### filter helps with reducing number of rows while select helps with reducing number of columns
### removing some objects from the workspace write the code down in console:
> rm(maryland_expenses)