---
title: "Introduction to the rnoaa package"
author: "Cesar R. Urteaga-Reyesvera"
date: "2017-09-11"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Introduction to the rnoaa package}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---
```{r Setup, include = FALSE}
knitr::opts_chunk$set(# Collapse output blocks
                      collapse = TRUE,
                      comment = "#>",
                      fig.width = 7,
                      fig.height = 7,
                      fig.align = "center",
                      warning = FALSE)
```
This vignette describes what the `rnoaa` package is and how to use it.
## Description
The `rnoaa` package bundles a set of functions to get, clean, visualize, and analyze data from [NOAA's Significant Earthquake dataset](https://www.ngdc.noaa.gov/nndc/struts/form?t=101650&s=1&d=1).
## Usage
### Tidying the data
#### Obtaining the data
The `get_earthquake_data` function loads a snapshot of the earthquake data that was downloaded on September 10, 2017. If you would rather work with the latest data from NOAA's website, use the `download_earthquake_data` function instead (requires internet access).
```{r}
# Extracts the data from a compressed file.
library(rnoaa)
raw_data <- get_earthquake_data()
```
### Data processing
Once the data is loaded, you may wish to prepare some key variables for analysis: in particular, each quake's date, the location of its epicenter, its magnitude, and its death toll. The name of the place where the quake occurred may also be of interest. The `eq_clean_data` and `eq_location_clean` functions tidy these variables up for analysis.

The `eq_clean_data` function creates each quake's date from the three time variables in the NOAA dataset: `YEAR`, `MONTH`, and `DAY`. Notably, it can create dates for earthquakes that occurred Before the Common Era (B.C.E.), which R does not handle directly. In addition, when the month and/or day is missing, the date is approximated by the midpoint of the period.

The `eq_location_clean` function, in turn, removes the country from the variable that holds the quake's location (`LOCATION_NAME`), since the country is already stored in a variable of its own (`COUNTRY`).
```{r, message=FALSE}
library(dplyr)

# Before the data has been processed:
set.seed(11)
raw_data %>%
  select(YEAR, MONTH, DAY, COUNTRY, LOCATION_NAME) %>%
  sample_n(6)

# Tidies the data up for analysis.
clean_data <- eq_clean_data(raw_data)
clean_data <- eq_location_clean(clean_data)

# After the data has been processed (note that the DATE variable has been
# created and the country has been removed from the LOCATION_NAME variable):
set.seed(11)
clean_data %>%
  select(YEAR, MONTH, DAY, DATE, COUNTRY, LOCATION_NAME) %>%
  sample_n(6)

# N.B.: When the month and/or day is missing, the date is approximated by
# the midpoint of the period.
```
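The two cleaning steps above can be sketched in base R. This is only an illustration under stated assumptions, not the package's implementation: the helper names (`days_from_civil`, `approximate_date`, `strip_country`) are hypothetical, the midpoints (1 July when the month is missing, the 15th when only the day is missing) are one possible choice, and the `COUNTRY: PLACE` layout of `LOCATION_NAME` is assumed.

```{r}
# Hypothetical helper: days since 1970-01-01 in the proleptic Gregorian
# calendar. Assumes astronomical year numbering (year 0 is 1 B.C.E.).
days_from_civil <- function(year, month, day) {
  y <- year - (month <= 2)            # Let the year start in March.
  era <- floor(y / 400)               # 400-year cycle the date falls in.
  yoe <- y - era * 400                # Year within the cycle, 0 to 399.
  mp <- (month + 9) %% 12             # March = 0, ..., February = 11.
  doy <- floor((153 * mp + 2) / 5) + day - 1
  doe <- yoe * 365 + floor(yoe / 4) - floor(yoe / 100) + doy
  era * 146097 + doe - 719468
}

# Hypothetical helper: builds a Date, filling missing parts with an assumed
# midpoint (1 July for a missing month, the 15th for a missing day).
approximate_date <- function(year, month = NA, day = NA) {
  month_missing <- is.na(month)
  day <- ifelse(month_missing, 1, ifelse(is.na(day), 15, day))
  month <- ifelse(month_missing, 7, month)
  as.Date(days_from_civil(year, month, day), origin = "1970-01-01")
}

# Hypothetical helper: drops everything up to and including the first colon,
# then trims the surrounding whitespace. Values without a colon are kept.
strip_country <- function(location_name) {
  trimws(sub("^[^:]*:", "", location_name))
}

approximate_date(c(2017, 2017, -500), c(9, 9, NA), c(19, NA, NA))
strip_country(c("CHILE: VALPARAISO", "NO PREFIX"))
```

Because the day count is computed arithmetically, negative years need no string parsing.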
### Analyzing the data
#### Visualizing the data with ggplot2
One of the features of the `rnoaa` package is a pair of ggplot2 geoms that help you visualize the timeline on which the earthquakes occurred: `geom_timeline` and `geom_timeline_label`.

The `geom_timeline` geom displays each observation on a timeline so that you can see the dates on which the quakes took place. Because this geom inherits its attributes from the `geom_point` class, you can use the `size` and `color` aesthetics to display other traits of an earthquake, such as its magnitude and its death toll.

In addition, the package provides a new ggplot2 theme (`theme_timeline`) that makes these geoms easier to read.
```{r, fig.height = 4, message = FALSE}
library(ggplot2)

# Timeline of the earthquakes that occurred in Chile from 2000 to 2016.
clean_data %>%
  filter(COUNTRY == "CHILE",
         !is.na(EQ_PRIMARY),
         YEAR %in% 2000:2016) %>%
  ggplot(mapping = aes(x = DATE,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS / 100)
  ) +
  geom_timeline() +
  labs(size = "Richter scale value",
       color = "# deaths in hundreds",
       y = "") +
  theme_timeline()
```
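For intuition about what "inherits from `geom_point`" means, a similar picture can be roughly approximated with stock ggplot2 layers. This sketch is not how `geom_timeline` is implemented; the `toy` data frame holds made-up values, and placing every point at `y = 0` on a grey baseline is an assumption made only for illustration.

```{r}
library(ggplot2)

# Made-up rows, for illustration only.
toy <- data.frame(DATE = as.Date(c("2001-03-01", "2005-07-15", "2010-11-30")),
                  EQ_PRIMARY = c(5.5, 7.0, 6.2),
                  TOTAL_DEATHS = c(0, 120, 15))

# Points on a single horizontal baseline; size and color carry extra traits.
p <- ggplot(toy, aes(x = DATE, y = 0,
                     size = EQ_PRIMARY,
                     color = TOTAL_DEATHS)) +
  geom_hline(yintercept = 0, color = "grey60") +
  geom_point(alpha = 0.5) +
  scale_y_continuous(breaks = NULL) +
  labs(y = "")
p
```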
Note that the `y` aesthetic is optional. Provide it when you want to display two or more timelines, for example to compare the quakes of different countries.
```{r, fig.height = 5}
# Quakes in 2017
clean_data %>%
  filter(!is.na(EQ_PRIMARY),
         YEAR == 2017) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)
  ) +
  geom_timeline() +
  labs(size = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()
```
If you need to identify the most powerful earthquakes, you can use `geom_timeline_label` along with its `n_max` parameter. When the `size` aesthetic is provided, it labels the `n_max` quakes with the greatest magnitude.
```{r, fig.height = 5}
# Quakes in Mexico from 1998 to 2005
clean_data %>%
  filter(COUNTRY == "MEXICO",
         !is.na(EQ_PRIMARY),
         YEAR %in% 1998:2005) %>%
  ggplot(mapping = aes(x = DATE,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)
  ) +
  geom_timeline() +
  # We label the five biggest quakes in magnitude.
  geom_timeline_label(n_max = 5,
                      line_height = 1 / 5,
                      fontsize = 2.7) +
  labs(size = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()
```
N.B.: The `size` aesthetic is not required. If it is omitted, the geom labels the last observations instead:
```{r, fig.height = 5}
# Quakes in Mexico from 1998 to 2005
clean_data %>%
  filter(COUNTRY == "MEXICO",
         !is.na(EQ_PRIMARY),
         YEAR %in% 1998:2005) %>%
  ggplot(mapping = aes(x = DATE,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)
  ) +
  geom_timeline() +
  # If n_max were omitted, the last three observations would be labeled
  # (i.e., the default).
  geom_timeline_label(n_max = 5,
                      line_height = 1 / 5,
                      fontsize = 2.7) +
  labs(color = "# deaths",
       y = "") +
  theme_timeline()
```
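The selection rule behind `n_max` can be sketched in base R. The helper `rows_to_label` is hypothetical and only mirrors the behavior described above: the largest `n_max` values when a size column is given, otherwise the last `n_max` rows, with a default of three. Reading "last observations" as the last rows of the data is an assumption, and the sketch ignores any per-level grouping the geom may apply.

```{r}
# Hypothetical helper mimicking the selection rule described above.
rows_to_label <- function(data, n_max = 3, size_col = NULL) {
  if (is.null(size_col)) {
    # No size information: keep the last n_max rows.
    return(utils::tail(data, n_max))
  }
  # Otherwise keep the n_max rows with the largest size values.
  ranked <- data[order(data[[size_col]], decreasing = TRUE), , drop = FALSE]
  utils::head(ranked, n_max)
}

# Made-up rows, for illustration only.
toy <- data.frame(LOCATION_NAME = c("A", "B", "C", "D"),
                  EQ_PRIMARY = c(5.1, 7.4, 6.3, 6.9),
                  stringsAsFactors = FALSE)

rows_to_label(toy, n_max = 2, size_col = "EQ_PRIMARY")$LOCATION_NAME
rows_to_label(toy, n_max = 2)$LOCATION_NAME
```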
The `line_height` parameter controls how the vertical lines of the text labels are drawn. Because it is proportional to the available space, the labels do not overlap other levels (e.g., countries). You can also change the appearance of the text labels with the `angle` and `fontsize` parameters.
```{r}
# Quakes observed from 1982 to 1983.
clean_data %>%
  filter(!is.na(EQ_PRIMARY),
         YEAR %in% 1982:1983) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS,
                       label = LOCATION_NAME)
  ) +
  geom_timeline() +
  # We label the most powerful quakes for each country.
  geom_timeline_label(n_max = 1,
                      angle = 0,
                      line_height = 1 / 5,
                      fontsize = 2.7) +
  labs(size = "Richter scale value",
       color = "# deaths",
       y = "") +
  theme_timeline()
```
Because the package creates the quakes' dates itself, we can also visualize the earthquakes that occurred B.C.E.
```{r}
# Quakes observed B.C.E.
clean_data %>%
  filter(!is.na(EQ_PRIMARY),
         YEAR < 0) %>%
  ggplot(mapping = aes(x = DATE,
                       y = COUNTRY,
                       size = EQ_PRIMARY,
                       color = TOTAL_DEATHS / 1000,
                       label = LOCATION_NAME)
  ) +
  geom_timeline() +
  # We label the most powerful quakes for each country.
  geom_timeline_label(n_max = 1,
                      angle = 0,
                      line_height = 1 / 5,
                      fontsize = 2.7) +
  labs(size = "Richter scale value",
       color = "# deaths in thousands",
       y = "") +
  theme_timeline()
```
#### Visualizing the data with R's leaflet maps
`rnoaa` supports not only ggplot2 geoms but also R leaflet maps. The latter offer greater user interaction because they come with many features (e.g., popup text labels, mini-maps, and option buttons).

The package's `eq_map` function displays an R leaflet map with the quakes' epicenters, annotated with the values of a given variable.
```{r}
# Displays an R leaflet map with the epicenters of the earthquakes that have
# occurred in Indonesia since 2000. The interactive map has popup text labels
# with the date of occurrence.
clean_data %>%
  dplyr::filter(COUNTRY == "INDONESIA" & lubridate::year(DATE) >= 2000) %>%
  eq_map(annot_col = "DATE")
```
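A map of this kind can be approximated directly with the leaflet package. The sketch below is not the body of `eq_map`: the `toy` data frame, its `lng`, `lat`, and `label` column names, and the choice of using the magnitude as the marker radius are all assumptions made for illustration.

```{r}
# Made-up coordinates and values, for illustration only.
toy <- data.frame(lng = c(100.5, 110.2),
                  lat = c(-0.8, -7.6),
                  EQ_PRIMARY = c(6.0, 7.1),
                  label = c("2001-05-04", "2009-10-12"),
                  stringsAsFactors = FALSE)

# One circle marker per quake, with the label shown in a popup on click.
map <- leaflet::leaflet(toy)
map <- leaflet::addTiles(map)
map <- leaflet::addCircleMarkers(map,
                                 lng = ~lng,
                                 lat = ~lat,
                                 radius = ~EQ_PRIMARY,
                                 weight = 1,
                                 popup = ~label)
map
```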
The package also comes with the `eq_create_label` function, which can be used to show additional information in the popup text labels displayed on the R leaflet map.
```{r}
# Displays an R leaflet map with the epicenters of the earthquakes that have
# occurred in Indonesia since 2000. The interactive map has popup text labels
# with the location, magnitude, and total deaths.
clean_data %>%
  dplyr::filter(COUNTRY == "INDONESIA" & lubridate::year(DATE) >= 2000) %>%
  dplyr::mutate(popup_text = eq_create_label(.)) %>%
  eq_map(annot_col = "popup_text")
```
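A popup label with the location, magnitude, and total deaths can be assembled with plain string operations. The helper `make_popup_label` below is hypothetical and is not the source of `eq_create_label`; the HTML layout and the choice to skip missing values are assumptions.

```{r}
# Hypothetical helper: one HTML label per row, skipping missing values.
make_popup_label <- function(data) {
  # Formats "<b>Title:</b> value<br>", or "" when the value is missing.
  line <- function(title, value) {
    ifelse(is.na(value), "", paste0("<b>", title, ":</b> ", value, "<br>"))
  }
  paste0(line("Location", data$LOCATION_NAME),
         line("Magnitude", data$EQ_PRIMARY),
         line("Total deaths", data$TOTAL_DEATHS))
}

# Made-up rows, for illustration only.
toy <- data.frame(LOCATION_NAME = c("Place A", "Place B"),
                  EQ_PRIMARY = c(6.5, 7.2),
                  TOTAL_DEATHS = c(NA, 12),
                  stringsAsFactors = FALSE)

make_popup_label(toy)
```

Dropping the missing fields keeps the popup from showing lines such as "Total deaths: NA".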