-
Notifications
You must be signed in to change notification settings - Fork 14
/
A01-Data-Manipulation.Rmd
executable file
·372 lines (244 loc) · 16.2 KB
/
A01-Data-Manipulation.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
# References
<div id="refs"></div>
# (APPENDIX) Appendix {-}
# Reading and writing spatial data in R
```{r set-options, echo=FALSE, cache=FALSE}
options(width = 100)
```
```{r, echo = FALSE}
source("package_list.R")
get.pckg.info("A01-Data-Manipulation.Rmd")
```
## Sample files for this exercise{-#app1_2}
First, you will need to download some sample files from the github repository. Make sure to set your R session folder to the directory where you will want to save the sample files before running the following code chunks.
```{r results='hide'}
download.file("https://github.com/mgimond/Spatial/raw/main/Data/Income_schooling.zip",
destfile = "Income_schooling.zip" , mode='wb')
unzip("Income_schooling.zip", exdir = ".")
file.remove("Income_schooling.zip")
```
```{r}
download.file("https://github.com/mgimond/Spatial/raw/main/Data/rail_inters.gpkg",
destfile = "./rail_inters.gpkg", mode='wb')
```
```{r}
download.file("https://github.com/mgimond/Spatial/raw/main/Data/elev.img",
destfile = "./elev.img", mode='wb')
```
## Introduction {-#app1_3}
There are several different R spatial formats to choose from. Your choice of format will largely be dictated by the package(s) and or function(s) used in your workflow. A breakdown of formats and intended use are listed below.
```{r echo=FALSE}
my_tbl <- tibble::tribble(
~`Data format`, ~`Used with...`, ~`Used in package...`, ~`Used for...`, ~Comment,
"\`sf\`", "vector", "\`sf\`, others", "visualizing, manipulating, querying", "This is the new spatial standard in R. Will also read from spatially enabled databases such as postgresSQL.",
"\`raster\`", "raster", "\`raster\`, others", "visualizing, manipulating, spatial statistics", "This has been the most popular raster format fo rmany years. But, it is gradually being supplanted by `terra`",
"\`SpatRaster\`", "terra", "\`terra\`, others", "visualizing, manipulating, spatial statistics", "This is gradually replacing `raster`",
"\`SpatialPoints*\` \n \`SpatialPolygons*\` \n \`SpatialLines*\` \n \`SpatialGrid*\`\n", "vector and raster", "\`sp\`, \`spdep\`", "Visualizing, spatial statistics", "These are legacy formats. \`spdep\` now accepts `sf` objects",
"\`ppp\` \`owin\`", "vector", "\`spatstat\`", "Point pattern analysis/statistics", NA,
"\`im\`", "raster", "\`spatstat\`", "Point pattern analysis/statistics", NA
)
`%>%` <- magrittr::`%>%`
kableExtra::kable_styling(
kableExtra::kable(my_tbl, digits = 3, row.names = FALSE, align = c("r", "c","c","l","l") ,
caption = NULL, format = "html"),
bootstrap_options = c("striped", "hover", "condensed"),
position = "left", full_width = FALSE) %>%
kableExtra::row_spec(0, color="white", background = "#6E6E6E", align="c") %>%
kableExtra::footnote(number=c("The `spatial*` format includes SpatialPointsDataFrame, SpatialPolygonsDataFrame, SpatialLinesDataFrame, etc...") )
```
There is an attempt at standardizing the spatial format in the R ecosystem by adopting a well established set of spatial standards known as [simple features](https://en.wikipedia.org/wiki/Simple_Features). This effort results in a recently developed package called [`sf`](https://r-spatial.github.io/sf/) [@sf]. It is therefore recommended that you work in an `sf` framework when possible. As of this writing, most of the _basic_ data manipulation and visualization operations can be successfully conducted using `sf` spatial objects.
Some packages such as `spdep` and `spatstat` require specialized data object types. This tutorial will highlight some useful conversion functions for this purpose.
## Creating spatial objects {-#app1_4}
The following sections demonstrate different spatial data object creation strategies.
### Reading a shapefile {-#app1_4_1}
Shapefiles consist of many files sharing the same core filename and different suffixes (i.e. file extensions). For example, the sample shapefile used in this exercise consists of the following files:
```{r echo= FALSE}
list.files(".", "Income_schooling.*")
```
Note that the number of files associated with a shapefile can vary. `sf` only needs to be given the `*.shp` name. It will then know which other files to read into R such as projection information and attribute table.
```{r results='hide'}
library(sf)
s.sf <- st_read("Income_schooling.shp")
```
Let's view the first few records in the spatial data object.
```{r}
head(s.sf, n=4) # List spatial object and the first 4 attribute records
```
Note that the `sf` object stores not only the geometry but the coordinate system information and attribute data as well. These will be explored later in this exercise.
### Reading a GeoPackage {-#app1_4_2}
A geopackage can store more than one layer. To list the layers available in the geopackage, type:
```{r}
st_layers("rail_inters.gpkg")
```
In this example, we have two separate layers: `Interstate` and `Rail`. We can extract each layer separately via the `layer=` parameter.
```{r results='hide'}
inter.sf <- st_read("rail_inters.gpkg", layer="Interstate")
rail.sf <- st_read("rail_inters.gpkg", layer="Rail")
```
### Reading a raster {-#app1_4_3}
> In earlier versions of this tutorial, the `raster` package was used to read raster files. This is being supplanted by `terra` which will be the package used in this and in subsequent exercises.
`terra` will read many different raster file formats such as geoTiff, Imagine and HDF5 just to name a few. To see a list of supported raster file formats on your computer simply run:
```{r eval = FALSE}
terra::gdal(drivers = TRUE) |> subset(type == "raster")
```
In the following example, an _Imagine_ raster file is read into R using the `rast` function.
```{r}
library(terra)
elev.r <- rast("elev.img")
```
The object class is of type `SpatRaster`.
```{r}
class(elev.r)
```
What sets a `SpatRaster` object apart from other R data file objects is its storage. By default, data files are loaded into memory, but `SpatRaster` objects are not. This can be convenient when working with raster files too large for memory. But this comes at a performance cost. If your RAM is large enough to handle your raster file, it's best to load the entire dataset into memory.
To check if the `elev.r` object is loaded into memory, run:
```{r}
inMemory(elev.r)
```
An output of `FALSE` indicates that it is not. To force the raster into memory use `set.values`:
```{r}
set.values(elev.r)
```
Let's check that the raster is indeed loaded into memory:
```{r}
inMemory(elev.r)
```
Now let's look at the raster's properties:
```{r}
elev.r
```
The raster object returns its grid dimensions (number of rows and columns), pixel size/resolution (in the layer's coordinate system units), geographic extent, native coordinate system (UTM NAD83 Zone 19 with units of meters) and min/max raster values.
### Creating a spatial object from a data frame {-#app1_4_5}
Geographic point data locations recorded in a spreadsheet can be converted to a spatial point object. Note that it's important that you specify the coordinate system used to record the coordinate pairs since such information is not stored in a data frame. In the following example, the coordinate values are recorded in a WGS 1984 geographic coordinate system (`crs = 4326`).
```{r}
# Create a simple dataframe with lat/long values
df <- data.frame(lon = c(-68.783, -69.6458, -69.7653),
lat = c(44.8109, 44.5521, 44.3235),
Name= c("Bangor", "Waterville", "Augusta"))
# Convert the dataframe to a spatial object. Note that the
# crs= 4326 parameter assigns a WGS84 coordinate system to the
# spatial object
p.sf <- st_as_sf(df, coords = c("lon", "lat"), crs = 4326)
p.sf
```
### Geocoding street addresses {-#app1_4_6}
The `tidygeocoder` package will convert street addresses to latitude/longitude coordinate pairs using a wide range of geocoding services such as the US census and Google. Some of these geocoding services will require an API key, others will not. Click [here](https://jessecambon.github.io/tidygeocoder/articles/geocoder_services.html) to see the list of geocoding services supported by `tidygeocoder` and their geocoding limitations. In the example that follows, the `osm` geocoding service is used by default.
```{r}
library(tidygeocoder)
options(pillar.sigfig = 7) # Increase significant digits in displayed output
dat <- data.frame(
name = c("Colby College", "Bates College", "Bowdoin College"),
address = c("4000 Mayflower drive, Waterville, ME , 04901",
"275 College st, Lewiston, ME 04240",
"255 Maine St, Brunswick, ME 04011"))
geocode(.tbl = dat, address = address, method = "osm")
```
Another free (but manual) alternative, is to use the US Census Bureau's web [geocoding service](https://geocoding.geo.census.gov/geocoder/locations/addressbatch?form) for creating lat/lon values from a file of US street addresses. This needs to be completed via their web interface and the resulting data table (a CSV file) would then need to be loaded into R as a data frame.
## Converting from an `sf` object {-#app1_5}
Packages such as `spdep` (older versions only) and `spatsat` do not support `sf` objects. The following sections demonstrate methods to convert from `sf` to other formats.
### Converting an `sf` object to a `Spatial*` object (`spdep`/`sp`) {-#app1_5_1}
The following code will convert point, polyline or polygon features to a `spatial*` object. While the current version of `spdep` will now accept `sf` objects, converting to `spatial*` objects will be necessary with legacy `spdep` packages. In this example, an `sf` polygon feature is converted to a `SpatialPolygonsDataFrame` object.
```{r}
s.sp <- as_Spatial(s.sf)
class(s.sp)
```
### Converting an `sf` polygon object to an `owin` object {-#app1_5_2}
The `spatstat` package is used to analyze point patterns however, in most cases, the study extent needs to be explicitly defined by a polygon object. The polygon should be of class `owin`.
```{r}
library(spatstat)
s.owin <- as.owin(s.sf)
class(s.owin)
```
Note the loading of the package `spatstat`. This is required to access the `as.owin.sf` method for `sf`.
Note too that the attribute table gets stripped from the polygon data. This is usually fine given that the only reason for converting a polygon to an `owin` format is for delineating the study boundary.
### Converting an `sf` point object to a `ppp` object {-#app1_5_3}
The `spatstat` package is currently designed to work with projected (planar) coordinate system. If you attempt to convert a point object that is in a geographic coordinate system, you will get the following error message:
```{r error=TRUE}
p.ppp <- as.ppp(p.sf)
```
The error message reminds us that a geographic coordinate system (i.e. one that uses angular measurements such as latitude/longitude) cannot be used with this package. If you encounter this error, you will need to project the point object to a projected coordinate system.
In this example, we'll project the `p.sf` object to a UTM coordinate system (`epsg=32619`). Coordinate systems in R are treated in a later appendix.
```{r}
p.sf.utm <- st_transform(p.sf, 32619) # project from geographic to UTM
p.ppp <- as.ppp(p.sf.utm) # Create ppp object
class(p.ppp)
```
Note that if the point layer has an attribute table, its attributes will be converted to `ppp` _marks_. These attribute values can be accessed via `marks(p.ppp)`.
### Converting a `SpatRaster` object to an `im` object {-#app1_5_4}
To create a `spatstat` `im` raster object from a `SpatRaster` object, you will need to first create a three column dataframe from the `SpatRaster` objects with the first two columns defining the X and Y coordinate values of each cell, and the third column defining the cell values
```{r}
df <- as.data.frame(elev.r,xy=TRUE)
elev.im <- as.im(df)
class(elev.im)
```
## Converting to an `sf` object {-#app1_6}
All aforementioned spatial formats, except `owin`, can be coerced to an `sf` object via the `st_as_sf` function. for example:
```{r results='hide'}
st_as_sf(p.ppp) # For converting a ppp object to an sf object
st_as_sf(s.sp) # For converting a Spatial* object to an sf object
```
## Dissecting the `sf` file object {-#app1_7}
```{r}
head(s.sf,3)
```
The first line of output gives us the geometry type, `MULTIPOLYGON`, a multi-polygon data type. This is also referred to as a multipart polygon. A single-part `sf` polygon object will adopt the `POLYGON` geometry.
The next few lines of output give us the layer's bounding extent in the layer's native coordinate system units. You can extract the extent via the `st_bbox()` function as in `st_bbox(s.sf)`.
The following code chunk can be used to extract addition coordinate information from the data.
```{r eval = FALSE}
st_crs(s.sf)
```
Depending on the version of the `PROJ` library used by `sf`, you can get two different outputs. If your version of `sf` is built with a version of `PROJ` older than `6.0`, the output will consist of an **epsg** code (when available) and a **proj4** string as follows:
```{r eval = FALSE}
Coordinate Reference System:
EPSG: 26919
proj4string: "+proj=utm +zone=19 +datum=NAD83 +units=m +no_defs"
```
If your version of `sf` is built with a version of `PROJ` `6.0` or greater, the output will consist of a user defined CS definition (e.g. an **epsg** code), if available, and a *Well Known Text* (WKT) formatted coordinate definition that consists of a series of `[ ]` tags as follows:
```{r echo = FALSE}
st_crs(s.sf)
```
The WKT format will usually start with a `PROJCRS[...]` tag for a projected coordinate system, or a `GEOGCRS[...]` tag for a geographic coordinate system. More information on coordinate systems in R can be found in [the coordinate systems appendix](https://mgimond.github.io/Spatial/coordinate-systems-in-r.html).
What remains of the `sf` summary output is the first few records of the attribute table. You can extract the object's table to a dedicated data frame via:
```{r}
s.df <- data.frame(s.sf)
class(s.df)
head(s.df, 5)
```
The above chunk will also create a geometry column. This column is somewhat unique in that it stores its contents as a **list** of geometry coordinate pairs (polygon vertex coordinate values in this example).
```{r}
str(s.df)
```
You can also opt to remove this column prior to creating the dataframe as follows:
```{r}
s.nogeom.df <- st_set_geometry(s.sf, NULL)
class(s.nogeom.df)
head(s.nogeom.df, 5)
```
## Exporting to different data file formats {-#app1_8}
You can export an `sf` object to many different spatial file formats such as a shapefile or a geopackage.
```{r eval=FALSE}
st_write(s.sf, "shapefile_out.shp", driver = "ESRI Shapefile") # create to a shapefile
st_write(s.sf, "s.gpkg", driver = "GPKG") # Create a geopackage file
```
If the file you are writing to already exists, the above will throw an error. To force an overwrite, simply add the `delete_layer = TRUE` argument to the `st_write` function.
You can see a list of writable vector formats via:
```{r eval = FALSE}
gdal(drivers = TRUE) |> subset(can %in% c("write", "read/write" ) & type == "vector")
```
The value in the `name` column is the driver name to pass to the `driver =` argument in the `st_write()` function.
To export a raster to a data file, use `writeRaster()` function.
```{r eval=FALSE}
writeRaster(elev.r, "elev_out.tif", gdal = "GTiff" ) # Create a geoTiff file
writeRaster(elev.r, "elev_out.img", gdal = "HFA" ) # Create an Imagine raster file
```
You can see a list of writable raster formats via:
```{r eval = FALSE}
gdal(drivers = TRUE) |> subset(can %in% c("write", "read/write" ) & type == "raster")
```
The value in the `name` column is the driver name to pass to the `gdal =` argument in the `writeRaster()` function.
```{r remove_temp_files, echo = FALSE, results='hide', message=FALSE}
files.rm <- dir(".", "Income_schooling.*")
file.remove(file.path(".", files.rm ))
file.remove(file.path(".", "elev.img" ))
file.remove(file.path(".", "rail_inters.gpkg" ))
```