-
Notifications
You must be signed in to change notification settings - Fork 14
/
10-Map-Algebra.Rmd
executable file
·343 lines (244 loc) · 18.4 KB
/
10-Map-Algebra.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
---
output: html_document
editor_options:
chunk_output_type: console
---
# Map Algebra {#chp10_0}
**Dana Tomlin** [@DTomlin] is credited with defining a framework for the analysis of field data stored as gridded values (i.e. rasters). He coined this framework *map algebra*. Though gridded data can be stored in a vector format, map algebra is usually performed on raster data. Map algebra operations and functions are broken down into four groups: **local**, **focal**, **zonal** and **global**. Each is explored in the following
sections.
```{r echo=FALSE}
library(raster)
library(rasterVis)
gp <- function(sg, nr=1, sfmt="%02.0f", fnts = 2.1){
library(ggplot2)
library(scales)
r.df <- as.data.frame(cbind(xyFromCell(sg,1:ncell(sg)),getValues(sg)))
plotData = tidyr::pivot_longer(r.df, cols = c(-x,-y) )
ggplot(aes(x = x, y = y), data = plotData, environment = environment()) +
geom_tile(aes(fill = value)) + facet_wrap(~ name, nrow=nr) +
scale_colour_brewer(palette="Blues") +
theme(strip.text.x = element_text(size=9, family="mono", face="bold"),
legend.position = "none",
axis.text=element_text(size=7),
axis.title=element_text(size=8) ) +
coord_equal() + geom_text(aes(label=sprintf(fmt=sfmt, value)), colour='white',size=fnts)
}
r = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
projection(r) = "+proj=utm +zone=19 +datum=NAD83"
values(r) = seq(1 : ncell(r))
```
## Local operations and functions
Local operations and functions are applied to each individual cell and only involve those cells sharing the same location.
For example, if we start off with an original raster, then multiply it by 2 then add 1, we get a new raster whose cell values reflect the series of operations performed on the original raster cells. This is an example of a **unary** operation where just one single raster is involved in the operation.
```{r f10-local01, echo=FALSE, fig.cap = "Example of a local operation where `output=(2 * raster + 1)`.", fig.width=5, fig.height=3}
l1 <- r * 2 + 1 # Apply basic operations to r, assign output to l1
sg = stack(r,l1)
names(sg) = c("input","output")
gp(sg,1)
```
More than one raster can be involved in a local operation. For example, two rasters can be summed (i.e. each overlapping pixels are summed) to generate a new raster.
```{r f10-local02, echo=FALSE, fig.cap = "Example of a local operation where `output=(input1+input2)`. Note how each cell output only involves input raster cells that share the same exact location.", fig.width=7, fig.height=3}
l2 <- r - r + runif(length(r), 0,10) # Apply basic operations to r, assign output to l1
l3 <- r + l2
sg = stack(r,l2,l3)
names(sg) = c("input1"," input2", "output")
gp(sg,1)
```
Local operations also include **reclassification** of values. This is where a range of values from the input raster are assigned a new (common) value. For example, we might want to reclassify the input raster values as follows:
Original values Reclassified values
---------------- --------------------
0-25 25
26-50 50
51-75 75
76-100 100
```{r f10-local03, echo=FALSE, fig.cap = "Example of a local operation where the output results from the reclassification of input values.", message=FALSE, warning=FALSE, fig.height=2.9, fig.width=5}
m = c(0, 25, 25, 25, 50, 50, 50, 75, 75, 75, 100, 100)
m = matrix(m, ncol=3, byrow = T)
l4 = reclassify(r, m, right = T)
l1 <- r * 2 + 1 # Apply basic operations to r, assign output to l1
sg = stack(r,l4)
names(sg) = c("input","output")
gp(sg,1)
```
## Focal operations and functions
> Focal operations are also referred to as *neighborhood* operations.
Focal operations assign to the output cells some summary value (such as the mean) of the neighboring cells from the input raster. For example, a cell output value can be the average of all 9 neighboring input cells (including the center cell); this acts as a *smoothing function*.
```{r f10-focal01, echo=FALSE, fig.cap = "Example of a focal operation where the output cell values take on the average value of neighboring cells from the input raster. Focal cells surrounded by non-existent cells are assigned an `NA` in this example.", message=FALSE, warning=FALSE, fig.height=2.9, fig.width=5}
r2 = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
set.seed(123)
values(r2) = round(runif(ncell(r2),1,100))
f1 = focal(r2, w=matrix(1/9,nrow=3,ncol=3), fun=sum)
sg = stack(r2,f1)
names(sg) = c("input","output")
gp(sg,1,"%2.1f")
```
Notice how, in the above example, the edge cells from the output raster have been assigned a value of `NA` (No Data). This is because cells outside of the extent have no value. Some GIS applications will ignore the missing surrounding values and just compute the average of the available cells as demonstrated in the next example.
```{r f10-focal02, echo=FALSE, fig.cap = "Example of a focal operation where the output cell values take on the average value of neighboring cells from the input raster. Surrounding non-existent cells are ignored.", message=FALSE, warning=FALSE, fig.height=3, fig.width=5}
r2 = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
set.seed(123)
values(r2) = round(runif(ncell(r2),1,100))
f1 = focal(r2, w=matrix(1,nrow=3,ncol=3) , fun=mean, na.rm=TRUE, pad=TRUE)
sg = stack(r2, f1)
names(sg) = c("input","output")
gp(sg,1,"%2.1f")
```
Focal (or neighbor) operations require that a window region (a **kernel**) be defined. In the above examples, a simple 3 by 3 kernel (or window) was used in the focal operations. The kernel can take on different dimensions and shape such as a 3 by 3 square where the central pixel is ignored (thus reducing the number of neighbors to 8) or a circular neighbor defined by a radius.
```{r f10-focal03, echo=FALSE, fig.cap = "Example of a focal operation where the kernel is defined by a 3 by 3 cell *without* the center cell and whose output cell takes on the average value of those neighboring cells. ", message=FALSE, warning=FALSE, fig.height=3, fig.width=5}
r2 = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
set.seed(123)
values(r2) = round(runif(ncell(r2),1,100))
m <- matrix(c(1,1,1,1,0,1,1,1,1),nrow = 3) # This defines the 3x3 moving window
f1 = focal(r2, w=m , fun=mean, na.rm=TRUE, pad=TRUE)
sg = stack(r2,f1)
names(sg) = c("input","output")
gp(sg,1,"%2.1f")
```
In addition to defining the neighborhood shape and dimension, a kernel also defines the weight each neighboring cell contributes to the summary statistic. For example, all cells in a 3 by 3 neighbor could each contribute 1/9^th^ of their value to the summarized value (i.e. equal weight). But the weight can take on a more complex form defined by a function; such weights are defined by a **kernel function**. One popular function is a **Gaussian** weighted function which assigns greater weight to nearby cells than those further away.
```{r f10-focal04, echo=FALSE, fig.cap = "Example of a focal operation where the kernel is defined by a Gaussian function whereby the closest cells are assigned a greater weight.", message=FALSE, warning=FALSE, fig.height=3, fig.width=5}
r2 = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100)
set.seed(123)
values(r2) = round(runif(ncell(r2),1,100))
m = focalWeight(r2, 4, "Gauss")
f1 = focal(r2, w=m , fun=sum, na.rm=TRUE, pad=TRUE)
sg = stack(r2,f1)
names(sg) = c("input","output")
gp(sg,1,"%2.1f")
```
## Zonal operations and functions
A zonal operation computes a new summary value (such as the mean) from cells aggregated for some zonal unit. In the following example, the cell values from the raster layer are aggregated into three zones whose boundaries are delineated in red. Each output zone shows the average value of the cells within that zone.
```{r f10-zonal01, echo=FALSE, fig.cap = "Example of a zonal operation where the cell values are averaged for each of the three zones delineated in red.", message=FALSE, warning=FALSE, fig.height=3.5, fig.width=6}
library(gridExtra)
library(stars)
#library(broom)
cds1 <- rbind(c(0,50), c(0,100), c(50,100), c(50,50), c(0,50))
cds2 <- rbind(c(0,0), c(100,0), c(50,50), c(0,50), c(0,0))
cds3 <- rbind(c(100,0), c(50,50), c(50,100), c(100,100), c(100,0))
polys <- SpatialPolygons(list(Polygons(list(Polygon(cds1)), 1),
Polygons(list(Polygon(cds2)), 2),
Polygons(list(Polygon(cds3)), 3)))
polys <- st_sf(st_sfc(list(st_polygon(list(cds1)), st_polygon(list(cds2)), st_polygon(list(cds3)))), crs = crs(r, asText = TRUE))
v <- extract(r, polys, fun=mean, bind = TRUE)
dfx <- data.frame(rbind(cds1[1:4,],cds2[1:4,], cds3[1:4,]),
id = rep(1:3, each = 4),
val = rep(extract(r, polys, fun=mean, bind = TRUE), each = 4))
dfx_center <- as.data.frame(st_coordinates(st_centroid(polys)))
dfx_center$val <- v
sg <- stack(r)
names(sg) <- "input"
p1 <- gp(sg) + geom_polygon(data=dfx, aes(x=X1, y = X2, group = id), colour = "red", fill = NA)
names(sg) <- "output"
p2 <- gp(sg) + geom_polygon(data=dfx, aes(x=X1, y = X2, group = id, fill = val),
alpha = 0.5, colour = "white") +
geom_text(data=dfx_center,
aes(x=X, y = Y,label=sprintf("%2.1f", val)),colour='white',size=4.1)
grid.arrange(p1, p2, ncol=2)
```
This technique is often used with rasters derived from remotely sensed imagery (e.g. NDVI) where areal units (such as counties or states) are used to compute the average cell values from the raster.
## Global operations and functions
Global operations and functions may make use of some or all input cells when computing an output cell value. An example of a global function is the *Euclidean Distance* tool which computes the shortest distance between a pixel and a source (or destination) location. In the following example, a new raster assigns to each cell a distance value to the closest cell having a value of 1 (there are just two such cells in the input raster).
```{r f10-global01, echo=FALSE, fig.cap = "Example of a global function: the Euclidean distance. Each pixel is assigned its closest distance to one of the two source locations (defined in the input layer).", message=FALSE, warning=FALSE, fig.height=3, fig.width=5}
p = raster(ncol=10, nrow=10, xmn=0, xmx=100, ymn=0, ymx=100) # Create an empty raster
p[] = NA # Assign NoData values to all pixels
p[c(20,85)] = 1 # Change the pixels 20 and 85 to 1
p.d = distance(p) # Compute the shortest distance between each output pixel in the
# output raster and pixels 20 and 85
sg = stack(p, p.d)
names(sg) = c("input","output")
gp(sg,1,"%2.0f")
```
Global operations and functions can also generate single value outputs such as the overall pixel mean or standard deviation.
Another popular use of global functions is in the mapping of least-cost paths where a cost surface raster is used to identify the shortest path between two locations which minimizes cost (in time or money).
## Operators and functions
Operations and functions applied to gridded data can be broken down into three groups: **mathematical**, **logical comparison** and **Boolean**.
### Mathematical operators and functions
Two mathematical operators have already been demonstrated in earlier sections: the *multiplier* and the *addition* operators. Other operators include division and the modulo (aka the modulus) which is the remainder of a division. Mathematical *functions* can also be applied to gridded data manipulation. Examples are square root and sine functions. The following table showcases a few examples with ArcGIS and R syntax.
Operation | ArcGIS Syntax | R Syntax | Example
----------|--------|--------|--------
Addition | `+` | `+` | `input1 + input2`
Subtraction | `-` | `-` | `input1 - input2`
Division | `/` | `/` | `input1 / input2`
Modulo | `Mod()` | `%%` | `Mod(input1, 100)`, `input1 %% 10`
Square root | `SquareRoot()` | `sqrt()`| `SquareRoot(input1)`, `sqrt(input1)`
### Logical comparison
The logical comparison operators evaluate a condition then output a value of `1` if the condition is true and `0` if the condition is false. Logical comparison operators consist of *greater than*, *less than*, *equal* and *not equal*.
Logical comparison | Syntax
-------------------|-------
Greater than | `>`
Less than | `<`
Equal | `==`
Not equal | `!=`
For example, the following figure shows the output of the comparison between two rasters where we are assessing if cells in `input1` are *greater than* those in `input2` (on a cell-by-cell basis).
```{r f10-logical01, echo=FALSE, fig.cap = "Output of the operation `input1` > `input2`. A value of 1 in the output raster indicates that the condition is true and a value of 0 indicates that the condition is false.", fig.height=2, fig.width=5}
ras1 <- raster(ncol=3, nrow=3, xmn=0, xmx=100, ymn=0, ymx=100)
ras2 <- ras1
ras.out <- ras1
projection(r) <- "+proj=utm +zone=19 +datum=NAD83"
set.seed(32)
values(ras1) <- round(runif(length(ras1),-5,5))
set.seed(5)
values(ras2) <- round(runif(length(ras2),-5,5))
ras.out <- (ras1 > ras2)
sg = stack(ras1,ras2,ras.out)
names(sg) <- c("input1","input2", "output")
gp(sg,1, sfmt="%2.0f", fnts=4)
```
When assessing whether two cells are equal, some programming environments such as R and ArcGIS's *Raster Calculator* require the use of the *double equality* syntax, `==`, as in `input1 == input2`. In these programming environments, the single equality syntax is usually interpreted as an *assignment operator* so `input1 = input2` would instruct the computer to assign the cell values in `input2` to `input1` (which is not what we want to do here).
Some applications make use of special functions to test a condition. For example, ArcGIS has a function called `Con(condition, out1, out2)` which assigns the value `out1` if the condition is met and a value of `out2` if it's not. For example, ArcGIS's raster calculator expression
```
Con( input1 > input2, 1, 0)
```
outputs a value of `1` if `input1` is greater than `input2` and `0` if not. It generates the same output as the one shown in the above figure. Note that in most programming environments (including ArcGIS), the expression `input1 > input2` produces the same output because the value `1` is the numeric representation of `TRUE` and `0` that of `FALSE`.
### Boolean (or Logical) operators
In map algebra, *Boolean* operators are used to compare conditional states of a cell (i.e. TRUE or FALSE). The three Boolean operators are **AND**, **OR** and **NOT**.
Boolean ArcGIS R Example
-------- -------- --------- ------
AND & & `input1 & input2`
OR `|` `|` `input1 | input2`
NOT `~` `!` `~input2`, `! input2`
> A "TRUE" state is usually encoded as a `1` or any *non-zero* integer while a "FALSE" state is usually encoded as a `0`.
For example, if `cell1=0` and `cell2=1`, the Boolean operation `cell1 AND cell2` results in a `FALSE` (or 0) output cell value. This Boolean operation can be translated into plain English as "are the cells 1 and 2 both TRUE?" to which we answer "No they are not" (cell1 is FALSE). The **OR** operator can be interpreted as "is x *or* y TRUE?" so that `cell1 OR cell2` would return `TRUE`. The **NOT** interpreter can be interpreted as "is x not TRUE?" so that `NOT cell1` would return `TRUE`.
```{r f10-boolean01, echo=FALSE, fig.cap = "Output of the operation `input1 AND input2`. A value of 1 in the output raster indicates that the condition is true and a value of 0 indicates that the condition is false. Note that many programming environments treat *any* none 0 values as TRUE so that `-3 AND -4` will return `TRUE`.", fig.height=2, fig.width=5}
ras1 <- raster(ncol=3, nrow=3, xmn=0, xmx=100, ymn=0, ymx=100)
ras2 <- ras1
ras.out <- ras1
projection(r) <- "+proj=utm +zone=19 +datum=NAD83"
set.seed(32)
values(ras1) <- round(runif(length(ras1),-5,5))
set.seed(5)
values(ras2) <- round(runif(length(ras2),-5,5))
ras.out <- (ras1 & ras2)
sg = stack(ras1,ras2,ras.out)
names(sg) <- c("input1","input2", "output")
gp(sg,1, sfmt="%2.0f", fnts=4)
```
```{r f10-boolean02, echo=FALSE, fig.cap = "Output of the operation `NOT input2`. A value of 1 in the output raster indicates that the input cell is NOT TRUE (i.e. has a value of 0).", fig.height=2, fig.width=5}
ras1 <- raster(ncol=3, nrow=3, xmn=0, xmx=100, ymn=0, ymx=100)
ras2 <- ras1
ras.out <- ras1
projection(r) <- "+proj=utm +zone=19 +datum=NAD83"
set.seed(32)
values(ras1) <- round(runif(length(ras1),-5,5))
set.seed(5)
values(ras2) <- round(runif(length(ras2),-5,5))
ras.out <- (! ras2)
sg = stack(ras2,ras.out)
names(sg) <- c("input2", "output")
gp(sg,1, sfmt="%2.0f", fnts=4)
```
### Combining operations
Both comparison and Boolean operations can be combined into a single expression. For example, we may wish to find locations (cells) that satisfy requirements from two different raster layers: e.g. `0<input1<4` AND `input2>0`. To satisfy the first requirement, we can write out the expression as `(input1>0) & (input1<4)`. Both comparisons (delimited by parentheses) return a `0` (FALSE) or a `1` (TRUE). The ampersand, `&`, is a Boolean operator that checks that both conditions are met and returns a 1 if yes or a 0 if not. This expression is then combined with another comparison using another ampersand operator that assesses the criterion input2>0. The amalgamated expression is thus `((input1>0) & (input1<4)) & (input2>0)`.
```{r f10-combining01, echo=FALSE, fig.height=2, fig.width=5, fig.cap = "Output of the operation ((`input1`>`0`) & (`input1`<`4`)) & (`input2`>`0`). A value of 1 in the output raster indicates that the condition is true and a value of 0 indicates that the condition is false."}
ras1 <- raster(ncol=3, nrow=3, xmn=0, xmx=100, ymn=0, ymx=100)
ras2 <- ras1
ras.out <- ras1
projection(r) <- "+proj=utm +zone=19 +datum=NAD83"
set.seed(32)
values(ras1) <- round(runif(length(ras1),-5,5))
set.seed(5)
values(ras2) <- round(runif(length(ras2),-5,5))
ras.out <- ((ras1 > 0) & (ras1 <4)) & (ras2 > 0)
sg <- stack(ras1,ras2,ras.out)
names(sg) <- c("input1","input2", "output")
gp(sg,1, sfmt="%2.0f", fnts=4)
```
> Note that most software environments assign the ampersand character, `&`, to the `AND` Boolean operator.