---
editor:
markdown:
wrap: 72
---
```{=html}
<style>
.boolean_table table {
max-width: 90%;
}
</style>
```
# Relational and boolean operations
```{r echo=FALSE}
source("libs/Common.R")
options(width = 80)
```
```{r echo = FALSE}
R_ver(c)
```
You've already been exposed to a few examples of relational and boolean
operations in earlier chapters. A formal exploration of these techniques
follows.
## Relational operations
Relational operations play an important role in data manipulation.
Any time you subset a dataset based on one or more criteria, you are
making use of a relational operation. The relational operators (also
known as *logical binary operators*) include `==`, `!=`, `<`, `<=`, `>`
and `>=`. The output of a relational operation is a logical vector of
`TRUE` or `FALSE` values (or `NA` when a missing value is involved).\
<br>

| Relational operator   | Syntax | Example          |
|-----------------------|--------|------------------|
| Exact equality        | `==`   | 3 == 4 -\> FALSE |
| Exact inequality      | `!=`   | 3 != 4 -\> TRUE  |
| Less than             | `<`    | 3 \< 4 -\> TRUE  |
| Less than or equal    | `<=`   | 4 \<= 4 -\> TRUE |
| Greater than          | `>`    | 3 \> 4 -\> FALSE |
| Greater than or equal | `>=`   | 4 \>= 4 -\> TRUE |

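Relational operators are applied element by element, so comparing a vector to a single value returns one logical value per element. The following minimal sketch (the vector `x` is a made-up example) shows this. It also shows how the resulting logical vector can be used to subset `x`:

```{r}
x <- c(2, 7, 4, 9)   # hypothetical example vector
x > 4                # one TRUE/FALSE per element: FALSE TRUE FALSE TRUE
x == 4               # FALSE FALSE TRUE FALSE
x[x > 4]             # keep only elements where the condition is TRUE: 7 9
```
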
## Boolean operations
Boolean operations can be used to combine multiple evaluations. R has
three boolean operators: the **AND** operator, `&`; the **NOT**
operator, `!`; and the **OR** operator, `|`.
The `&` operator requires that the conditions on both sides of the
boolean operator be satisfied. You would normally use this operator when
addressing a condition along the lines of *"`x` must be satisfied AND
`y` must be satisfied"*.
The `|` operator requires that at least one condition be met on either
side of the boolean operator. You would normally use this operator when
addressing a condition along the lines of *"`x` must be satisfied OR `y`
must be satisfied"*. Note that the output will also be `TRUE` if *both*
conditions are met.
The `!` operator is a *negation* operator: it reverses the outcome of a
condition. So if the outcome of an expression is `TRUE`, preceding that
expression with `!` will reverse the outcome to `FALSE`, and vice versa.\
<br>
::: boolean_table
| Boolean operator | Syntax | Example                                                         | Outcome                   |
|------------------|--------|-----------------------------------------------------------------|---------------------------|
| AND              | `&`    | 4 == 3 `&` 1 == 1 <br> 4 == 4 `&` 1 == 1                        | FALSE <br> TRUE           |
| OR               | `|`    | 4 == 4 `|` 1 == 1 <br> 4 == 3 `|` 1 == 1 <br> 4 == 3 `|` 1 == 2 | TRUE <br> TRUE <br> FALSE |
| NOT              | `!`    | `!`(4 == 3) <br> `!`(4 == 4)                                    | TRUE <br> FALSE           |
:::
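The examples in the table compare single values. The boolean operators `&`, `|` and `!` also work element by element on logical vectors. Here is a short sketch using a made-up vector `x`:

```{r}
x <- c(2, 7, 4, 9)   # hypothetical example vector
x > 3 & x < 8        # values strictly between 3 and 8: FALSE TRUE TRUE FALSE
x < 3 | x > 8        # values below 3 or above 8: TRUE FALSE FALSE TRUE
!(x > 3)             # negation of x > 3: TRUE FALSE FALSE FALSE
```
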
The following table breaks down all possible boolean outcomes, where `T`
= `TRUE` and `F` = `FALSE`:

| Boolean operation | Outcome |
|-------------------|---------|
| T `&` T           | TRUE    |
| T `&` F           | FALSE   |
| F `&` T           | FALSE   |
| F `&` F           | FALSE   |
| T `|` T           | TRUE    |
| T `|` F           | TRUE    |
| F `|` T           | TRUE    |
| F `|` F           | FALSE   |
| `!`T              | FALSE   |
| `!`F              | TRUE    |

If the input values to a boolean operation are numeric vectors and not
logical vectors, the numeric values will be interpreted as `FALSE` if
zero and `TRUE` otherwise. For example:
```{r}
1 & 2
1 & 0
```
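The conversion rule can be checked directly with `as.logical()`, which applies the same zero/non-zero rule to numeric values. As the sketch below shows, negative and fractional numbers are also treated as `TRUE`:

```{r}
as.logical(c(0, 2, -1.5))   # FALSE TRUE TRUE
c(0, 2, -1.5) & TRUE        # FALSE TRUE TRUE: only 0 becomes FALSE
```
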
### Pecking order in operations
Note that the operation `a == (3 | 4)` is **not** the same as `(a == 3) | (a == 4)`. If, for example, `a = 3`, the former will return `FALSE` whereas the latter will return `TRUE`.
```{r}
a <- 3
a == (3 | 4)
(a == 3) | (a == 4)
```
This is because R applies a pecking order to its operations. In the former case, R first evaluates what is between the parentheses, `(3 | 4)`.
```{r}
(3 | 4)
```
This returns `TRUE` since the numbers on either side of `|` are converted to `TRUE` (only values of `0` are converted to `FALSE`). R then compares `a` to this logical value, `TRUE`.
```{r}
a == TRUE
```
Here, `a` is numeric while `TRUE` is logical, so the two sides of the `==` operation are of different data types. Recall from Chapter 3 (see the chapter on [data types](data_objects.html#atomic-vectors)) that R circumvents differences in data types by coercing all values to the **highest common mode**. Here, `numeric` overrides `logical`, so `TRUE` is coerced to its `numeric` representation, `1`. Hence, the evaluation being performed is:
```{r}
a == 1
```
When a vector is evaluated for more than one condition, you need to explicitly break down each condition before combining them with boolean operators.
```{r}
(a == 3) | (a == 4)
```
The examples above are governed by R's built-in operator precedence rules. For example, *comparison* operations such as `<=` and `>` are performed before boolean operations, so `a == 3 | 4` will first evaluate `a == 3` before evaluating `... | 4`.
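One consequence of this precedence is that `a == 3 | 4` does not test whether `a` equals 3 or 4. It always returns `TRUE`, because the `4` on the right-hand side of `|` is itself interpreted as `TRUE`. A minimal sketch, using a hypothetical value of 10 stored in `x`:

```{r}
x <- 10               # hypothetical value that is neither 3 nor 4
x == 3 | 4            # parsed as (x == 3) | 4, i.e. FALSE | TRUE: TRUE
(x == 3) | (x == 4)   # the intended test: FALSE
```
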
Boolean operations also follow a pecking order: `!` precedes `&`,
which precedes `|`. For example:
```{r eval = FALSE}
! TRUE & FALSE | TRUE
```
will first evaluate `! TRUE`, then `... & FALSE`, then `... | TRUE`.
To override R's built-in precedence, use parentheses. For example:
```{r eval = FALSE}
! TRUE & (FALSE | TRUE)
```
will first evaluate `(FALSE | TRUE)` and `! TRUE` separately, then combine their outputs with `... & ...`.
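Evaluating both expressions side by side shows that the parentheses change the result. The middle line writes out R's default grouping explicitly:

```{r}
! TRUE & FALSE | TRUE        # default precedence: returns TRUE
((! TRUE) & FALSE) | TRUE    # same grouping, written out explicitly: TRUE
! TRUE & (FALSE | TRUE)      # parentheses evaluate | first: returns FALSE
```
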
For a full list of operator precedence rules, access the help page for
`Syntax`:
```{r eval = FALSE}
?Syntax
```
The following table lists operators from high to low precedence (i.e., operators near the top are evaluated before those near the bottom).

| Operator            | Description                              |
|---------------------|------------------------------------------|
| :: :::              | access variables in a namespace          |
| \$ \@               | component / slot extraction              |
| \[ \[\[             | indexing                                 |
| \^                  | exponentiation (right to left)           |
| \- +                | unary minus and plus                     |
| :                   | sequence operator                        |
| %any% \|\>          | special operators (including %% and %/%) |
| \* /                | multiply, divide                         |
| \+ -                | (binary) add, subtract                   |
| \< \> \<= \>= == != | ordering and comparison                  |
| !                   | negation                                 |
| & &&                | and                                      |
| \| \|\|             | or                                       |
| \~                  | as in formulae                           |
| -\> -\>\>           | rightwards assignment                    |
| \<- \<\<-           | assignment (right to left)               |
| =                   | assignment (right to left)               |
| ?                   | help                                     |

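Two entries in this table often surprise new users: exponentiation (`^`) is evaluated before unary minus, and unary minus is evaluated before the sequence operator `:`. The lines below illustrate both. The last line shows that `:` is also evaluated before `*`:

```{r}
-2^2      # parsed as -(2^2): -4
(-2)^2    # parentheses force the minus first: 4
-1:2      # parsed as (-1):2: -1 0 1 2
2 * 1:3   # parsed as 2 * (1:3): 2 4 6
```
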
## Comparing multidimensional objects
The relational operators compare objects one element at a time. If you
want to compare two objects as a whole (e.g., multi-element vectors or
data frames), use the `identical()` function.
For example:
```{r}
a <- c(1, 5, 6, 10)
b <- c(1, 5, 6)
identical(a, a)
identical(a, b)
identical(mtcars, mtcars)
```
Notice that `identical` returns a single logical value (a logical vector
of length one), regardless of the input objects' dimensions.
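To see the difference between the two approaches, compare `==` applied to a vector with `identical()` applied to the same vector. Wrapping the element-wise result in `all()` is another way to collapse it into a single value:

```{r}
a <- c(1, 5, 6, 10)
a == a            # one logical value per element
identical(a, a)   # a single logical value
all(a == a)       # TRUE only if every element-wise comparison is TRUE
```

Keep in mind that `all()` returns `NA` when the element-wise comparison contains `NA` values and no `FALSE` values, whereas `identical()` always returns `TRUE` or `FALSE`.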
Note that the data structures must match as well as their element values.
For example, if `d` is a list and `a` is an atomic vector, the output of
`identical` will be `FALSE` even if the internal values match.
```{r}
d <- list( c(1, 5, 6, 10) )
identical(a, d)
```
If we convert `d` from a list to an atomic vector using the `unlist`
function (thus matching data structures), we get:
```{r}
identical(a, unlist(d))
```
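The element types must match too. For example, `1:3` creates an integer vector whereas `c(1, 2, 3)` creates a double (numeric) vector. As a result, `identical()` reports a difference even though `==` finds every element equal:

```{r}
identical(c(1, 2, 3), 1:3)               # FALSE: double vs. integer
all(c(1, 2, 3) == 1:3)                   # TRUE: values compared after coercion
identical(c(1, 2, 3), as.numeric(1:3))   # TRUE once both are double
```
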
## The match operator `%in%`
The match operator `%in%` compares two vectors and assesses whether
each element on the left-hand side of `%in%` matches any of the elements
on the right-hand side of the operator. For each element in the
left-hand vector, R returns `TRUE` if the value is present in any of the
right-hand side elements or `FALSE` if not.
For example, given the following vectors:
```{r}
v1 <- c( "a", "b", "cd", "fe")
v2 <- c( "b", "e")
```
find the elements in `v1` that match any of the values in `v2`.
```{r}
v1 %in% v2
```
The function checks whether each element in `v1` has a matching value in
`v2`. For example, element `"a"` in `v1` is compared to elements `"b"`
and `"e"` in `v2`. No matches are found and a `FALSE` is returned. The
next element in `v1`, `"b"`, is compared to both elements in `v2`. This
time, there is a match (`v2` has an element `"b"`) and `TRUE` is
returned. This process is repeated for all elements in `v1`.
The logical vector output has the same length as the input vector `v1`
(four in this example).
If we swap the vector objects, we get a two element logical vector since
we are now comparing each element in `v2` to any matching elements in
`v1`.
```{r}
v2 %in% v1
```
<center>
<video width="620" controls style="float:center;">
<source src="https://github.com/mgimond/ES218/blob/gh-pages/Videos/Matching_operator.mp4?raw=true" type="video/mp4">
</video>
</center>
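Because `%in%` returns a logical vector, it can be used to subset a vector. It also offers a compact alternative to chaining several `==` tests with `|`, as in the `(a == 3) | (a == 4)` example earlier in this chapter. A short sketch reusing `v1` and `v2`:

```{r}
v1 <- c("a", "b", "cd", "fe")
v2 <- c("b", "e")
v1[v1 %in% v2]      # elements of v1 found in v2: "b"
v1[!(v1 %in% v2)]   # elements of v1 not found in v2: "a" "cd" "fe"
a <- 3
a %in% c(3, 4)      # equivalent here to (a == 3) | (a == 4): TRUE
```
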
## Checking if a value is `NA`
When checking whether a value is equal to `NA`, the following evaluation
may behave unexpectedly.
```{r}
a <- c(3, 67, 4, NA, 10)
a == NA
```
Every element of the output is `NA` instead of the `TRUE` or `FALSE`
values we would expect from an evaluation. Because `NA` stands for an
unknown value, any comparison with it is also unknown. Instead, you must
make use of the `is.na()` function:
```{r}
is.na(a)
```
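Since `is.na()` returns a logical vector, it can be combined with other functions to count or locate missing values. A brief sketch follows. Note that `anyNA()` is a base R shortcut for `any(is.na(x))`:

```{r}
a <- c(3, 67, 4, NA, 10)
sum(is.na(a))     # number of NA values (each TRUE counts as 1): 1
which(is.na(a))   # position of the NA value(s): 4
anyNA(a)          # TRUE if at least one value is NA
```
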
As another example, if we want to keep all rows in data frame `d` where
`z` = `NA`, we would type:
```{r}
d <- data.frame(x = c(1,4,2,5,2,3,NA),
y = c(3,2,5,3,8,1,1),
z = c(NA,NA,4,9,7,8,3))
d[ is.na(d$z), ]
```
You can, of course, use the `!` operator to reverse the evaluation and
*omit* all rows where `z` = `NA`,
```{r}
d[ !is.na(d$z), ]
```
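Missing values also matter when subsetting with a relational operator. Comparing `NA` to a number returns `NA`, and indexing a data frame with a logical vector that contains `NA` returns rows filled with `NA`. Combining the condition with `!is.na()` removes those rows, since `FALSE & NA` evaluates to `FALSE`:

```{r}
d <- data.frame(x = c(1,4,2,5,2,3,NA),
                y = c(3,2,5,3,8,1,1),
                z = c(NA,NA,4,9,7,8,3))
d[ d$z > 5, ]                 # includes two rows of NA values
d[ !is.na(d$z) & d$z > 5, ]   # only rows where z is known and greater than 5
```
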