-
Notifications
You must be signed in to change notification settings - Fork 31
/
Copy pathdata-import.qmd
104 lines (73 loc) · 2.17 KB
/
data-import.qmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
---
title: "Data import"
---
```{r}
#| results: "asis"
#| echo: false
source("_common.R")
```
## Prerequisites {.unnumbered}
```{r}
library(tidyverse)
```
## 8.2.4 Exercises {.unnumbered}
1. For reading a file delimited with `|`, use `read_delim()` with argument `delim = "|"`.
2. All other arguments are common among the two functions.
3. `col_positions` is an important argument since it defines the beginning and end of columns.
4. We need to specify the `quote` argument.
```{r}
read_csv("x,y\n1,'a,b'", quote = "\'")
```
5. Problems with each `read_csv()` statement is shown below:\\
- There are only two column headers but three values in each row, so the last two get merged:
```{r}
read_csv("a,b\n1,2,3\n4,5,6")
```
- There are only three column headers, first row is missing a value in the last column so gets an `NA` there, the second row has four values so the last two get merged:
```{r}
read_csv("a,b,c\n1,2\n1,2,3,4")
```
- No rows are read in:
```{r}
read_csv("a,b\n\"1")
```
- Each column has a numerical and a character value, so the column type is coerced to character:
```{r}
read_csv("a,b\n1,2\na,b")
```
- The delimiter is `;` but it's not specified, therefore this is read in as a single-column data frame with a single observation:
```{r}
read_csv("a;b\n1;3")
```
6. The non-syntactic names can be read in as follows.
```{r}
annoying <- tibble(
`1` = 1:10,
`2` = `1` * 2 + rnorm(length(`1`))
)
```
a. Extracting the variable called `1`:
```{r}
annoying |>
select(`1`)
```
b. Plotting a scatterplot of `1` vs. `2`:
```{r}
ggplot(annoying, aes(x = `2`, y = `1`)) +
geom_point()
```
c. Creating a new column called `3`, which is `2` divided by `1`:
```{r}
annoying |>
mutate(`3` = `2` / `1`)
```
d. Renaming the columns to `one`, `two`, and `three`:
```{r}
annoying |>
mutate(`3` = `2` / `1`) |>
rename(
"one" = `1`,
"two" = `2`,
"three" = `3`
)
```