-
Notifications
You must be signed in to change notification settings - Fork 1
/
tidyExt_vignette.Rmd
271 lines (173 loc) · 6.41 KB
/
tidyExt_vignette.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
---
title: "tidyExt package"
output: rmarkdown::github_document
---
```{r setup, include=FALSE}
knitr::opts_knit$set(root.dir = '~/Git/tidyExt/', echo=TRUE)
#setwd('~/Git/tidyExt/')
```
# Handy wrappers and extensions for the tidyverse
<br>
<br>
The tidyverse is a user-friendly suite of R packages designed to make data analysis simpler, less error-prone and more enjoyable. This package contains a host of helper functions designed to minimize keystrokes, and to overcome some common pain-points in plotting, data summarization and environment management.
<br>
# Installation
To install the development version, if required first install devtools
```{r, eval=F}
install.packages('devtools')
```
To install tidyExt:
```{r, eval=F, message=F}
devtools::install_github('bansell/tidyExt')
```
Load tidyverse and tidyExt:
```{r,message=F}
library(tidyverse)
library(tidyExt)
```
```{r,echo=F,include=F}
devtools::load_all()
```
<br>
# Source code & environment management
<br>
## printScriptDir()
To manage source code files across multiple systems and folders, it can be useful to quickly print the entire path for the source file you are working in, without quotes for quick copy/paste.
```{r}
printScriptDir()
```
NB run from your .R or .Rmd file, this function will not return the characters "## [1] " in the console.
<br>
## fix_tidyverse_conflicts()
Certain tidyverse functions like rename() and select() often conflict with function names from other packages. If many packages are loaded, to reset the tidyverse functions as the default, after use `fix_tidyverse_conflicts()`. Thanks to Jacob Munro for this one.
```{r}
fix_tidyverse_conflicts()
```
<br>
# Plotting
<br>
## geom_boxjitter()
Make boxplots with overlaid datapoints. There is no jitter in the y axis in order to accurately represent data values.
```{r}
mpg |> ggplot2::ggplot(aes(x=class, y=cty)) +
geom_boxjitter( point_size = 2, point_col='dodger blue')
```
<br>
## geom_boxdodge()
Make nested boxplots with overlaid datapoints. There is no jitter in the y axis in order to accurately represent data values.
```{r}
mpg |> ggplot2::ggplot(aes(x=class, y=cty, col=interaction(drv,cyl))) +
geom_boxdodge()
```
<br>
# statistricks!
...sorry. Here are some useful statistics shortcuts:
<br>
## smooth_lm()
Adds a linear regression line to scatter plot and calls ggpubr to print the line equation and p value
```{r, warning=F}
mpg |> ggplot2::ggplot(aes(cty,hwy)) + geom_point() + geom_smooth_lm()
```
<br>
## scale_this()
A wrapper for scale() that returns a single vector to use within `dplyr::mutate()` etc.
This function is copied from [here](https://stackoverflow.com/a/35776313).
scale() output:
```{r}
diamonds |> mutate(table_scale = scale(table)) |> select(table_scale) |> str()
```
scale_this() output:
```{r}
diamonds |> mutate(table_scale = scale_this(table)) |> select(table_scale) |> str()
```
<br>
## sort_pct()
A way to simultaneously count and sort the relative proportion of character data in descending order.
Simple example:
```{r}
diamonds |> sort_pct(cut)
```
More complex:
```{r}
diamonds |> sort_pct(cut,color)
```
<br>
# ggplot shortcuts
Minimize keystrokes for common plot label and legend modifications
<br>
## bottom_legend
```{r}
sample_n(diamonds,1000) |> ggplot2::ggplot(aes(x=carat,y=price, col=clarity)) + geom_point() + bottom_legend()
```
<br>
## no_legend
```{r}
sample_n(diamonds,1000) |> ggplot2::ggplot(aes(x=carat,y=price, col=clarity)) + geom_point() + no_legend()
```
<br>
## x_angle()
Set x labels at any angle. 30° by default.
```{r}
mpg |> ggplot2::ggplot(aes(x=manufacturer,y=hwy)) + geom_boxjitter() + x_angle()
```
<br>
## plots()
This function is useful when you want to make scatterplots (for example, PCA plots) coloured by multiple different factors. The colour space is rapidly exhausted and important plotting information is lost. For example:
```{r, warning=F,fig.height=8,fig.width=8}
my_df <- mpg |> mutate(year=factor(year), cyl=factor(cyl))
my_df |> gather(key,value,year,cyl,drv,manufacturer) |>
ggplot2::ggplot(aes(cty,hwy,col=value)) + geom_point() + facet_wrap(~key,ncol=2) +
bottom_legend()
```
It is very hard to distinguish the data from years 1999 vs 2008, and the figure legend is a jumble of labels from all facets.
To handle this, we recycling the default ggcolour scale to maximize the contrast in each facet. Caution: be sure to check the legend under each plot to avoid confusing the colour encodings between facets.
First create a vector containing the column names of interest for colouring points in the scatterplot
```{r}
my_features <- c('year','drv','cyl','manufacturer')
```
```{r, fig.height=10,fig.width=8}
my_df <- mpg |> mutate(year=factor(year), cyl=factor(cyl))
plot_cycle_cols(df = my_df, X='cty',Y='hwy', myLabel = 'manufacturer', colour_vec = my_features)
```
<br>
# Colour scales
Creating and modifying colour scales can be hard work in ggplot2. These functions help to print the HEX codes and display the swatch for the selected colours, from default ggplot2 or RColorBrewer palettes.
<br>
## default_GG_col()
```{r}
default_GG_col(12)
```
<br>
## brewer_GG_col()
First check out the palette information to see all of the available Brewer palettes.
```{r}
RColorBrewer::brewer.pal.info
```
```{r}
brewer_GG_col(6,'Blues')
brewer_GG_col(4,'Paired')
brewer_GG_col(4,'RdYlBu')
```
<br>
# Data views
<br>
## bighead()
The `utils::head()` function will print all column names which can flood the console. For large matrices in particular, its often useful to check the top left corner of the matrix. `bighead(n)` prints a square data frame of dimensions n X n.
```{r}
diamond_mat <- as.matrix(diamonds[sample(1000), ])
diamond_mat |> bighead()
```
NB this will return an 8x8 data frame as default when run in .R or .Rmd.
<br>
## print_all()
The default console output for tidyverse tables is to display 6 rows of data. Use `print_all()` to output the entire table in the console. This is useful for data frames of intermediate size (7-100 rows) instead of modifying `print()` or using `View()`.
```{r, eval=F}
mpg |> print_all()
```
NB Not run here. This will print the entire table to the console when called from the console, a .R or .Rmd file.
<br>
# Summary
We hope these functions are useful for making your daily R coding work quicker and easier! Please don't hesitate to modify for your own use or suggest updates through github.
```{r,include=F}
system('bash update_readme.sh')
```