# Exporting and reporting{#chap13-h1}
\index{exporting@\textbf{exporting}}
> Without data, you are just another person with an opinion.
> W. Edwards Deming
The results of any data analysis are meaningless if they are not effectively communicated.
This may be as a journal article or presentation, or perhaps a regular report or webpage. In Chapter \@ref(chap12-h1) we emphasised another of the major strengths of R - the ease with which HTML (a web page), PDF, or Word documents can be generated.
The purpose of this chapter is to focus on the details of how to get your exported tables, plots and documents looking exactly the way you want them. There are many customisations that can be used, and we will only touch on a few of these.
We will generate a report using data already familiar to you from this book.
It will contain two tables - a demographics table and a regression table - and a plot.
We will use the `colon_s` data from the `finalfit` package.
What follows is for demonstration purposes and is not meant to illustrate model building.
For the purposes of the demonstration, we will ask, does a particular characteristic of a colon cancer (e.g., cancer differentiation) predict 5-year survival?
## Which format should I use?
The three common formats for exporting reports have different pros and cons:
* HTML is the least fussy to work with and can resize itself and its content automatically. For rapid exploration and prototyping, we recommend knitting to HTML. HTML documents can be attached to emails and viewed using any browser, even with no internet access (as long as it is a self-contained HTML document, which R Markdown exports usually are).
* PDF looks most professional when printed. This is because R Markdown uses LaTeX to typeset PDF documents. LaTeX PDFs are our preferred method of producing printable reports or dissertations, but they come with their own bag of issues. The main one is that LaTeX figures and tables *float* and may therefore appear much later in the document than the text describing them.
* Word is useful when working with non-R people who need to edit your output.
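
Switching between these formats does not require editing the document by hand. As a minimal sketch, the code below writes a tiny throwaway document to a temporary folder (the file name `format_demo.Rmd` and its content are made up for illustration) and renders the same source to HTML and Word with `rmarkdown::render()`. It assumes that `rmarkdown` and Pandoc are installed, as they are with RStudio.

```{r, eval=FALSE}
library(rmarkdown)

# Write a tiny throwaway document to a temporary folder (illustration only)
rmd_file <- file.path(tempdir(), "format_demo.Rmd")
writeLines(c(
  "---",
  "title: \"Format demo\"",
  "---",
  "",
  "The same source file can be rendered to several formats."
), rmd_file)

# Render the same source twice; render() returns the path of the output file
html_out <- render(rmd_file, output_format = "html_document", quiet = TRUE)
word_out <- render(rmd_file, output_format = "word_document", quiet = TRUE)

# "pdf_document" works the same way but additionally needs a LaTeX installation
```

This is equivalent to choosing a different entry from the Knit button, which makes it convenient for rapid prototyping in HTML before producing the final Word or PDF version.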
## Working in a `.R` file
We will demonstrate how you might put together a report in two ways.
First, we will show what you might do if you were working in a standard R script file, then exporting certain objects only.
Second, we will talk about the approach if you were primarily working in a Notebook, which makes things easier.
We presume that the data have been cleaned carefully and the 'Get the data', 'Check the data', 'Data exploration' and 'Model building' steps have already been completed.
```{r echo=FALSE, message=FALSE}
library(knitr)
library(kableExtra)
mykable <- function(x, caption = "CAPTION", ...){
  kable(x, row.names = FALSE, align = c("l", "l", "r", "r", "r", "r", "r", "r", "r"),
        booktabs = TRUE, caption = caption,
        linesep = c("", "", "\\addlinespace"), ...) %>%
    kable_styling(latex_options = c("scale_down", "hold_position"))
}
```
## Demographics table
First, let's look at associations between our explanatory variable of interest (exposure) and other explanatory variables.
```{r, eval=FALSE}
library(tidyverse)
library(finalfit)
# Specify explanatory variables of interest
explanatory <- c("age", "sex.factor",
                 "extent.factor", "obstruct.factor",
                 "nodes")
colon_s %>%
  summary_factorlist("differ.factor", explanatory,
                     p=TRUE, na_include=TRUE)
```
```{r, warning=FALSE, message=FALSE, echo=FALSE}
library(tidyverse)
library(finalfit)
# Specify explanatory variables of interest
explanatory <- c("age", "sex.factor",
                 "extent.factor", "obstruct.factor",
                 "nodes")
colon_s %>%
  summary_factorlist("differ.factor", explanatory,
                     p=TRUE, na_include=TRUE) %>%
  mykable(caption = "Exporting 'table 1': Tumour differentiation by patient and disease factors.")
```
Note that we include missing data in this table (see Chapter \@ref(chap11-h1)).
Also note that `nodes` has not been labelled properly.
In addition, small numbers in some variables generate `chisq.test()` warnings (expected counts of fewer than 5 in at least one cell).
Now generate a final table.^[The `finalfit` functions used here - `summary_factorlist()` and `finalfit()` - were introduced in Part II - Data Analysis. We will therefore not describe their arguments here; we use them to demonstrate R's powers of exporting to fully formatted output documents.]
```{r, eval=FALSE}
colon_s <- colon_s %>%
  mutate(
    nodes = ff_label(nodes, "Lymph nodes involved")
  )
table1 <- colon_s %>%
  summary_factorlist("differ.factor", explanatory,
                     p=TRUE, na_include=TRUE,
                     add_dependent_label=TRUE,
                     dependent_label_prefix = "Exposure: "
  )
table1
```
```{r, warning=FALSE, message=FALSE, echo=FALSE}
colon_s <- colon_s %>%
  mutate(
    nodes = ff_label(nodes, "Lymph nodes involved")
  )
table1 <- colon_s %>%
  summary_factorlist("differ.factor", explanatory,
                     p=TRUE, na_include=TRUE,
                     add_dependent_label=TRUE,
                     dependent_label_prefix = "Exposure: ")
table1 %>%
  mykable(caption = "Exporting table 1: Adjusting labels and output.") %>%
  column_spec(1, width = "3.5cm")
```
## Logistic regression table
After investigating the relationships between our explanatory variables, we will use logistic regression to include the outcome variable.
```{r, eval=FALSE}
explanatory <- c("differ.factor", "age", "sex.factor",
                 "extent.factor", "obstruct.factor",
                 "nodes")
dependent <- "mort_5yr"
table2 <- colon_s %>%
  finalfit(dependent, explanatory,
           dependent_label_prefix = "")
table2
```
```{r, warning=FALSE, message=FALSE, echo=FALSE}
explanatory <- c("differ.factor", "age", "sex.factor",
                 "extent.factor", "obstruct.factor",
                 "nodes")
dependent <- "mort_5yr"
table2 <- colon_s %>%
  finalfit(dependent, explanatory,
           dependent_label_prefix = "")
table2 %>%
  mykable(caption = "Exporting a regression results table.")
```
## Odds ratio plot
It is often preferable to express the coefficients from a regression model as a forest plot.
For instance, a plot of odds ratios can be produced using the `or_plot()` function, also from the `finalfit` package:
```{r fig.height=3.5, fig.width=7, message=FALSE, warning=FALSE, fig.cap="Odds ratio plot."}
colon_s %>%
  or_plot(dependent, explanatory,
          breaks = c(0.5, 1, 5, 10, 20, 30),
          table_text_size = 3.5)
```
## MS Word via knitr/R Markdown
\index{Microsoft Word}
\index{PDF}
\index{knitr}
When moving from a `.R` file to a Markdown (`.Rmd`) file, environment objects such as tables or data frames / tibbles usually need to be saved and then loaded into the R Markdown document.
```{r, eval=FALSE}
# Save objects for knitr/markdown
save(table1, table2, dependent, explanatory,
     file = here::here("data", "out.rda"))
```
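
To see what this save-and-reload step does, here is a minimal sketch using toy objects and a temporary file. The names `demo_table`, `demo_vars` and `out_demo.rda` are made up for illustration and are not part of the analysis above.

```{r, eval=FALSE}
# Toy objects standing in for a results table and a vector of variable names
demo_table <- data.frame(label = c("Age", "Sex"),
                         levels = c("Mean (SD)", "Female"))
demo_vars  <- c("age", "sex.factor")

# Save both objects into a single .rda file in a temporary location
rda_file <- file.path(tempdir(), "out_demo.rda")
save(demo_table, demo_vars, file = rda_file)

# Remove them, as if starting from a fresh R Markdown session
rm(demo_table, demo_vars)

# load() recreates the objects under their original names
# and (invisibly) returns those names
restored <- load(rda_file)
restored
```

Because `load()` restores objects under the names they were saved with, the R Markdown document can refer to `table1` and `table2` exactly as the script did.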
In RStudio, select:
File > New File > R Markdown
A useful template file is produced by default. Try selecting 'Knit to Word' from the Knit button at the top of the `.Rmd` script window.
If you have difficulties at this stage, refer to Chapter \@ref(chap12-h1).
Now paste this into the file (we'll call it Example 1):
```` markdown
---
title: "Example knitr/R Markdown document"
author: "Your name"
date: "22/5/2020"
output:
  word_document: default
---
`r ''````{r setup, include=FALSE}
# Load data into global environment.
library(finalfit)
library(dplyr)
library(knitr)
load(here::here("data", "out.rda"))
```
## Table 1 - Demographics
`r ''````{r table1, echo = FALSE}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```
## Table 2 - Association between tumour factors and 5 year mortality
`r ''````{r table2, echo = FALSE}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```
## Figure 1 - Association between tumour factors and 5 year mortality
`r ''````{r figure1, echo = FALSE}
explanatory = c("differ.factor", "age", "sex.factor",
                "extent.factor", "obstruct.factor",
                "nodes")
dependent = "mort_5yr"
colon_s %>%
  or_plot(dependent, explanatory)
```
````
```{r chap13-fig-word, echo = FALSE, fig.cap="Knitting to Microsoft Word from R Markdown. Before (A) and after (B) adjustment."}
knitr::include_graphics("images/chapter13/1_word_knit.png", auto_pdf = TRUE)
```
Knitting this into a Word document results in Figure \@ref(fig:chap13-fig-word)A, which looks pretty decent, but some of the columns need formatting and the plot needs to be resized.
Do not be tempted to do this by hand directly in the Word document.
Yes, before Markdown, we would have to move and format each table and figure directly in Word, and we would repeat this every time something changed.
Turns out some patient records were duplicated and you have to remove them before repeating the analysis over again.
Or your colleague forgot to attach an extra file with 10 more patients.
No problem, you update the dataset, re-run the script that created the tables and hit Knit in the R Markdown document.
No more mindless re-doing for you.
We think this is pretty amazing.
### Figure quality in Word output
If your plots are looking a bit grainy in Word, include this in your setup chunk for high quality:
```{r}
knitr::opts_chunk$set(dpi = 300)
```
The setup chunk is the one that starts with ```` ```{r setup, include = FALSE} ```` and is generated automatically when you create a new R Markdown document in RStudio.
## Create Word template file
To make sure tables always export with a suitable font size, you may edit your Word file but only to create a new template.
You will then use this template to Knit the R Markdown document again.
In the Word document produced by Example 1, click on a table.
The style should be `compact`:
Right-click > Modify... > font size = 9
Alter heading and text styles in the same way as desired.
Save this as `colonTemplate.docx` (avoid underscores in the name of this file).
Move the file to your project folder and reference it in your `.Rmd` YAML header, as shown below.
Make sure you get the spacing correct: unlike R code, the YAML header is sensitive to formatting and to the number of spaces at the beginning of the argument lines.
Finally, to get the figure printed in a size where the labels don't overlap each other, you will have to specify a width for it.
The Chunk cog introduced in the previous chapter is a convenient way to change the figure size (it is in the top-right corner of each grey code chunk in an R Markdown document).
It usually takes some experimentation to find the best size for each plot/output document; in this case we are going with `fig.width = 10`.
Knitting Example 2 here gives us Figure \@ref(fig:chap13-fig-word)B.
For something that is generated automatically, it looks awesome.
```` markdown
---
title: "Example knitr/R Markdown document"
author: "Your name"
date: "22/5/2020"
output:
  word_document:
    reference_docx: colonTemplate.docx
---
`r ''````{r setup, include=FALSE}
# Load data into global environment.
library(finalfit)
library(dplyr)
library(knitr)
load(here::here("data", "out.rda"))
```
## Table 1 - Demographics
`r ''````{r table1, echo = FALSE}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```
## Table 2 - Association between tumour factors and 5 year mortality
`r ''````{r table2, echo = FALSE}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"))
```
## Figure 1 - Association between tumour factors and 5 year mortality
`r ''````{r figure1, echo=FALSE, message=FALSE, warning=FALSE, fig.width=10}
explanatory = c("differ.factor", "age", "sex.factor",
                "extent.factor", "obstruct.factor",
                "nodes")
dependent = "mort_5yr"
colon_s %>%
  or_plot(dependent, explanatory,
          breaks = c(0.5, 1, 5, 10, 20, 30))
```
````
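
The indentation of the `output:` lines in the YAML header above matters. As a rough illustration, the sketch below uses the `yaml` package (an assumption on our part; it is installed alongside `rmarkdown`) to parse a correctly indented header and one where the leading spaces have been lost. Only the first version nests `reference_docx` under `word_document`, which is where R Markdown looks for it.

```{r, eval=FALSE}
library(yaml)

# Correctly indented: each level is nested under the line above it
good_header <- "
output:
  word_document:
    reference_docx: colonTemplate.docx
"

# Leading spaces lost: every line becomes a separate top-level entry
bad_header <- "
output:
word_document:
reference_docx: colonTemplate.docx
"

good <- yaml.load(good_header)
bad  <- yaml.load(bad_header)

# The template is found under output > word_document only in the good version
good$output$word_document$reference_docx
is.null(bad$output$word_document$reference_docx)
```

If a template appears to be ignored when knitting, checking the spacing of these lines is a sensible first step.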
## PDF via knitr/R Markdown
Without changing anything in Example 1 and Knitting it into a PDF, we get Figure \@ref(fig:chap13-fig-pdf)A.
Again, most of it already looks pretty good, but some parts over-run the page and the plot is not a good size.
We can fix the plot in exactly the same way we did for the Word version (`fig.width`), but the second table that is too wide needs some special handling.
For this we use `kable_styling(font_size=8)` from the `kableExtra` package.
Remember to install it before using it for the first time, and include `library(kableExtra)` alongside the other library lines in the setup chunk.
We will also alter the margins of your page using the geometry option in the preamble as the default margins of a PDF document coming out of R Markdown are a bit wide for us.
```{r chap13-fig-pdf, echo = FALSE, fig.cap="Knitting to PDF from R Markdown. Before (A) and after (B) adjustment.", out.width="70%"}
knitr::include_graphics("images/chapter13/1_pdf_knit.png", auto_pdf = TRUE)
```
```` markdown
---
title: "Example knitr/R Markdown document"
author: "Your name"
date: "22/5/2020"
output:
  pdf_document: default
geometry: margin=0.75in
---
`r ''````{r setup, include=FALSE}
# Load data into global environment.
library(finalfit)
library(dplyr)
library(knitr)
library(kableExtra)
load(here::here("data", "out.rda"))
```
## Table 1 - Demographics
`r ''````{r table1, echo = FALSE}
kable(table1, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"),
      booktabs = TRUE)
```
## Table 2 - Association between tumour factors and 5 year mortality
`r ''````{r table2, echo = FALSE}
kable(table2, row.names=FALSE, align=c("l", "l", "r", "r", "r", "r"),
      booktabs=TRUE) %>%
  kable_styling(font_size=8)
```
## Figure 1 - Association between tumour factors and 5 year mortality
`r ''````{r figure1, echo=FALSE, message=FALSE, warning=FALSE, fig.width=10}
explanatory = c("differ.factor", "age", "sex.factor",
                "extent.factor", "obstruct.factor",
                "nodes")
dependent = "mort_5yr"
colon_s %>%
  or_plot(dependent, explanatory,
          breaks = c(0.5, 1, 5, 10, 20, 30))
```
````
The result is shown in Figure \@ref(fig:chap13-fig-pdf)B.
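
Setting a fixed font size is one way to make a wide table fit. An alternative worth trying is the `scale_down` option of `kable_styling()`, which asks LaTeX to shrink the table to the width of the page. The sketch below uses a deliberately wide toy data frame (not the `colon_s` results) and sets `format = "latex"` explicitly so that it can be run outside of a PDF document; inside a document knitted to PDF this argument can usually be left out.

```{r, eval=FALSE}
library(knitr)
library(kableExtra)

# A deliberately wide toy table: 2 rows and 12 columns (illustration only)
wide_table <- as.data.frame(matrix(1:24, nrow = 2))

latex_table <- kable(wide_table, format = "latex", booktabs = TRUE) %>%
  kable_styling(latex_options = "scale_down")

# Inspect the generated LaTeX: the table is wrapped in a \resizebox command
latex_table
```

Scaling is applied per table, so very wide tables end up with smaller text than narrow ones. If consistent text size across tables matters, `font_size` may be the better choice.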
## Working in a `.Rmd` file
We now perform almost all our analyses in a Notebook / Markdown file as described in the previous chapter.
This means running all analyses within the document, without the requirement to save and reload table or plot objects.
As mentioned earlier, a Notebook document can be rendered as a PDF or a Word document.
Some refining is usually needed to move from an 'analysis' document to a final 'report' document, but it is often minimal.
Figure \@ref(fig:chap13-fig-report) demonstrates a report-type document rendered as a PDF.
All the code is run within the document, but not included in the output (`echo=FALSE`).
```{r chap13-fig-report, echo = FALSE, fig.cap="Writing a final report in a Markdown document."}
knitr::include_graphics("images/chapter13/4_colon_report.png", auto_pdf = TRUE)
```
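
Rather than adding `echo=FALSE` to every chunk of a report, the defaults can be set once in the setup chunk. A minimal sketch, using only chunk options that already appear in the examples above (which ones you want will depend on your report):

```{r, eval=FALSE}
# Place in the setup chunk: these become the defaults for all later chunks
knitr::opts_chunk$set(
  echo    = FALSE,  # run the code but do not print it in the report
  message = FALSE,  # hide package start-up and other messages
  warning = FALSE   # hide warnings
)
```

Individual chunks can still override these defaults, for example with `echo=TRUE` in the header of a chunk whose code should be shown.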
## Moving between formats
As we have shown, it is relatively straightforward to move between HTML, Word and PDF when documents are simple.
This becomes more difficult if you have a complicated document which includes lots of formatting.
For instance, if you use the `kableExtra` package to customise your tables, you can only export to HTML and PDF.
Knitting to Word will not currently work with advanced `kableExtra` functions in your R Markdown document.
Similarly, `flextable` and `officer` are excellent packages for a love story between R Markdown and Word/MS Office, but they do not work for HTML or PDF.
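
One way to keep a single document knitting to several formats is to choose the table code according to the output format. The sketch below defines a hypothetical helper, `format_table()` (our name, not part of any package), which applies `kableExtra` styling only when knitting to HTML or PDF and falls back to a plain `kable()` otherwise, for example when knitting to Word. `is_html_output()` and `is_latex_output()` are `knitr` functions.

```{r, eval=FALSE}
library(knitr)
library(kableExtra)

# Hypothetical helper: styled table for HTML/PDF, plain table for anything else
format_table <- function(x, ...) {
  if (is_html_output() || is_latex_output()) {
    kable(x, row.names = FALSE, booktabs = TRUE, ...) %>%
      kable_styling(font_size = 8)
  } else {
    # Word and other formats: avoid kableExtra functions
    kable(x, row.names = FALSE, ...)
  }
}

# Example call with a built-in dataset
format_table(head(cars))
```

This keeps the format-specific code in one place, at the cost of slightly plainer tables in the Word version.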
## Summary
The combination of R, RStudio, and Markdown is a powerful triumvirate which produces beautiful results quickly and saves a great deal of labour.
We use this combination for all academic work, but also in the production of real-time reports such as webpages and downloadable PDFs for ongoing projects.
This is a fast-moving area with new applications and features appearing every month.
We would highly recommend you spend some time getting familiar with this area, as it will become an ever more important skill in the future.