forked from Tazinho/Advanced-R-Solutions
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy path000_Backpage_old.Rmd
364 lines (234 loc) · 35.7 KB
/
000_Backpage_old.Rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
# Book Cover old
# Backpage
## Requirements/ Guidance from C&H
- First one or two paragraphs should present background, aims and scope, and what sets the book apart.
- Third (or second, if only one above) paragraph is actually a bulleted list of the key features (try to be as specific as possible).
- Final paragraph is a summary including audience (although this may feature at the beginning).
- If there is room, you can include a short author bio at the end.
## gemeinsames Proposal
> “I learned a lot working through their solutions — it's a great way to broaden and deepen your understanding of R. (I should probably go through it again...)”
- Greg Wilson | RStudio, PBC
<!-- Source: https://twitter.com/gvwilson/status/1119235643652157440 -->
### Paragraph 1/2
This book provides solutions to the exercises contained in the 2nd Edition of Advanced R (Wickham, 2019). Advanced R introduces programmers and R users to various aspects of modern R programming. Additionally, it includes many challenging exercises to internalize and expand the content.
This solutions manual will help you to make the most out of these exercises, and offers a convenient point of reference. The entire content has been carefully prepared and revised with the help of Hadley Wickham.
### specific key features
This book will support your study of the R language and
- provide worked solutions to all 284 exercises presented in Advanced R
- discuss implementation details and alternative solutions
- help you stay motivated and avoid getting stuck
- serves as a reference to check your own solutions
- summarise our own experience and learnings with the content
### summary and audience
This book can act a helpful resource for anyone interested in improving their understanding of programming and the R language through Advanced R.
# Author Bios (75-100 words long each)
### Malte Grosser
Malte Grosser is a business mathematician from Hamburg, who has been programming in R regularly since the beginning of his career. He is currently finishing his PhD on machine learning for stroke outcome prediction from medical images and develops solutions in business as a Data Scientist. Malte likes to promote and engage in the local R community and is the author of the snakecase package, which converts strings into arbitrary cases.
<!-- - His eagerness to solve problems at the root and continually improve his coding skill led him to "Advanced R" and from there to "Advanced R Solutions". -->
### Henning Bumann
As a psychologist and statistician Henning enjoys making sense of data and is motivated to build data-driven solutions that are both beautiful and meaningful. He prefers free programming tools to support effective and transparent collaboration. He is happy to be an active member of the Hamburg R community and considers learning to code one of the most fortunate junctions in his professional life. Away from the computer Henning likes to take it easy and spend time in nature and with family and friends.
### Hadley Wickham (from Adv R)
Hadley Wickham is Chief Scientist at RStudio, an Adjunct Professor at Stanford University and the University of Auckland, and a member of the R Foundation. He is the lead developer of the tidyverse, a collection of R packages, including ggplot2 and dplyr, designed to support data science. He is also the author of R for Data Science (with Garrett Grolemund), R Packages, and ggplot2: Elegant Graphics for Data Analysis.
# Maltes initiales Proposal für die Backpage
- First one or two paragraphs should present background, aims and scope, and what sets the book apart.
- Third (or second, if only one above) paragraph is actually a bulleted list of the key features (try to be as specific as possible).
- Final paragraph is a summary including audience (although this may feature at the beginning).
- If there is room, you can include a short author bio at the end.
Do you want to deepen your R-skills and build up knowledge to master the application of R from the bottom up? Advanced R Solutions gives you a great boost to fully internalise Hadley Wickham's Advanced R and benefit enormously from it in your everyday work.
Advanced R provides programmers from all backgrounds with a comprehensive introduction into the world of professional R programming. It shows that, “despite its sometimes frustrating quirks, R is, at its heart, an elegant and beautiful language, well tailored for data science”. It gives you an entry to dive into the material yourself with almost 300(!) exercises, which are provided throughout its chapters.
Advanced R Solutions supports you on this journey with appropriately detailed solutions for each of these exercises. The entire content, including the R-code, has been carefully prepared and revised with the help of Hadley Wickham.
Working through the exercises and their solutions will give you a deep understanding of a variety of programming problems, many of which you might encounter again in your daily work. It will give you a great set of tools to solve these problems not only technically but also conceptually. In the process, you’ll developed a good sense of identifying meaningful measures to compare different approaches. While in some situations you will be able to transfer certain programming schemes directly, in many others far more elegant options might open up to you. As a positive side effect, your fluency in this language will notably increase.
# Meta data (for Advanced R and our book)
# Requirements:
# * 150-200 words per chapter
## 2. Names and values
In R, it is important to understand the distinction between an object and its name. Doing so will help you:
* More accurately predict the performance and memory usage of your code.
* Write faster code by avoiding accidental copies, a major source of slow code.
* Better understand R’s functional programming tools.
The goal of this chapter is to help you understand the distinction between names and values, and when R will copy an object.
## 3. Vectors
This chapter discusses the most important family of data types in base R: vectors. While you’ve probably already used many (if not all) of the different types of vectors, you may not have thought deeply about how they’re interrelated. In this chapter, I won’t cover individual vector types in too much detail, but I will show you how all the types fit together as a whole. If you need more details, you can find them in R’s documentation.
## 4. Subsetting
R’s subsetting operators are fast and powerful. Mastering them allows you to succinctly perform complex operations in a way that few other languages can match. Subsetting in R is easy to learn but hard to master because you need to internalise a number of interrelated concepts:
* There are six ways to subset atomic vectors.
* There are three subsetting operators, [[, [, and $.
* Subsetting operators interact differently with different vector types (e.g., atomic vectors, lists, factors, matrices, and data frames).
* Subsetting can be combined with assignment.
## 5. Control Flow
There are two primary tools of control flow: choices and loops. Choices, like if statements and switch() calls, allow you to run different code depending on the input. Loops, like for and while, allow you to repeatedly run code, typically with changing options. I’d expect that you’re already familiar with the basics of these functions so I’ll briefly cover some technical details and then introduce some useful, but lesser known, features.
The condition system (messages, warnings, and errors), which you’ll learn about in Chapter 8, also provides non-local control flow.
## 6. Functions
If you’re reading this book, you’ve probably already created many R functions and know how to use them to reduce duplication in your code. In this chapter, you’ll learn how to turn that informal, working knowledge into more rigorous, theoretical understanding. And while you’ll see some interesting tricks and techniques along the way, keep in mind that what you’ll learn here will be important for understanding the more advanced topics discussed later in the book.
## 7. Environments
The environment is the data structure that powers scoping. This chapter dives deep into environments, describing their structure in depth, and using them to improve your understanding of the four scoping rules described in Section 6.4. Understanding environments is not necessary for day-to-day use of R. But they are important to understand because they power many important R features like lexical scoping, namespaces, and R6 classes, and interact with evaluation to give you powerful tools for making domain specific languages, like dplyr and ggplot2.
## 8. Conditions
The condition system provides a paired set of tools that allow the author of a function to indicate that something unusual is happening, and the user of that function to deal with it. The function author signals conditions with functions like stop() (for errors), warning() (for warnings), and message() (for messages), then the function user can handle them with functions like tryCatch() and withCallingHandlers(). Understanding the condition system is important because you’ll often need to play both roles: signalling conditions from the functions you create, and handle conditions signalled by the functions you call.
R offers a very powerful condition system based on ideas from Common Lisp. Like R’s approach to object-oriented programming, it is rather different to currently popular programming languages so it is easy to misunderstand, and there has been relatively little written about how to use it effectively. Historically, this has meant that few people (myself included) have taken full advantage of its power. The goal of this chapter is to remedy that situation. Here you will learn about the big ideas of R’s condition system, as well as learning a bunch of practical tools that will make your code stronger.
I found two resources particularly useful when writing this chapter. You may also want to read them if you want to learn more about the inspirations and motivations for the system:
A prototype of a condition system for R by Robert Gentleman and Luke Tierney. This describes an early version of R’s condition system. While the implementation has changed somewhat since this document was written, it provides a good overview of how the pieces fit together, and some motivation for its design.
Beyond exception handling: conditions and restarts by Peter Seibel. This describes exception handling in Lisp, which happens to be very similar to R’s approach. It provides useful motivation and more sophisticated examples. I have provided an R translation of the chapter at http://adv-r.had.co.nz/beyond-exception-handling.html.
I also found it helpful to work through the underlying C code that implements these ideas. If you’re interested in understanding how it all works, you might find my notes to be useful.
## 9. Functionals
A functional is a function that takes a function as an input and returns a vector as output. Here’s a simple functional: it calls the function provided as input with 1000 random uniform numbers.
randomise <- function(f) f(runif(1e3))
randomise(mean)
#> [1] 0.506
randomise(mean)
#> [1] 0.501
randomise(sum)
#> [1] 489
The chances are that you’ve already used a functional. You might have used for-loop replacements like base R’s lapply(), apply(), and tapply(); or purrr’s map(); or maybe you’ve used a mathematical functional like integrate() or optim().
A common use of functionals is as an alternative to for loops. For loops have a bad rap in R because many people believe they are slow51, but the real downside of for loops is that they’re very flexible: a loop conveys that you’re iterating, but not what should be done with the results. Just as it’s better to use while than repeat, and it’s better to use for than while (Section 5.3.2), it’s better to use a functional than for. Each functional is tailored for a specific task, so when you recognise the functional you immediately know why it’s being used.
If you’re an experienced for loop user, switching to functionals is typically a pattern matching exercise. You look at the for loop and find a functional that matches the basic form. If one doesn’t exist, don’t try and torture an existing functional to fit the form you need. Instead, just leave it as a for loop! (Or once you’ve repeated the same loop two or more times, maybe think about writing your own functional).
Outline
This chapter will focus on functionals provided by the purrr package.52 These functions have a consistent interface that makes it easier to understand the key ideas than their base equivalents, which have grown organically over many years. I’ll compare and contrast base R functions as we go, and then wrap up the chapter with a discussion of base functionals that don’t have purrr equivalents.
## 10. Function factories
A function factory is a function that makes functions. Here’s a very simple example: we use a function factory (power1()) to make two child functions (square() and cube()):
power1 <- function(exp) {
function(x) {
x ^ exp
}
}
square <- power1(2)
cube <- power1(3)
Don’t worry if this doesn’t make sense yet, it should by the end of the chapter!
I’ll call square() and cube() manufactured functions, but this is just a term to ease communication with other humans: from R’s perspective they are no different to functions created any other way.
square(3)
#> [1] 9
cube(3)
#> [1] 27
You have already learned about the individual components that make function factories possible:
In Section 6.2.3, you learned about R’s first-class functions. In R, you bind a function to a name in the same way as you bind any object to a name: with <-.
In Section 7.4.2, you learned that a function captures (encloses) the environment in which it is created.
In Section 7.4.4, you learned that a function creates a new execution environment every time it is run. This environment is usually ephemeral, but here it becomes the enclosing environment of the manufactured function.
In this chapter, you’ll learn how the non-obvious combination of these three features leads to the function factory. You’ll also see examples of their usage in visualisation and statistics.
Of the three main functional programming tools (functionals, function factories, and function operators), function factories are the least used. Generally, they don’t tend to reduce overall code complexity but instead partition complexity into more easily digested chunks. Function factories are also an important building block for the very useful function operators, which you’ll learn about in Chapter 11.
## 11. Function operators
In this chapter, you’ll learn about function operators. A function operator is a function that takes one (or more) functions as input and returns a function as output. The following code shows a simple function operator, chatty(). It wraps a function, making a new function that prints out its first argument. You might create a function like this because it gives you a window to see how functionals, like map_int(), work.
chatty <- function(f) {
force(f)
function(x, ...) {
res <- f(x, ...)
cat("Processing ", x, "\n", sep = "")
res
}
}
f <- function(x) x ^ 2
s <- c(3, 2, 1)
purrr::map_dbl(s, chatty(f))
#> Processing 3
#> Processing 2
#> Processing 1
#> [1] 9 4 1
Function operators are closely related to function factories; indeed they’re just a function factory that takes a function as input. Like factories, there’s nothing you can’t do without them, but they often allow you to factor out complexity in order to make your code more readable and reusable.
Function operators are typically paired with functionals. If you’re using a for-loop, there’s rarely a reason to use a function operator, as it will make your code more complex for little gain.
If you’re familiar with Python, decorators is just another name for function operators.
## 12. Base types
To talk about objects and OOP in R we first need to clear up a fundamental confusion about two uses of the word “object”. So far in this book, we’ve used the word in the general sense captured by John Chambers’ pithy quote: “Everything that exists in R is an object”. However, while everything is an object, not everything is object-oriented. This confusion arises because the base objects come from S, and were developed before anyone thought that S might need an OOP system. The tools and nomenclature evolved organically over many years without a single guiding principle.
Most of the time, the distinction between objects and object-oriented objects is not important. But here we need to get into the nitty gritty details so we’ll use the terms base objects and OO objects to distinguish them.
Outline
## 13. S3
S3 is R’s first and simplest OO system. S3 is informal and ad hoc, but there is a certain elegance in its minimalism: you can’t take away any part of it and still have a useful OO system. For these reasons, you should use it, unless you have a compelling reason to do otherwise. S3 is the only OO system used in the base and stats packages, and it’s the most commonly used system in CRAN packages.
S3 is very flexible, which means it allows you to do things that are quite ill-advised. If you’re coming from a strict environment like Java this will seem pretty frightening, but it gives R programmers a tremendous amount of freedom. It may be very difficult to prevent people from doing something you don’t want them to do, but your users will never be held back because there is something you haven’t implemented yet. Since S3 has few built-in constraints, the key to its successful use is applying the constraints yourself. This chapter will therefore teach you the conventions you should (almost) always follow.
The goal of this chapter is to show you how the S3 system works, not how to use it effectively to create new classes and generics. I’d recommend coupling the theoretical knowledge from this chapter with the practical knowledge encoded in the vctrs package.
## 14. R6
This chapter describes the R6 OOP system. R6 has two special properties:
It uses the encapsulated OOP paradigm, which means that methods belong to objects, not generics, and you call them like object$method().
R6 objects are mutable, which means that they are modified in place, and hence have reference semantics.
If you’ve learned OOP in another programming language, it’s likely that R6 will feel very natural, and you’ll be inclined to prefer it over S3. Resist the temptation to follow the path of least resistance: in most cases R6 will lead you to non-idiomatic R code. We’ll come back to this theme in Section 16.3.
R6 is very similar to a base OOP system called reference classes, or RC for short. I describe why I teach R6 and not RC in Section 14.5.
## 15. S4
S4 provides a formal approach to functional OOP. The underlying ideas are similar to S3 (the topic of Chapter 13), but implementation is much stricter and makes use of specialised functions for creating classes (setClass()), generics (setGeneric()), and methods (setMethod()). Additionally, S4 provides both multiple inheritance (i.e. a class can have multiple parents) and multiple dispatch (i.e. method dispatch can use the class of multiple arguments).
An important new component of S4 is the slot, a named component of the object that is accessed using the specialised subsetting operator @ (pronounced at). The set of slots, and their classes, forms an important part of the definition of an S4 class.
## 16. Trade-offs
You now know about the three most important OOP toolkits available in R. Now that you understand their basic operation and the principles that underlie them, we can start to compare and contrast the systems in order to understand their strengths and weaknesses. This will help you pick the system that is most likely to solve new problems.
Overall, when picking an OO system, I recommend that you default to S3. S3 is simple, and widely used throughout base R and CRAN. While it’s far from perfect, its idiosyncrasies are well understood and there are known approaches to overcome most shortcomings. If you have an existing background in programming you are likely to lean towards R6, because it will feel familiar. I think you should resist this tendency for two reasons. Firstly, if you use R6 it’s very easy to create a non-idiomatic API that will feel very odd to native R users, and will have surprising pain points because of the reference semantics. Secondly, if you stick to R6, you’ll lose out on learning a new way of thinking about OOP that gives you a new set of tools for solving problems.
## 17. Big picture
Metaprogramming is the hardest topic in this book because it brings together many formerly unrelated topics and forces you grapple with issues that you probably haven’t thought about before. You’ll also need to learn a lot of new vocabulary, and at first it will seem like every new term is defined by three other terms that you haven’t heard of. Even if you’re an experienced programmer in another language, your existing skills are unlikely to be much help as few modern popular languages expose the level of metaprogramming that R provides. So don’t be surprised if you’re frustrated or confused at first; this is a natural part of the process that happens to everyone!
But I think it’s easier to learn metaprogramming now than ever before. Over the last few years, the theory and practice have matured substantially, providing a strong foundation paired with tools that allow you to solve common problems. In this chapter, you’ll get the big picture of all the main pieces and how they fit together.
## 18. Expressions
To compute on the language, we first need to understand its structure. That requires some new vocabulary, some new tools, and some new ways of thinking about R code. The first of these is the distinction between an operation and its result. Take the following code, which multiplies a variable x by 10 and saves the result to a new variable called y. It doesn’t work because we haven’t defined a variable called x:
y <- x * 10
#> Error in eval(expr, envir, enclos): object 'x' not found
It would be nice if we could capture the intent of the code without executing it. In other words, how can we separate our description of the action from the action itself?
One way is to use rlang::expr():
z <- rlang::expr(y <- x * 10)
z
#> y <- x * 10
expr() returns an expression, an object that captures the structure of the code without evaluating it (i.e. running it). If you have an expression, you can evaluate it with base::eval():
x <- 4
eval(z)
y
#> [1] 40
The focus of this chapter is the data structures that underlie expressions. Mastering this knowledge will allow you to inspect and modify captured code, and to generate code with code. We’ll come back to expr() in Chapter 19, and to eval() in Chapter 20.
## 19. Quasiquotation
Now that you understand the tree structure of R code, it’s time to return to one of the fundamental ideas that make expr() and ast() work: quotation. In tidy evaluation, all quoting functions are actually quasiquoting functions because they also support unquoting. Where quotation is the act of capturing an unevaluated expression, unquotation is the ability to selectively evaluate parts of an otherwise quoted expression. Together, this is called quasiquotation. Quasiquotation makes it easy to create functions that combine code written by the function’s author with code written by the function’s user. This helps to solve a wide variety of challenging problems.
Quasiquotation is one of the three pillars of tidy evaluation. You’ll learn about the other two (quosures and the data mask) in Chapter 20. When used alone, quasiquotation is most useful for programming, particularly for generating code. But when it’s combined with the other techniques, tidy evaluation becomes a powerful tool for data analysis.
## 20. Evaluation
The user-facing inverse of quotation is unquotation: it gives the user the ability to selectively evaluate parts of an otherwise quoted argument. The developer-facing complement of quotation is evaluation: this gives the developer the ability to evaluate quoted expressions in custom environments to achieve specific goals.
This chapter begins with a discussion of evaluation in its purest form. You’ll learn how eval() evaluates an expression in an environment, and then how it can be used to implement a number of important base R functions. Once you have the basics under your belt, you’ll learn extensions to evaluation that are needed for robustness. There are two big new ideas:
The quosure: a data structure that captures an expression along with its associated environment, as found in function arguments.
The data mask, which makes it easier to evaluate an expression in the context of a data frame. This introduces potential evaluation ambiguity which we’ll then resolve with data pronouns.
Together, quasiquotation, quosures, and data masks form what we call tidy evaluation, or tidy eval for short. Tidy eval provides a principled approach to non-standard evaluation that makes it possible to use such functions both interactively and embedded with other functions. Tidy evaluation is the most important practical implication of all this theory so we’ll spend a little time exploring the implications. The chapter finishes off with a discussion of the closest related approaches in base R, and how you can program around their drawbacks.
## 21. Translating R code
The combination of first-class environments, lexical scoping, and metaprogramming gives us a powerful toolkit for translating R code into other languages. One fully-fledged example of this idea is dbplyr, which powers the database backends for dplyr, allowing you to express data manipulation in R and automatically translate it into SQL. You can see the key idea in translate_sql() which takes R code and returns the equivalent SQL:
library(dbplyr)
translate_sql(x ^ 2)
#> <SQL> POWER(`x`, 2.0)
translate_sql(x < 5 & !is.na(x))
#> <SQL> `x` < 5.0 AND NOT(((`x`) IS NULL))
translate_sql(!first %in% c("John", "Roger", "Robert"))
#> <SQL> NOT(`first` IN ('John', 'Roger', 'Robert'))
translate_sql(select == 7)
#> <SQL> `select` = 7.0
Translating R to SQL is complex because of the many idiosyncrasies of SQL dialects, so here I’ll develop two simple, but useful, domain specific languages (DSL): one to generate HTML, and the other to generate mathematical equations in LaTeX.
If you’re interested in learning more about domain specific languages in general, I highly recommend Domain Specific Languages.106 It discusses many options for creating a DSL and provides many examples of different languages.
## 22. Debugging
What do you do when R code throws an unexpected error? What tools do you have to find and fix the problem? This chapter will teach you the art and science of debugging, starting with a general strategy, then following up with specific tools.
I’ll show the tools provided by both R and the RStudio IDE. I recommend using RStudio’s tools if possible, but I’ll also show you the equivalents that work everywhere. You may also want to refer to the official RStudio debugging documentation which always reflects the latest version of RStudio.
NB: You shouldn’t need to use these tools when writing new functions. If you find yourself using them frequently with new code, reconsider your approach. Instead of trying to write one big function all at once, work interactively on small pieces. If you start small, you can quickly identify why something doesn’t work, and don’t need sophisticated debugging tools.
## 23. Measuring performance
Before you can make your code faster, you first need to figure out what’s making it slow. This sounds easy, but it’s not. Even experienced programmers have a hard time identifying bottlenecks in their code. So instead of relying on your intuition, you should profile your code: measure the run-time of each line of code using realistic inputs.
Once you’ve identified bottlenecks you’ll need to carefully experiment with alternatives to find faster code that is still equivalent. In Chapter 24 you’ll learn a bunch of ways to speed up code, but first you need to learn how to microbenchmark so that you can precisely measure the difference in performance.
## 24. Improving performance
Once you’ve used profiling to identify a bottleneck, you need to make it faster. It’s difficult to provide general advice on improving performance, but I try my best with four techniques that can be applied in many situations. I’ll also suggest a general strategy for performance optimisation that helps ensure that your faster code is still correct.
It’s easy to get caught up in trying to remove all bottlenecks. Don’t! Your time is valuable and is better spent analysing your data, not eliminating possible inefficiencies in your code. Be pragmatic: don’t spend hours of your time to save seconds of computer time. To enforce this advice, you should set a goal time for your code and optimise only up to that goal. This means you will not eliminate all bottlenecks. Some you will not get to because you’ve met your goal. Others you may need to pass over and accept either because there is no quick and easy solution or because the code is already well optimised and no significant improvement is possible. Accept these possibilities and move on to the next candidate.
If you’d like to learn more about the performance characteristics of the R language, I’d highly recommend Evaluating the Design of the R Language.111 It draws conclusions by combining a modified R interpreter with a wide set of code found in the wild.
## 25. Rewriting R code in C++
Sometimes R code just isn’t fast enough. You’ve used profiling to figure out where your bottlenecks are, and you’ve done everything you can in R, but your code still isn’t fast enough. In this chapter you’ll learn how to improve performance by rewriting key functions in C++. This magic comes by way of the Rcpp package117 (with key contributions by Doug Bates, John Chambers, and JJ Allaire).
Rcpp makes it very simple to connect C++ to R. While it is possible to write C or Fortran code for use in R, it will be painful by comparison. Rcpp provides a clean, approachable API that lets you write high-performance code, insulated from R’s complex C API.
Typical bottlenecks that C++ can address include:
Loops that can’t be easily vectorised because subsequent iterations depend on previous ones.
Recursive functions, or problems which involve calling functions millions of times. The overhead of calling a function in C++ is much lower than in R.
Problems that require advanced data structures and algorithms that R doesn’t provide. Through the standard template library (STL), C++ has efficient implementations of many important data structures, from ordered maps to double-ended queues.
The aim of this chapter is to discuss only those aspects of C++ and Rcpp that are absolutely necessary to help you eliminate bottlenecks in your code. We won’t spend much time on advanced features like object-oriented programming or templates because the focus is on writing small, self-contained functions, not big programs. A working knowledge of C++ is helpful, but not essential. Many good tutorials and references are freely available, including http://www.learncpp.com/ and https://en.cppreference.com/w/cpp. For more advanced topics, the Effective C++ series by Scott Meyers is a popular choice.
## Next iteration:
## Backpage
> “I learned a lot working through their solutions — it's a great way to broaden and deepen your understanding of R. (I should probably go through it again...)”
- Greg Wilson | RStudio, PBC
<!-- Source: https://twitter.com/gvwilson/status/1119235643652157440 -->
This book offers solutions to the 284 exercises in Advanced R. It is based on our own efforts to work through Advanced R to learn the R programming language in an efficient and sustainable way. To this end, we have carefully documented all our solutions and made them as accessible and clear as possible. All solutions were carefully reviewed by Hadley Wickham, who also provided guidance when some exercises seemed unsolvable at first.
Working through the exercises and their solutions will give you a deep understanding of a variety of programming challenges, many of which are relevant to everyday work. This will expand your set of tools to solve these problems on a technical and conceptual level. You will develop your ability to identify and compare different solution approaches. In some situations you may be able to transfer a specific programming scheme directly, in others far more elegant alternatives may open up to you. As a positive side effect, your fluency in R will notably improve.
From a technical and R-specific perspective, you will learn about:
* When R makes copies, how this affects memory and performance, and how to address this.
* The characteristics of base R's vector data types and the implications.
* How to choose between R's different subsetting options.
* Everything you could ever want to know about functions in R.
* How to use rlang to work with environments and how to recurse over environments.
* The differences between calling and exiting handlers.
* How to employ functional programming to efficiently solve modular tasks involving, e.g. iterations or parametrizations.
* The mechanics, usage and limitations of R's commonly used and highly pragmatic S3 OO system.
* When and how to work with the R6 OO system, which takes a more classical and structured approach to OO programming and should feel very natural if you already know OO programming from other languages.
* Working with R's S4 OO system, which extends S3 with multiple dispatch and inheritance and much stricter formalities.
* How R parses expressions into abstract syntax trees and evaluates their elements.
* How you can modify these elements and even change their environments within this process.
* How to use this technique to create domain specific languages such as HTML or LaTeX purely based on systematic conversion of R code.
* How to identify performance bottlenecks via profiling and to compare different approaches via benchmarks.
* Different strategies for optimising R code, such as using pre-optimised functions, vectorisation, using solutions from e.g. algebra or simply reducing the scope of a function.
* Rewriting R code in C++ via the Rcpp package.
Malte Grosser is a business mathematician from Hamburg, who has been programming in R regularly since the beginning of his career. He is currently finishing his PhD on machine learning for stroke outcome prediction from medical images and develops solutions in business as a Data Scientist. Malte likes to promote and engage in the local R community and is the author of the snakecase package, which converts strings into arbitrary cases.
Henning Bumann is a psychologist and statistician who enjoys making sense of data and is motivated to build data-driven solutions that are beautiful and meaningful. He prefers free programming tools to support effective and transparent collaboration. He is happy to be an active member of the Hamburg R community and considers learning to code one of the most fortunate junctions in his professional life. Away from the computer Henning likes to take it easy and spend time in nature and with family and friends.
Hadley Wickham is the original author of Advanced R, created all the exercises and supported this project by reviewing all the proposed answers.
## (Or longer version from Advanced R)
Hadley Wickham is Chief Scientist at RStudio, an Adjunct Professor at Stanford University and the University of Auckland, and a member of the R Foundation. He is the lead developer of the tidyverse, a collection of R packages, including ggplot2 and dplyr, designed to support data science. He is also the author of R for Data Science (with Garrett Grolemund), R Packages, and ggplot2: Elegant Graphics for Data Analysis.