forked from hadley/adv-r
-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathProfiling.rmd
45 lines (25 loc) · 2.87 KB
/
Profiling.rmd
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
---
title: Profiling and benchmarking
layout: default
---
```{r, echo = FALSE}
options(digits = 3)
```
# Profiling and benchmarking
"We should forget about small efficiencies, say about 97% of the time: premature optimization is the root of all evil" --- Donald Knuth.
Your code should be correct, maintainable and fast. Notice that speed comes last - if your function is incorrect or unmaintainable (i.e. will eventually become incorrect) it doesn't matter if it's fast. As computers get faster and R is optimised, your code will get faster all by itself. Your code is never going to automatically become correct or elegant if it is not already.
That said, sometimes there are times where you need to make your code faster: spending several hours of your day might save days of computing time for others. The aim of this chapter is to give you the skills to figure out why your code is slow, what you can do to improve it, and ensure that you don't accidentally make it slow again in the future. You may already be familiar with `system.time`, which tells you how long a block of code takes to run. This is a useful building block, but is a crude tool.
Making fast code is a four part process:
1. Profiling helps you discover parts of your code are taking up the most time
2. Microbenchmarking lets you experiment with small parts of your code to find faster approaches.
3. Timing helps you check that the micro-optimisations have a macro effect, and helps experiment with larger changes (like totally rethinking your approach)
4. A performance testing tool makes sure your code stays fast in the future (e.g. [Vbench](http://wesmckinney.com/blog/?p=373))
Along the way, you'll also learn about the most common causes of poor performance in R, and how to address them. Sometimes there's no way to improve performance within R, and you'll need to use [[Rcpp]], the topic of the next chapter.
Having a good test suite is important when tuning the performance of your code: you don't want to make your code fast at the expense of making it incorrect. We won't discuss testing any further in this chapter, but we strongly recommend having a good set of test cases written before you begin optimisation.
Good exploration from Winston: http://rpubs.com/wch/3797
## Performance profiling
R provides a built in tool for profiling: `Rprof`. When active, this records the current call stack to disk very `interval` seconds. This provides a fine grained report showing how long each function takes. The function `summaryRprof` provides a way to turn this list of call stacks into useful information. But I don't think it's terribly useful, because it makes it hard to see the entire structure of the program at once. Instead, we'll use the `profr` package, which turns the call stack into a data.frame that is easier to manipulate and visualise.
Example showing how to use profr.
Sample pictures.
## Timing
## Performance testing