<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8">
<meta name="generator" content="pandoc">
<title>Software Carpentry: R for reproducible scientific analysis</title>
<link rel="shortcut icon" type="image/x-icon" href="/favicon.ico" />
<meta name="viewport" content="width=device-width, initial-scale=1.0" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap.css" />
<link rel="stylesheet" type="text/css" href="css/bootstrap/bootstrap-theme.css" />
<link rel="stylesheet" type="text/css" href="css/swc.css" />
<link rel="alternate" type="application/rss+xml" title="Software Carpentry Blog" href="http://software-carpentry.org/feed.xml"/>
<meta charset="UTF-8" />
<!-- HTML5 shim, for IE6-8 support of HTML5 elements -->
<!--[if lt IE 9]>
<script src="http://html5shim.googlecode.com/svn/trunk/html5.js"></script>
<![endif]-->
</head>
<body class="lesson">
<div class="container card">
<div class="banner">
<a href="http://software-carpentry.org" title="Software Carpentry">
<img alt="Software Carpentry banner" src="img/software-carpentry-banner.png" />
</a>
</div>
<article>
<div class="row">
<div class="col-md-10 col-md-offset-1">
<a href="index.html"><h1 class="title">R for reproducible scientific analysis</h1></a>
<h2 class="subtitle">Vectorisation</h2>
<section class="objectives panel panel-warning">
<div class="panel-heading">
<h2 id="learning-objectives"><span class="glyphicon glyphicon-certificate"></span>Learning Objectives</h2>
</div>
<div class="panel-body">
<ul>
<li>To understand vectorised operations in R.</li>
</ul>
</div>
</section>
<p>Most of R’s functions are vectorised, meaning that the function will operate on all elements of a vector without needing to loop through and act on each element one at a time. This makes code more concise, easier to read, and less error-prone.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">1</span>:<span class="dv">4</span>
x *<span class="st"> </span><span class="dv">2</span></code></pre></div>
<pre class="output"><code>[1] 2 4 6 8
</code></pre>
<p>The multiplication happened to each element of the vector.</p>
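<p>For comparison, here is a minimal sketch of the explicit loop that the vectorised <code>x * 2</code> replaces. The names <code>doubled</code> and <code>i</code> are illustrative and not part of the lesson.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <- 1:4

# Explicit loop: allocate the result, then fill it one element at a time
doubled <- numeric(length(x))
for (i in seq_along(x)) {
  doubled[i] <- x[i] * 2
}

# The vectorised form gives the same values in a single expression
all(doubled == x * 2)   # TRUE</code></pre></div>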
<p>We can also add two vectors together:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">y <-<span class="st"> </span><span class="dv">6</span>:<span class="dv">9</span>
x +<span class="st"> </span>y</code></pre></div>
<pre class="output"><code>[1]  7  9 11 13
</code></pre>
<p>Each element of <code>x</code> was added to its corresponding element of <code>y</code>:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x:  1  2  3  4
    +  +  +  +
y:  6  7  8  9
---------------
    7  9 11 13</code></pre></div>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-1"><span class="glyphicon glyphicon-pencil"></span>Challenge 1</h2>
</div>
<div class="panel-body">
<p>Let’s try this on the <code>pop</code> column of the <code>gapminder</code> dataset.</p>
<p>Make a new column in the <code>gapminder</code> data frame that contains population in units of millions of people. Check the head or tail of the data frame to make sure it worked.</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-2"><span class="glyphicon glyphicon-pencil"></span>Challenge 2</h2>
</div>
<div class="panel-body">
<p>On a single graph, plot population, in millions, against year, for all countries. Don’t worry about identifying which country is which.</p>
<p>Repeat the exercise, graphing only for China, India, and Indonesia. Again, don’t worry about which is which.</p>
</div>
</section>
<p>Comparison operators, logical operators, and many functions are also vectorised:</p>
<p><strong>Comparison operators</strong></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x ><span class="st"> </span><span class="dv">2</span></code></pre></div>
<pre class="output"><code>[1] FALSE FALSE  TRUE  TRUE
</code></pre>
<p><strong>Logical operators</strong></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">a <-<span class="st"> </span>x ><span class="st"> </span><span class="dv">3</span> <span class="co"># or, for clarity, a <- (x > 3)</span>
a</code></pre></div>
<pre class="output"><code>[1] FALSE FALSE FALSE  TRUE
</code></pre>
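<p>The example above stores the result of a comparison. The logical operators themselves (<code>&</code>, <code>|</code> and <code>!</code>) also work element by element. A brief sketch, where <code>b</code> is an illustrative name that does not appear elsewhere in the lesson:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <- 1:4
a <- x > 3   # FALSE FALSE FALSE  TRUE
b <- x < 2   #  TRUE FALSE FALSE FALSE

a | b        # element-wise OR:   TRUE FALSE FALSE  TRUE
a & b        # element-wise AND: FALSE FALSE FALSE FALSE
!a           # element-wise NOT:  TRUE  TRUE  TRUE FALSE</code></pre></div>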
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="tip-some-useful-functions-for-logical-vectors"><span class="glyphicon glyphicon-pushpin"></span>Tip: some useful functions for logical vectors</h2>
</div>
<div class="panel-body">
<p><code>any()</code> will return <code>TRUE</code> if <em>any</em> element of a vector is <code>TRUE</code>.</p>
<p><code>all()</code> will return <code>TRUE</code> if <em>all</em> elements of a vector are <code>TRUE</code>.</p>
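<p>A short sketch of both functions applied to vectorised comparisons, reusing the vector <code>x</code> from the examples above:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <- 1:4
any(x > 3)   # TRUE: at least one element exceeds 3
all(x > 3)   # FALSE: not every element exceeds 3
all(x > 0)   # TRUE: every element is positive</code></pre></div>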
</div>
</aside>
<p>Most functions also operate element-wise on vectors:</p>
<p><strong>Functions</strong></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <-<span class="st"> </span><span class="dv">1</span>:<span class="dv">4</span>
<span class="kw">log</span>(x)</code></pre></div>
<pre class="output"><code>[1] 0.0000000 0.6931472 1.0986123 1.3862944
</code></pre>
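<p>Not every function returns one value per element: summary functions such as <code>sum()</code> and <code>mean()</code> collapse a vector to a single value. A small sketch of the difference, using <code>sqrt()</code> as one more element-wise function:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">x <- 1:4
sqrt(x)           # element-wise: one result per element of x
length(sqrt(x))   # 4
sum(x)            # summary: a single value, 10
mean(x)           # summary: a single value, 2.5</code></pre></div>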
<p>Vectorised operations work element-wise on matrices:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">12</span>, <span class="dt">nrow=</span><span class="dv">3</span>, <span class="dt">ncol=</span><span class="dv">4</span>)
m *<span class="st"> </span>-<span class="dv">1</span> </code></pre></div>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]   -1   -4   -7  -10
[2,]   -2   -5   -8  -11
[3,]   -3   -6   -9  -12
</code></pre>
<aside class="callout panel panel-info">
<div class="panel-heading">
<h2 id="tip-element-wise-vs.matrix-multiplication"><span class="glyphicon glyphicon-pushpin"></span>Tip: element-wise vs. matrix multiplication</h2>
</div>
<div class="panel-body">
<p>Very important: the operator <code>*</code> gives you element-wise multiplication! To do matrix multiplication, we need to use the <code>%*%</code> operator:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m %*%<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>, <span class="dt">nrow=</span><span class="dv">4</span>, <span class="dt">ncol=</span><span class="dv">1</span>)</code></pre></div>
<pre class="output"><code>     [,1]
[1,]   22
[2,]   26
[3,]   30
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">4</span>, <span class="dt">nrow=</span><span class="dv">1</span>) %*%<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">4</span>, <span class="dt">ncol=</span><span class="dv">1</span>)</code></pre></div>
<pre class="output"><code>     [,1]
[1,]   30
</code></pre>
<p>For more on matrix algebra, see the <a href="http://www.statmethods.net/advstats/matrix.html">Quick-R reference guide</a>.</p>
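<p>One way to keep the two operators apart is to look at the dimensions of the result. The following is a minimal sketch using the same matrix <code>m</code>. The use of <code>t(m)</code> as the second operand is an illustrative choice, made only so that the inner dimensions agree.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m <- matrix(1:12, nrow=3, ncol=4)

# Element-wise product: both operands are 3 x 4, and so is the result
dim(m * m)        # 3 4

# Matrix product: inner dimensions must agree, here (3 x 4) %*% (4 x 3)
dim(m %*% t(m))   # 3 3</code></pre></div>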
</div>
</aside>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-3"><span class="glyphicon glyphicon-pencil"></span>Challenge 3</h2>
</div>
<div class="panel-body">
<p>Given the following matrix:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">12</span>, <span class="dt">nrow=</span><span class="dv">3</span>, <span class="dt">ncol=</span><span class="dv">4</span>)
m</code></pre></div>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
</code></pre>
<p>Write down what you think will happen when you run:</p>
<ol style="list-style-type: decimal">
<li><code>m ^ -1</code></li>
<li><code>m * c(1, 0, -1)</code></li>
<li><code>m > c(0, 20)</code></li>
<li><code>m * c(1, 0, -1, 2)</code></li>
</ol>
<p>Did you get the output you expected? If not, ask a helper!</p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-4"><span class="glyphicon glyphicon-pencil"></span>Challenge 4</h2>
</div>
<div class="panel-body">
<p>We’re interested in looking at the sum of the following sequence of fractions:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"> x =<span class="st"> </span><span class="dv">1</span>/(<span class="dv">1</span>^<span class="dv">2</span>) +<span class="st"> </span><span class="dv">1</span>/(<span class="dv">2</span>^<span class="dv">2</span>) +<span class="st"> </span><span class="dv">1</span>/(<span class="dv">3</span>^<span class="dv">2</span>) +<span class="st"> </span>... +<span class="st"> </span><span class="dv">1</span>/(n^<span class="dv">2</span>)</code></pre></div>
<p>This would be tedious to type out, and impossible for high values of n. Use vectorisation to compute x when n=100. What is the sum when n=10,000?</p>
</div>
</section>
<h2 id="challenge-solutions">Challenge solutions</h2>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="solution-to-challenge-1"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 1</h2>
</div>
<div class="panel-body">
<p>Let’s try this on the <code>pop</code> column of the <code>gapminder</code> dataset.</p>
<p>Make a new column in the <code>gapminder</code> data frame that contains population in units of millions of people. Check the head or tail of the data frame to make sure it worked.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gapminder$pop_millions <-<span class="st"> </span>gapminder$pop /<span class="st"> </span><span class="fl">1e6</span>
<span class="kw">head</span>(gapminder)</code></pre></div>
<pre class="output"><code>      country year      pop continent lifeExp gdpPercap pop_millions
1 Afghanistan 1952  8425333      Asia  28.801  779.4453     8.425333
2 Afghanistan 1957  9240934      Asia  30.332  820.8530     9.240934
3 Afghanistan 1962 10267083      Asia  31.997  853.1007    10.267083
4 Afghanistan 1967 11537966      Asia  34.020  836.1971    11.537966
5 Afghanistan 1972 13079460      Asia  36.088  739.9811    13.079460
6 Afghanistan 1977 14880372      Asia  38.438  786.1134    14.880372
</code></pre>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="solution-to-challenge-2"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 2</h2>
</div>
<div class="panel-body">
<p>Refresh your plotting skills by plotting population in millions against year.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">plot</span>(gapminder$year, gapminder$pop_millions)</code></pre></div>
<p><img src="fig/09-vectorisation-ch2-sol-1.png" title="plot of chunk ch2-sol" alt="plot of chunk ch2-sol" style="display: block; margin: auto;" /></p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">countryset <-<span class="st"> </span><span class="kw">c</span>(<span class="st">'China'</span>, <span class="st">'India'</span>, <span class="st">'Indonesia'</span>)
y <-<span class="st"> </span>gapminder[gapminder$country %in%<span class="st"> </span>countryset, ]
<span class="kw">plot</span>(y$year, y$pop_millions)</code></pre></div>
<p><img src="fig/09-vectorisation-ch2-sol-2.png" title="plot of chunk ch2-sol" alt="plot of chunk ch2-sol" style="display: block; margin: auto;" /></p>
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="solution-to-challenge-3"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 3</h2>
</div>
<div class="panel-body">
<p>Given the following matrix:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m <-<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">12</span>, <span class="dt">nrow=</span><span class="dv">3</span>, <span class="dt">ncol=</span><span class="dv">4</span>)
m</code></pre></div>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    2    5    8   11
[3,]    3    6    9   12
</code></pre>
<p>Write down what you think will happen when you run:</p>
<ol style="list-style-type: decimal">
<li><code>m ^ -1</code></li>
</ol>
<pre class="output"><code>          [,1]      [,2]      [,3]       [,4]
[1,] 1.0000000 0.2500000 0.1428571 0.10000000
[2,] 0.5000000 0.2000000 0.1250000 0.09090909
[3,] 0.3333333 0.1666667 0.1111111 0.08333333
</code></pre>
<ol start="2" style="list-style-type: decimal">
<li><code>m * c(1, 0, -1)</code></li>
</ol>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]    1    4    7   10
[2,]    0    0    0    0
[3,]   -3   -6   -9  -12
</code></pre>
<ol start="3" style="list-style-type: decimal">
<li><code>m > c(0, 20)</code></li>
</ol>
<pre class="output"><code>      [,1]  [,2]  [,3]  [,4]
[1,]  TRUE FALSE  TRUE FALSE
[2,] FALSE  TRUE FALSE  TRUE
[3,]  TRUE FALSE  TRUE FALSE
</code></pre>
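<p>The challenge also lists a fourth expression, <code>m * c(1, 0, -1, 2)</code>. The sketch below works it out under R's recycling rule: the shorter vector is repeated until it matches the length of <code>m</code>, and because R stores matrices column by column, the repeated values run down the columns. The name <code>recycled</code> is illustrative and not part of the lesson.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">m <- matrix(1:12, nrow=3, ncol=4)

# Repeat the 4-element vector to length 12 and lay it out column by column,
# which is the shape R effectively uses in m * c(1, 0, -1, 2)
recycled <- matrix(rep(c(1, 0, -1, 2), length.out=length(m)), nrow=nrow(m))

all(m * c(1, 0, -1, 2) == m * recycled)   # TRUE
m * c(1, 0, -1, 2)</code></pre></div>
<pre class="output"><code>     [,1] [,2] [,3] [,4]
[1,]    1    8   -7    0
[2,]    0    5   16  -11
[3,]   -3    0    9   24
</code></pre>
<p>Since 12 is a multiple of 4, the vector is recycled exactly three times and R gives no warning.</p>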
</div>
</section>
<section class="challenge panel panel-success">
<div class="panel-heading">
<h2 id="challenge-4-1"><span class="glyphicon glyphicon-pencil"></span>Solution to challenge 4</h2>
</div>
<div class="panel-body">
<p>We’re interested in looking at the sum of the following sequence of fractions:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"> x =<span class="st"> </span><span class="dv">1</span>/(<span class="dv">1</span>^<span class="dv">2</span>) +<span class="st"> </span><span class="dv">1</span>/(<span class="dv">2</span>^<span class="dv">2</span>) +<span class="st"> </span><span class="dv">1</span>/(<span class="dv">3</span>^<span class="dv">2</span>) +<span class="st"> </span>... +<span class="st"> </span><span class="dv">1</span>/(n^<span class="dv">2</span>)</code></pre></div>
<p>This would be tedious to type out, and impossible for high values of n. Can you use vectorisation to compute x, when n=100? How about when n=10,000?</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">sum</span>(<span class="dv">1</span>/(<span class="dv">1</span>:<span class="dv">100</span>)^<span class="dv">2</span>)</code></pre></div>
<pre class="output"><code>[1] 1.634984
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">sum</span>(<span class="dv">1</span>/(<span class="dv">1</span>:<span class="fl">1e04</span>)^<span class="dv">2</span>)</code></pre></div>
<pre class="output"><code>[1] 1.644834
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">n <-<span class="st"> </span><span class="dv">10000</span>
<span class="kw">sum</span>(<span class="dv">1</span>/(<span class="dv">1</span>:n)^<span class="dv">2</span>)</code></pre></div>
<pre class="output"><code>[1] 1.644834
</code></pre>
<p>We can also obtain the same results using a function:</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">inverse_sum_of_squares <-<span class="st"> </span>function(n) {
<span class="kw">sum</span>(<span class="dv">1</span>/(<span class="dv">1</span>:n)^<span class="dv">2</span>)
}
<span class="kw">inverse_sum_of_squares</span>(<span class="dv">100</span>)</code></pre></div>
<pre class="output"><code>[1] 1.634984
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">inverse_sum_of_squares</span>(<span class="dv">10000</span>)</code></pre></div>
<pre class="output"><code>[1] 1.644834
</code></pre>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">n <-<span class="st"> </span><span class="dv">10000</span>
<span class="kw">inverse_sum_of_squares</span>(n)</code></pre></div>
<pre class="output"><code>[1] 1.644834
</code></pre>
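<p>As a sanity check that goes slightly beyond the exercise: the infinite version of this series is known to converge to pi^2/6 (the Basel problem), so the partial sums should approach that value from below. The following is a sketch only, and the name <code>basel_limit</code> is illustrative.</p>
<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">inverse_sum_of_squares <- function(n) {
  sum(1/(1:n)^2)
}

basel_limit <- pi^2 / 6   # limit of the series as n grows, about 1.644934

# The gap to the limit shrinks as n increases, roughly like 1/n
basel_limit - inverse_sum_of_squares(100)
basel_limit - inverse_sum_of_squares(10000)</code></pre></div>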
</div>
</section>
</div>
</div>
</article>
<div class="footer">
<a class="label swc-blue-bg" href="http://software-carpentry.org">Software Carpentry</a>
<a class="label swc-blue-bg" href="https://github.com/swcarpentry/lesson-template">Source</a>
<a class="label swc-blue-bg" href="mailto:[email protected]">Contact</a>
<a class="label swc-blue-bg" href="LICENSE.html">License</a>
</div>
</div>
<!-- Javascript placed at the end of the document so the pages load faster -->
<script src="http://software-carpentry.org/v5/js/jquery-1.9.1.min.js"></script>
<script src="css/bootstrap/bootstrap-js/bootstrap.js"></script>
</body>
</html>