---
title : Economic Consulting
subtitle : Practicing Economics in the Hellabyte Era
author : James Lamb
job : Analyst | IHS Economics | Market Planning & Consulting
logo :
framework : io2012 # {io2012, html5slides, shower, dzslides, ...}
highlighter : highlight.js # {highlight.js, prettify, highlight}
hitheme : tomorrow #
widgets : [bootstrap] # {mathjax, quiz}
mode : selfcontained # {standalone, draft}
knit : slidify::knit2slides
---
<footer>
<hr></hr>
<span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Disclaimer</h2>
</br></br>
<h3 style="color: #00C990">This presentation contains the personal commentary of the author. It does not reflect the views or opinions of IHS Inc.</h3>
--- &twocol
<footer>
<hr></hr>
<span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Contents</h2>
*** =left
<FONT COLOR="#00C990" SIZE=5><b>I. Introduction</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Personal Introduction</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>6</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>The Hellabyte Era</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>7</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>"Big Data" is Not the Whole Story</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>8</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Economics in the Age of Big Data</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>9</FONT></span> </li>
</ol>
<FONT COLOR="#00C990" SIZE=5><b>II. Reproducible Research</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>What is Reproducibility?</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>11</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Why Should I Care?</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>12</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Reproducibility Checklist</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>13</FONT></span> </li>
</ol>
<FONT COLOR="#00C990" SIZE=5><b>III. Programming Principles</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Getting Started</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>15</FONT></span> </li>
</ol>
*** =right
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>The Humble Programmer</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>16</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Introduction to Version Control</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>17</FONT></span> </li>
</ol>
<FONT COLOR="#00C990" SIZE=5><b>IV. Getting & Cleaning Data</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Data Types/Sources</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>19</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Data Manipulation</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>20</FONT></span> </li>
</ol>
<FONT COLOR="#00C990" SIZE=5><b>V. Statistical Analysis</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Tips & Tricks</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>22</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Making & Documenting Decisions</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>23</FONT></span> </li>
</ol>
<FONT COLOR="#00C990" SIZE=5><b>VI. Beautiful, Reproducible Output</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Deliverable Options</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>25</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Static Graphics: ggplot2</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>26</FONT></span> </li>
</ol>
--- &twocol
<footer>
<hr></hr>
<span style="float:right"><FONT COLOR="#00C990" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Contents</h2>
*** =left
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Animated Graphics: D3.js</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>27</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Interactive Graphics: googleVis</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>28</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Creating Deliverables with Code</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>29</FONT></span> </li>
</ol>
<FONT COLOR="#00C990" SIZE=5><b>VII. Putting it All Together</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Literate Statistical Programming</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>31</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Multi-Software Solutions</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>32</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Collaboration with Git</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>33</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Case Study: Collaboration with Git</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>34</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>The Checkpoint Approach</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>35</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Case Study: Checkpoint Approach</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>36</FONT></span> </li>
</ol>
*** =right
<FONT COLOR="#00C990" SIZE=4><b>VIII. Concluding Remarks</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Summary</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>38</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Contact Information</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>39</FONT></span> </li>
</ol>
<FONT COLOR="#00C990" SIZE=4><b>Appendices</b></FONT>
<ol type="none">
<li><FONT COLOR="#71787D" SIZE=4>Training Resources</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>i</FONT></span> </li>
<li><FONT COLOR="#71787D" SIZE=4>Key Academic Papers</FONT><span style="float:right"><FONT COLOR="#71787D" SIZE=4>ii</FONT></span> </li>
</ol>
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Section I.</h2>
<hr></hr>
</br></br></br>
<h2 style="color: #FFFFFF">Introduction</h2>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ○ ○ ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> I. Introduction</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Personal Introduction</h2>
>- <b>My Marquette Experience:</b>
- B.S., Economics & Marketing (2013)
- M.S.A.E., Marketing Research Specialization (2014)
>- <b>Since Marquette:</b>
- Analyst @ [IHS Economics](https://www.ihs.com/industry/economics-country-risk.html) in Lexington, MA
- Student in Johns Hopkins Data Science program via [Coursera](https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage)
>- <b>Research Interests:</b>
- IoT/IIoT --> Confluence of cybernetics, information theory, complex systems, economics, cognitive science. See [here](http://dspace.mit.edu/bitstream/handle/1721.1/86935/EQM%20_%20work%20in%20progress.pdf?sequence=135) for more
- Economic Complexity --> Just an observer for now. Guiding Projects: [*Complexity and the Economy*](https://global.oup.com/academic/product/complexity-and-the-economy-9780199334292?cc=us&lang=en&) | [Atlas of Economic Complexity](http://atlas.cid.harvard.edu/) | [Retail as a Complex System](http://www.epjdatascience.com/content/3/1/33)
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ○ ○ ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> I. Introduction</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">The Hellabyte Era</h2>
>- <b>Andrew McAfee, [Washington Post](http://www.washingtonpost.com/blogs/innovations/wp/2013/10/25/welcome-to-the-hellabyte-era-as-in-a-helluva-lot-of-data/) 2013: </b>
> "We started by measuring data creation in kilobytes and megabytes and gigabytes and we are now at exabytes,
> zettabytes and yottabytes.
> ...
> Andrew McAfee and others have actually proposed that we settle on <b>hellabyte, as in 'helluva lot of data'</b>"
>- Drivers Behind the "[Data Deluge](http://www.economist.com/node/15579717)"
- Proliferation of [embedded systems](http://leeseshia.org/) (data creators)
- Resulting IP namespace explosion - [IPv6](http://securityintelligence.com/the-importance-of-ipv6-and-the-internet-of-things/#.VRFU4_nF_3c)
- Better tools for using high-dimension datasets:
- [Massive Parallelization](http://www.zdnet.com/article/mapreduce-and-mpp-two-sides-of-the-big-data-coin/) | [Machine Learning](http://en.wikipedia.org/wiki/Machine_learning) | [Distributed Storage](http://en.wikipedia.org/wiki/Apache_Hadoop)
- Cloud Computing (e.g. [AWS](http://aws.amazon.com/what-is-cloud-computing/)) --> Convert fixed cost of capacity to variable cost
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ○ ○ ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> I. Introduction</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">"Big Data" is Not the Whole Story</h2>
<center><img src=".\\assets\\img\\dilbert_big_data.gif"></center>
<FONT SIZE=2> Image credit: Scott Adams, [May 07, 2008](http://dilbert.com/strip/2008-05-07)</FONT>
>- <b> Rich Data </b>
- [Berners-Lee (2014)](http://www.theguardian.com/technology/2014/oct/08/sir-tim-berners-lee-speaks-out-on-data-ownership): How we combine data is more important than how much we have
- Decision-making is context dependent --> We can rebuild context with [recombinant data](http://www.google.com/patents/US8768873)
- Imagine new transactions - [Varian (2014)](http://people.ischool.berkeley.edu/~hal/Papers/2013/BeyondBigDataPaperFINAL.pdf) --> reduction of information asymmetries
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ○ ○ ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> I. Introduction</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Economics in the Age of Big Data</h2>
**From [Einav & Levin (2014)](http://www.sciencemag.org/content/346/6210/1243089.full.pdf?keytype=ref&siteid=sci&ijkey=Jj7wCy7hhth4M):**
>- Economists increasingly using large data sets (private & administrative)
> "Economic models bring a simplifying conceptual framework to help make sense of large data sets"
>- A major challenge:
> "...developing appropriate data management and programming capabilities, as well as designing creative
> and scalable approaches to summarize, describe, and analyze...data sets"
</br></br>
>- Other Commentary: [Einav (2013)](http://www.stanford.edu/~leinav/pubs/IPE2014.pdf) | [Varian (2013)](http://people.ischool.berkeley.edu/~hal/Papers/2013/BeyondBigDataPaperFINAL.pdf) | [Varian (2014)](http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf) | [Cagle (2014)](http://blogs.avalonconsult.com/blog/generic/ontology-for-fun-and-profit/)
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Section II.</h2>
<hr></hr>
</br></br></br>
<h2 style="color: #FFFFFF">Introduction to Reproducibility</h2>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ○ ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> II. Reproducible Research</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">What is Reproducibility?</h2>
>- NOT about replication in the empirical sense
- i.e. "Do other people with different data find similar results?"
>- More like:
- "If you give someone else your data (in its rawest form) and code, do they get the same results you presented?"
</br></br></br></br></br></br></br>
>- *Note: Much of the content in this section is adapted from ["Reproducible Research"](https://www.coursera.org/course/repdata), a MOOC from Johns Hopkins. See [here](https://github.com/jtleek/modules/tree/master/05_ReproducibleResearch) for more.*
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ○ ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> II. Reproducible Research</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Why Should I Care?</h2>
>- <b>[Clients are demanding](http://www.thebylinegroup.com/article6.html) --> they might change their minds many times</b>
* Need to make changes (including reproduction of deliverables) quickly
* Need an accurate project history
</br>
>- <b>Development requires testing</b>
* Need to be able to change inputs and see impact on the entire project environment
</br>
>- <b>Reproducibility begets clear thinking</b>
* The exercise will make you assess many dimensions of project work
* Participate in "big picture" thinking that might be lost in piecemeal efforts
--- &twocol bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ○ ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> II. Reproducible Research</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Reproducibility Checklist</h2>
Guiding principles for [conducting reproducible analyses](https://github.com/jtleek/modules/blob/master/05_ReproducibleResearch/Checklist/index.md):
</br></br>
*** =left
<center><FONT COLOR="#B22222" size=6> DON'T </FONT></center>
<hr></hr>
- Save multiple file versions
- Manually edit spreadsheets
- Split/reformat data files
- Download data from a website
- **Point and click**
- Save output
- Document at the end
*** =right
<center><FONT COLOR="#008000" size=6> DO </FONT></center>
<hr></hr>
- Use version control
- Use code to manipulate data
- Keep raw data intact
- Write code to read in (if possible)
- **Use code**
- Save data + code for generating output
- Edit documentation as you go
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Section III.</h2>
<hr></hr>
</br></br></br>
<h2 style="color: #FFFFFF">Programming Principles</h2>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> III. Programming Principles</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Getting Started</h2>
Reproducible economic research requires programming. There is no way around it. Here are a few lessons I've learned:
>- <b>Use a preamble</b>
- Your code should start with a set of key strings, scalars to be used throughout
- These might include file paths, mnemonic lists, samples (for time series)
>- <b>Keep it modular</b>
- Separate tasks should be handled by separate subroutines
- Facilitates trial-and-error testing; Improves readability of your code
>- <b>Use comments</b>
- Every programmer makes choices
- The code provides a record of these choices; Comments give a record of the decision-making process that led to them
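A minimal sketch of these three habits in R. The folder names, series mnemonics, and sample window are hypothetical placeholders, not taken from a real project.

```r
# ---- Preamble: every project-wide constant lives here ----
RAW_DIR      <- file.path("data", "raw")     # hypothetical folder layout
OUT_DIR      <- file.path("data", "clean")   # hypothetical folder layout
MNEMONICS    <- c("GDP", "CPI", "UNEMP")     # hypothetical series names
SAMPLE_START <- 1990
SAMPLE_END   <- 2014

# ---- Module 1: restrict a data frame to the sample window ----
trim_sample <- function(df, start = SAMPLE_START, end = SAMPLE_END) {
  df[df$year >= start & df$year <= end, , drop = FALSE]
}

# ---- Module 2: keep only the series named in the preamble ----
# Choice: mnemonics missing from the data are skipped rather than raising
# an error, because source files do not always carry every series.
keep_series <- function(df, series = MNEMONICS) {
  df[, c("year", intersect(series, names(df))), drop = FALSE]
}

# Each step can be tested on its own, then chained
toy   <- data.frame(year = 1985:2016, GDP = 1:32, CPI = 32:1, OTHER = 0)
clean <- keep_series(trim_sample(toy))
```

Changing the sample window or the series list is then a one-line edit in the preamble, and the comment above `keep_series` records why the function behaves as it does.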
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> III. Programming Principles</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">The Humble Programmer</h2>
>- From <i>Dijkstra's</i> (1972) famous talk, "The Humble Programmer".
>- Avoid putting clever tricks in your code just to prove how clever you are:
> "I suggest that we confine ourselves to ... intellectually manageable programs."
>- More parsimonious code is not necessarily desirable:
> "...one programmer places a one-line program on the desk of another and either he proudly tells
> what it does and adds the question 'Can you code this in less symbols?' - as if this were of
> any conceptual relevance! - or he just asks 'Guess what it does!' "
>- A given programming task can be approached many ways
- A commitment to "intellectually manageable" programs reduces the set of possible programs to choose from.
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ○ ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> III. Programming Principles</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Introduction to Version Control</h2>
"Version control is a system that records changes to a file or set of files over time so that you can recall specific versions later." - [Git documentation](http://git-scm.com/book/en/v2/Getting-Started-About-Version-Control)
- The software: [Git](http://en.wikipedia.org/wiki/Git_%28software%29) | [GitHub](https://github.com/JayLamb20/MSAE_Alumni_2015/commits/gh-pages) (online extension)
- Distributed revision control and collaboration system
- Tracks project history and lets you revert to earlier versions
- An example:
<center><img src=".\\assets\\img\\git_log.png"></center>
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Section IV.</h2>
<hr></hr>
</br></br></br>
<h2 style="color: #FFFFFF">Getting & Cleaning Data</h2>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> IV. Getting & Cleaning Data</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Data Types/Sources</h2>
Now we have the guiding motivation. Let's start building a real project! We begin, as always, with the data.
>- <b>Consider interoperability</b>
- Proprietary (SAS, EViews, SPSS, MS Access)
- Non-proprietary (txt, csv)
>- <b>Consider the size of your data ([Varian (2013)](http://people.ischool.berkeley.edu/~hal/Papers/2013/BeyondBigDataPaperFINAL.pdf)):</b>
- Small-Medium (fewer than 1 million observations) --> Spreadsheets
- Large (a few GB) --> MySQL, other relational DBs
- Very Large (more than a few GB) --> [NoSQL DBs](http://en.wikipedia.org/wiki/NoSQL), [HDFS](http://en.wikipedia.org/wiki/Apache_Hadoop#HDFS), [Cassandra](http://en.wikipedia.org/wiki/Apache_Cassandra)
>- <b>Consider diff-ability</b>
- Diff-able (able to track in Git): csv, txt, vbs, other text files
- Non-diff-able: [binary files](http://en.wikipedia.org/wiki/Binary_file); More on this [HERE](http://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes)
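To see why text formats suit version control, the sketch below serializes two versions of a small table to CSV lines and lists the lines that differ. The data and column names are made up. A one-cell revision shows up as a single changed line, which is the kind of change a Git diff can display; a binary format offers no such line-level view.

```r
# Return the lines write.csv() would put on disk for a data frame
as_csv_lines <- function(df) {
  capture.output(write.csv(df, row.names = FALSE))
}

v1 <- data.frame(region       = c("North", "South", "West"),
                 poverty_rate = c(12.1, 18.4, 9.7))
v2 <- v1
v2$poverty_rate[2] <- 18.9        # a single revised observation

old_lines <- as_csv_lines(v1)
new_lines <- as_csv_lines(v2)

# Lines present in one version but not the other
removed <- setdiff(old_lines, new_lines)
added   <- setdiff(new_lines, old_lines)
```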
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ○ ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> IV. Getting & Cleaning Data</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Data Manipulation</h2>
Not all data are tidy and clean.
>- <b>You will need to do some manipulation</b>
- Data scientists call this [data wrangling](http://git-scm.com/book/en/v2/Customizing-Git-Git-Attributes)
- Might be 50%-80% of project time - [NY Times (2014)](http://www.nytimes.com/2014/08/18/technology/for-big-data-scientists-hurdle-to-insights-is-janitor-work.html)
>- <b>Some suggestions</b>
- Don't edit data manually in Excel
- Use programming to make manipulations
>- <b>An example</b>
- You have observations of rural poverty levels from 2006-2014
- Need to make some assumptions to backcast to 1990
- Put assumptions (maybe a CAGR) in a program scalar, change it and see what the results look like
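The backcasting example could be sketched as follows. The poverty figures and the 2% growth rate are invented for illustration; the point is only that the assumption lives in one named scalar.

```r
# Assumption stored once, as a named scalar (hypothetical value)
ASSUMED_CAGR <- 0.02     # level assumed to grow 2% per year before 2006
BACKCAST_TO  <- 1990

# Observed data, 2006-2014 (made-up numbers)
observed <- data.frame(
  year  = 2006:2014,
  level = c(100, 103, 105, 104, 108, 110, 113, 115, 118)
)

# Walk backwards from the first observed value:
# level[t] = level[first_year] / (1 + g)^(first_year - t)
backcast <- function(obs, start_year, cagr) {
  first_year <- min(obs$year)
  anchor     <- obs$level[obs$year == first_year]
  years      <- start_year:(first_year - 1)
  data.frame(year = years, level = anchor / (1 + cagr)^(first_year - years))
}

full_series <- rbind(backcast(observed, BACKCAST_TO, ASSUMED_CAGR), observed)

# Testing a different assumption is a one-argument change
alt_series <- rbind(backcast(observed, BACKCAST_TO, 0.01), observed)
```

Because the observed rows are never overwritten, the raw data stay intact and the backcast can be regenerated under any growth assumption.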
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Section V.</h2>
<hr></hr>
</br></br></br>
<h2 style="color: #FFFFFF">Statistical Analysis</h2>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> V. Statistical Analysis</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Tips & Tricks</h2>
>- <b>Explicitly specify defaults</b>
- Many statistical packages ship algorithms with preset defaults
- Rather than call ```randomForest(x)``` in R, for example, specify the defaults of interest
- e.g. ```randomForest(x, ntree=1000, replace=TRUE)```
>- <b>Set your seed</b>
- Random number generators are actually "[pseudorandom generators](http://en.wikipedia.org/wiki/Random_seed)"
- The seed is the vector used to initialize the random-number generator
- If your work has any stochastic elements, someone using a different seed may get slightly different results
>- <b>Consider the use case</b>
- Understand how your analyses will be used
- e.g. Multicollinearity matters more if clients can introduce exogenous shocks
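A small base-R sketch of the first two tips. The function name, the data, and the seed values are hypothetical; a bootstrap mean stands in for any procedure with a stochastic element.

```r
# Bootstrap mean with every argument that matters spelled out
boot_mean <- function(x, n_draws = 1000, seed = 20150321) {
  set.seed(seed)   # fix the pseudorandom stream before drawing
  draws <- replicate(
    n_draws,
    mean(sample(x, size = length(x), replace = TRUE))
  )
  mean(draws)
}

x <- c(2.1, 3.4, 1.9, 5.6, 4.2, 3.3)

run_a <- boot_mean(x)             # same seed ...
run_b <- boot_mean(x)             # ... so the same answer
run_c <- boot_mean(x, seed = 1)   # a different seed will usually differ slightly

identical(run_a, run_b)
```

Note that `set.seed()` inside the function also resets the session's random stream, which is acceptable in a sketch but worth knowing in a larger project.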
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ○ ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> V. Statistical Analysis</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Making & Documenting Decisions</h2>
>- <b>Assumptions are unavoidable</b>
- In the absence of data, we use heuristic decision-making - [Kahneman (2011)](http://en.wikipedia.org/wiki/Thinking,_Fast_and_Slow)
- When problems are not well-defined, we form our own hypotheses and use them to fill gaps in the data - [Arthur (2014)](https://global.oup.com/academic/product/complexity-and-the-economy-9780199334292?cc=us&lang=en&)
>- <b>Document these decisions</b>
- Store exogenously-set parameters in program scalars
- Link to academic papers, other commentary as justification
>- <b>Keep it simple</b>
- If a relationship is truly linear, using more complex models won't improve performance - [Tan et al. (2006)](http://www-users.cs.umn.edu/~kumar/dmbook/index.php)
- Where there is no clear choice between competing alternatives, there is no shame in decisions that "minimize the sum of squared client questions"
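The "keep it simple" point can be checked on simulated data. In this sketch the true relationship is linear by construction (all parameters are hypothetical), and a straight line is compared with a fifth-degree polynomial on held-out observations. One would expect the two test errors to be close, with the extra terms buying little; the exact numbers depend on the seed.

```r
set.seed(42)   # reproducible simulated data

# Simulated data with a truly linear relationship (hypothetical parameters)
n   <- 200
x   <- runif(n, 0, 10)
y   <- 1.5 + 2 * x + rnorm(n, sd = 1)
dat <- data.frame(x = x, y = y)

train <- dat[1:150, ]
test  <- dat[151:200, ]

fit_linear <- lm(y ~ x, data = train)
fit_poly   <- lm(y ~ poly(x, 5), data = train)   # needlessly flexible

# Root mean squared error on data the model has not seen
rmse <- function(fit, newdata) {
  sqrt(mean((newdata$y - predict(fit, newdata = newdata))^2))
}

errors <- c(linear = rmse(fit_linear, test), poly5 = rmse(fit_poly, test))
errors
```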
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Section VI.</h2>
<hr></hr>
</br></br></br>
<h2 style="color: #FFFFFF">Beautiful, Reproducible Output</h2>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VI. Beautiful, Reproducible Output</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Deliverable Options</h2>
You got the data, cleaned it, and analyzed it. But you can't give the client an EViews workfile or a folder full of R scripts. Now you make the real money...creating the deliverable.
>- <b>Formats</b>
- Reports (Word documents, pdf, custom web pages)
- Presentations ([slidify](http://slidify.org/), [RStudio Presenter](https://support.rstudio.com/hc/en-us/articles/200486468-Authoring-R-Presentations), [Beamer](http://en.wikipedia.org/wiki/Beamer_%28LaTeX%29), PowerPoint)
- Web Applications ([Shiny](http://www.shinyapps.io/), custom front-end + [yhat](https://yhathq.com/) background, [GitHub pages](https://pages.github.com/), custom web portals)
>- <b>Elements</b>
- Text (contextual, tied to the data)
- Static graphics (png, bmp) vs. Interactive ([rCharts](http://rcharts.io/), [GoogleVis](http://cran.r-project.org/web/packages/googleVis/vignettes/googleVis.pdf), [JavaScript D3.js](http://techslides.com/over-2000-d3-js-examples-and-demos))
--- &twocol bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VI. Beautiful, Reproducible Output</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Static Graphics: ggplot2</h2>
*** =left
<img src=".\\assets\\img\\stacked_bar.png" height="400" width="550">
*** =right
<img src=".\\assets\\img\\facets_bar.png" height="400" width="550">
--- #myslide
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VI. Beautiful, Reproducible Output</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<span><h2 style="color: #00C990">Animated Graphics: D3.js</h2> <FONT SIZE=2>Via bl.ocks.org: [About](http://bl.ocks.org/mbostock/raw/1256572/) | [Graph](http://bl.ocks.org/mbostock/1256572/)</FONT></span>
<script>
$('#myslide').on('slideenter', function(){
$(this).find('article')
.append('<iframe src="http://bl.ocks.org/mbostock/raw/1256572/"></iframe>')
});
$('#myslide').on('slideleave', function(){
$(this).find('iframe').remove();
});
</script>
---
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VI. Beautiful, Reproducible Output</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Interactive Graphics: googleVis</h2>
```{r googlevis1, results='asis',echo=FALSE}
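# Load googleVis quietly so startup messages do not leak into the slide
suppressPackageStartupMessages(library(googleVis))
# Read the asylum data set from the web
asylum <- read.table('http://dimiter.eu/Visualizations_files/asylum/asylum_data.txt')
# Initial chart state, passed to the motion chart as a JSON string
myStateSet<-'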
{"iconType":"BUBBLE","uniColorForNonSelected":false,"orderedByX":false,"playDuration":15000,"xZoomedIn":false,"xZoomedDataMin":5,"yZoomedIn":false,"xLambda":0,"time":"2001","orderedByY":false,"xZoomedDataMax":103080,"yLambda":0,"sizeOption":"2","nonSelectedAlpha":0.4,"colorOption":"_UNIQUE_COLOR","iconKeySettings":[{"key":{"dim0":"the Netherlands"},"trailStart":"2001"}],"xAxisOption":"2","yZoomedDataMax":0.7340301974,"duration":{"multiplier":1,"timeUnit":"Y"},"yAxisOption":"3","yZoomedDataMin":0,"dimensions":{"iconDimensions":["dim0"]},"showTrails":true}
'
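# Motion chart keyed by country and year, opened in the state defined above
c1 <- gvisMotionChart(asylum, idvar='Country', timevar='Year', options=list(
  height=450, width=1000, state=myStateSet))
# Print only the chart's HTML so slidify can embed it (results='asis')
print(c1, "chart")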
```
<FONT SIZE=2>From ["Visualizing asylum policy"](http://www.dimiter.eu/Asylum.html) - [Dimiter Toshkov](http://www.dimiter.eu/Research.html)</FONT>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ○ ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VI. Beautiful, Reproducible Output</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Creating Deliverables with Code</h2>
When the data or statistical analyses change, your deliverables should change with them! Using code to create final output ensures that your deliverables are flexible and reproducible.
>- <b>HTML/CSS</b>
- Web pages, HTML slides (e.g. slidify), local HTML documents
- Relied upon by web developers, bloggers
>- <b>[LaTeX](http://www.latex-project.org/)</b>
- pdf reports, journal articles, glossaries ([makeglossary](http://en.wikibooks.org/wiki/LaTeX/Glossary)), bibliographies ([BibTeX](http://en.wikibooks.org/wiki/LaTeX/Bibliography_Management#BibTeX))
- Used primarily by academics
>- <b>Markdown</b>
- Same output and logic as HTML, simpler syntax
- Can take raw HTML, LaTeX, CSS, JS for customizing certain parts
- R-specific version ([R Markdown](http://rmarkdown.rstudio.com/)) allows embedded, evaluated R code chunks
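As a rough illustration of output that follows the data, the base-R sketch below builds a small Markdown report from a data frame. The function name, the table contents, and the report title are hypothetical; in practice R Markdown does this work for you.

```r
# Build a small Markdown report (as a character vector) from a data frame
make_report <- function(df, title) {
  header <- paste0("| ", paste(names(df), collapse = " | "), " |")
  rule   <- paste0("|", paste(rep("---", ncol(df)), collapse = "|"), "|")
  # Paste the columns together element-wise to form one line per row
  cells  <- do.call(paste, c(unname(as.list(df)), sep = " | "))
  rows   <- paste0("| ", cells, " |")
  c(paste("#", title), "",
    sprintf("The sample covers %d observations.", nrow(df)), "",
    header, rule, rows)
}

results <- data.frame(region       = c("North", "South"),
                      poverty_rate = c(12.1, 18.4))
report  <- make_report(results, "Rural Poverty Summary")
writeLines(report)   # pass a file name as the second argument to save it
```

If `results` changes, rerunning the script regenerates both the sentence and the table, so the text never drifts away from the numbers.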
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Section VII.</h2>
<hr></hr>
</br></br></br>
<h2 style="color: #FFFFFF">Putting It All Together</h2>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ● ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VII. Putting it All Together</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Literate Statistical Programming</h2>
>- <b>Main Idea</b>
- Usual workflow: Do stuff to data, get the output, use it to create a final report
- LSP workflow: code and final text live in one document. Code is run when report is compiled.
- [Example video](https://www.youtube.com/watch?v=YcJb1HBc-1Q&t=18m15s) (Click me!)
>- <b>[Knuth (1992)](http://www.literateprogramming.com/index.html)</b>
> "The practitioner of literate programming can be regarded as an essayist, whose main concern is with exposition"
>- <b>Practical Applications</b>
- LSP offers a uniquely powerful method for authoring documentation.
- It is superior to commenting code (but don't stop commenting!)
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ● ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VII. Putting it All Together</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Multi-Software Solutions</h2>
Every software package has strengths and weaknesses. You might, for example, want to do data wrangling in R, forecasting in EViews, and deliverable preparation in Excel. How can this be accomplished? A few options are given below.
>- <b>Read/write: "baton-passing"</b>
- Step 1 gets raw data, does stuff, exports to a csv
- Step 2 reads that csv (maybe into EViews), does some stuff, dumps its output to Excel
- Step 3 pulls data from the FCST_DATA range in Excel, does stuff, creates deliverable
>- <b>An alternative: "quarterbacking"</b>
- Choose a single program to control the others; e.g. [COM Automation in EViews](http://www.eviews.com/download/whitepapers/EViews_COM_Automation.pdf)
- Most software can pass commands directly to the Windows command line
- Store commands in a [VBScript file](http://en.wikipedia.org/wiki/VBScript) and execute it with a one-line command call
- Examples in: [R](https://stat.ethz.ch/R-manual/R-patched/library/base/html/system.html) | [Python](http://sarge.readthedocs.org/en/latest/) | [EViews](https://remote.bus.brocku.ca/files/Published_Resources/EViews_7/Docs/EViews%207%20Command%20Ref.pdf#105) (see "shell") | [MATLAB](http://blogs.mathworks.com/community/2010/05/17/calling-shell-commands-from-matlab/) | [SAS](http://support.sas.com/documentation/cdl/en/hostwin/63285/HTML/default/viewer.htm#exittemp.htm)
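
Both patterns can be combined in a few lines. The sketch below is a hedged Python illustration, not a recipe for any particular package: one script acts as the quarterback, writes a csv, launches a second program through the command line, and then reads that program's output. The second program here is a small Python stand-in (`STEP2_CODE`); in a real project it would be an EViews, SAS, or R script. The file names, column names, and 5% growth rule are all invented for the example.

```python
import csv
import subprocess
import sys
import tempfile
from pathlib import Path

# Stand-in for "step 2". In a real project this would be a separate program
# launched through its own command-line interface.
STEP2_CODE = """
import csv, sys
src, dst = sys.argv[1], sys.argv[2]
with open(src, newline="") as f:
    rows = list(csv.DictReader(f))
with open(dst, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["country", "forecast"])
    for row in rows:
        # Toy "forecast": grow the input value by 5%.
        writer.writerow([row["country"], round(float(row["revenue"]) * 1.05, 2)])
"""


def run_pipeline(workdir):
    workdir = Path(workdir)
    raw_csv = workdir / "step1_output.csv"
    fcst_csv = workdir / "step2_output.csv"

    # Step 1: write the baton (a csv) for the next program to pick up.
    with open(raw_csv, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["country", "revenue"])
        writer.writerows([["US", 100.0], ["CA", 40.0]])

    # Quarterback: launch step 2 as an external command and wait for it.
    # check=True raises an error if the external program fails.
    subprocess.run(
        [sys.executable, "-c", STEP2_CODE, str(raw_csv), str(fcst_csv)],
        check=True,
    )

    # Step 3: read step 2's output to build the deliverable.
    with open(fcst_csv, newline="") as f:
        return list(csv.DictReader(f))


with tempfile.TemporaryDirectory() as tmp:
    print(run_pipeline(tmp))
```

Using `check=True` means a failure in the external program stops the pipeline instead of silently passing along a stale file.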
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ● ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VII. Putting it All Together</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Collaboration with Git</h2>
>- <b>Put project files in a shared repo</b>
- Examples: GitHub, Bitbucket, Google Drive, Dropbox, shared network (corporate setting), others
>- <b>Work locally</b>
- Each team member "[clones](http://git-scm.com/docs/git-clone)" the repo (i.e. makes a local copy)
- Testing and development are done locally, changes are "pushed" to the shared, central repo
>- <b>Advantages</b>
- Complete project history (with ability to revert to old versions)
- Multiple local copies of the repo minimize the risk of data loss
- Avoid unwieldy shared folders with many file versions
- Reduced risk of overwriting each other's work or writing [conflicting code](http://githowto.com/resolving_conflicts)
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ● ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VII. Putting it All Together</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Case Study: Collaboration with Git</h2>
<b>Use case:</b> You are working on a forecasting project for a large diversified manufacturer. The client wants country-level revenue forecasts for three divisions, each of which has five business units. You and three colleagues (let's call them Farrokh, Joe, and David) divvy up the work, with your colleagues taking responsibility for individual divisions, and you building the structure of the project (data manipulation, data banking, deliverable creation). Without version control, this is the result:
</br>
<span> <img src=".\\assets\\img\\version_control_data.png" height="500" width="400"></span><span style="float:right"><img src=".\\assets\\img\\version_control_prog.png" height="500" width="400"> </span>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ● ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VII. Putting it All Together</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">The Checkpoint Approach</h2>
>- <b>Main Idea</b>
* Decide, early on, the format of your outputs. Tell your collaborators.
* Cooperation through shared commitment to consistent structures.
>- <b>Benefits</b>
* Different teams can work in parallel on their pieces
* Mitigate the threat of "breaking everything", reduce time spent retrofitting
>- <b>An Analogy</b>
* Kellogg's engineers might make changes to the Rice Krispies formula to make them sweeter or crunchier...but the result will always be a dry solid in a rectangular box.
* Partners downstream (e.g. retailers) can make improvements to their processes (e.g. inventory management, automated checkouts) with confidence that these improvements will always be compatible with changes from Kellogg's.
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ● ○</FONT>
<FONT COLOR="#71787D" SIZE=3> VII. Putting it All Together</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Case Study: Checkpoint Approach</h2>
>- <b>The Setup</b>
* Team 1 (economists) uses SAS to pull data from the Bureau of Labor Statistics and generate forecasts at the census tract, state, and national levels. These forecasts are exported to ```forecast.csv```.
* Team 2 (analysts) uses R to import ```forecast.csv``` and create graphs and summary tables. These are stored in a folder called ```Assets```.
* Team 3 (consultants) uses VBA in PowerPoint to pull in the figures from ```Assets``` and, with pre-made slide templates, generate a slide deck which can be compiled into a PDF to be delivered to the client.
>- <b>What Do You Notice?</b>
* Freezing the output/input formats strategically lets teams work in isolation without breaking each other's processes
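
One way to enforce a checkpoint is to test the hand-off file against the agreed format before the next team uses it. The Python sketch below is an illustration under assumptions: the column names in `EXPECTED_COLUMNS` and the level labels in `ALLOWED_LEVELS` are hypothetical, since the real format is whatever the teams commit to up front.

```python
import csv
import io

# Hypothetical agreed-upon format for forecast.csv.
EXPECTED_COLUMNS = ["geo_level", "geo_id", "period", "forecast"]
ALLOWED_LEVELS = {"tract", "state", "national"}


def check_forecast(handle):
    """Return a list of problems; an empty list means the checkpoint holds."""
    reader = csv.DictReader(handle)
    if reader.fieldnames != EXPECTED_COLUMNS:
        return [f"unexpected columns: {reader.fieldnames}"]
    problems = []
    # Data rows start on line 2 because line 1 is the header.
    for line_no, row in enumerate(reader, start=2):
        if row["geo_level"] not in ALLOWED_LEVELS:
            problems.append(f"line {line_no}: unknown level {row['geo_level']!r}")
        try:
            float(row["forecast"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: forecast is not numeric")
    return problems


# In-memory stand-in for a forecast.csv that follows the agreed format.
good = io.StringIO(
    "geo_level,geo_id,period,forecast\n"
    "state,17,2015Q1,1.5\n"
    "national,US,2015Q1,2.0\n"
)
print(check_forecast(good))
```

Team 1 could run a check like this before exporting, and Team 2 before importing, so a format change is caught at the checkpoint rather than in the slide deck.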
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Section VIII.</h2>
<hr></hr>
</br></br></br>
<h2 style="color: #FFFFFF">Concluding Remarks</h2>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ● ●</FONT>
<FONT COLOR="#71787D" SIZE=3> VIII. Concluding Remarks</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">In Summary...</h2>
>- <b>Economics is Changing</b>
- Complexity problems are coming into focus
- The nature of transactions and information flow is changing rapidly
- New and richer data sources are available
>- <b>Economists Should Change Too</b>
- Learn to code...not just the stats, but deliverable creation too
- Reproducibility is critical
- Don't change with your clients. Change before them.
>- <b>This Could be Really Fun</b>
- You can do [this](http://www.brookings.edu/research/reports2/2014/11/06-mapping-freight-tomer-kane#%2EVFuHgBW0-Dq%2E)
- And [this](http://www.nytimes.com/2013/07/22/business/in-climbing-income-ladder-location-matters.html?pagewanted=all&_r=0) and even [this](http://atlas.cid.harvard.edu/)
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#00C990" SIZE=3>● ● ● ● ● ● ● ●</FONT>
<FONT COLOR="#71787D" SIZE=3> VIII. Concluding Remarks</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Thank You for Your Time</h2>
</br>
<center><b>Not sure what you said, but the slides looked nice!</b></center>
</br>
<center>Umm thanks? You can click through the slides [here](http://jaylamb20.github.io/MSAE_Alumni_2015/index.html#1) or view the raw code [here](https://github.com/JayLamb20/MSAE_Alumni_2015).</center>
</br>
<center><b>Questions? Comments? Profanity-Laced Criticisms?</b></center>
</br>
<center>[[email protected]]([email protected]) | [Twitter](https://twitter.com/i/notifications) | [LinkedIn](https://www.linkedin.com/in/jameslamb1) | [GitHub](https://github.com/JayLamb20)</center>
</br>
<center><b>Want to work with me at IHS? Or pay my team to forecast stuff?</b></center>
<br>
<center>[[email protected]]([email protected])</center>
--- bg:#3C8C75;
<h2 style="color: #FFFFFF">Appendices</h2>
<hr></hr>
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#71787D" SIZE=3>Appendices</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Appendix A. Training Resources</h2>
The list below is given only as a starting point. There are a LOT of resources out there. Go forth and explore!
</br>
- <b>General Data Science</b> - [JHU MOOC Specialization](https://www.coursera.org/specialization/jhudatascience/1?utm_medium=listingPage) | [edX](https://www.edx.org/) | [kaggle](http://www.kaggle.com/) | ["Open Source DS M.S"](http://datasciencemasters.org/)
- <b>Git/GitHub</b> - [Code School](https://www.codeschool.com/paths/git) | [GitHub-Recommended Sources](https://help.github.com/articles/good-resources-for-learning-git-and-github/)
- <b>EViews</b> - [Webinars](http://www.eviews.com/Training/webinars.html) | [Demo](http://register1.eviews.com/demo/) | [Forums](http://forums.eviews.com/)
- <b>Python</b> - [Rice U MOOCs](https://www.coursera.org/specialization/fundamentalscomputing2/37?utm_medium=listingPage) | [Codecademy](http://www.codecademy.com/tracks/python) | [LearnPython](http://www.learnpython.org/) | [UMich MOOC](https://www.coursera.org/course/pythonlearn)
- <b>R</b> - [DataCamp](https://www.datacamp.com/) | [CRAN](http://cran.r-project.org/) | [R Reference Card](http://cran.r-project.org/doc/contrib/Short-refcard.pdf)
- <b>HTML/CSS</b> - [Codecademy](http://www.codecademy.com/en/tracks/web) | [w3schools](http://www.w3schools.com/html/) | [Code School](https://www.codeschool.com/paths/html-css)
- <b>LaTeX</b> - [WikiBooks](http://en.wikibooks.org/wiki/LaTeX) | [CTAN](https://www.ctan.org/) | [MiKTeX Download](http://miktex.org/download)
--- bg:#FFFFFF;
<footer>
<hr></hr>
<span><FONT COLOR="#71787D" SIZE=3>Appendices</FONT></span><span style="float:right"><FONT COLOR="#71787D" SIZE=3>Practicing Economics in the Hellabyte Era</FONT></span>
</footer>
<h2 style="color: #00C990">Appendix B. Key Academic Papers</h2>
The resources listed here have shaped my view of the near-term future of economic thought and the economics profession. I hope you find them as interesting and useful as I did.
- [Arthur (2014)](https://global.oup.com/academic/product/complexity-and-the-economy-9780199334292?cc=us&lang=en&). *Complexity and the Economy*.
- [Cagle (2014)](http://blogs.avalonconsult.com/blog/generic/ontology-for-fun-and-profit/). Ontology for Fun and Profit.
- [Datta (2014)](http://dspace.mit.edu/handle/1721.1/86935). Future IoT.
- [Datta (n.d.)](http://dspace.mit.edu/bitstream/handle/1721.1/41897/WiFi%20Meet%20FuFi%20_%20MIT%20ESD%20WP.pdf?sequence=1). WiFi Meet FuFi: Disruptive Innovation in Logistics Catalysed by Energy.
- [Einav & Levin (2014)](http://www.sciencemag.org/content/346/6210/1243089.abstract). Economics in the age of big data.
- [Hausmann, Hidalgo, et al. (2011)](http://atlas.cid.harvard.edu/book/). *The Atlas of Economic Complexity*.
- [Kahneman (2013)](http://www.amazon.com/Thinking-Fast-Slow-Daniel-Kahneman/dp/0374533555). *Thinking, Fast and Slow*.
- [Varian (2014)](http://people.ischool.berkeley.edu/~hal/Papers/2013/BeyondBigDataPaperFINAL.pdf). Beyond Big Data.
- [Varian (2013)](http://people.ischool.berkeley.edu/~hal/Papers/2013/ml.pdf). Big Data: New Tricks for Econometrics.