<!DOCTYPE html>
<html lang="en">
<head>
<title>The Role and Importance of Curiosity in Data Science - Just Alfred</title>
<!-- Using the latest rendering mode for IE -->
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta charset="utf-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<link rel="canonical" href="https://blog.justalfred.com/the-role-and-importance-of-curiosity-in-data-science.html">
<meta name="author" content="Just Alfred" />
<meta name="keywords" content="explaining,tech,thinking" />
<meta name="description" content="I keep hearing curiosity is important in data science. I wonder why…." />
<!-- Bootstrap -->
<link rel="stylesheet" href="https://blog.justalfred.com/theme/css/bootstrap.flatly.min.css" type="text/css"/>
<link href="https://blog.justalfred.com/theme/css/font-awesome.min.css" rel="stylesheet">
<link href="https://blog.justalfred.com/theme/css/pygments/native.css" rel="stylesheet">
<link href="https://blog.justalfred.com/theme/css/typogrify.css" rel="stylesheet">
<link rel="stylesheet" href="https://blog.justalfred.com/theme/css/style.css" type="text/css"/>
<link href="https://blog.justalfred.com/static/custom.css" rel="stylesheet">
<link href="https://blog.justalfred.com/feeds/all.rss.xml" type="application/rss+xml" rel="alternate"
title="Just Alfred RSS Feed"/>
</head>
<body>
<div class="navbar navbar-default navbar-fixed-top" role="navigation">
<div class="container">
<div class="navbar-header">
<button type="button" class="navbar-toggle" data-toggle="collapse" data-target=".navbar-ex1-collapse">
<span class="sr-only">Toggle navigation</span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
<span class="icon-bar"></span>
</button>
<a href="https://blog.justalfred.com/" class="navbar-brand">
Just Alfred </a>
</div>
<div class="collapse navbar-collapse navbar-ex1-collapse">
<ul class="nav navbar-nav">
<li><a href="https://blog.justalfred.com/pages/about-the-author.html">
About the Author
</a></li>
<li class="active">
<a href="https://blog.justalfred.com/category/posts.html">Posts</a>
</li>
</ul>
<ul class="nav navbar-nav navbar-right">
</ul>
</div>
<!-- /.navbar-collapse -->
</div>
</div> <!-- /.navbar -->
<!-- Banner -->
<!-- End Banner -->
<!-- Content Container -->
<div class="container">
<div class="row">
<div class="col-sm-9">
<section id="content">
<article>
<header class="page-header">
<h1>
<a href="https://blog.justalfred.com/the-role-and-importance-of-curiosity-in-data-science.html"
rel="bookmark"
title="Permalink to The Role and Importance of Curiosity in Data Science">
The Role and Importance of Curiosity in Data Science
</a>
</h1>
</header>
<div class="entry-content">
<div class="panel">
<div class="panel-body">
<footer class="post-info">
<span class="label label-default">Date</span>
<span class="published">
<i class="fa fa-calendar"></i><time datetime="2018-04-02T00:00:00-04:00"> (Mon) 2018-04-02</time>
</span>
<span class="label label-default">Tags</span>
<a href="https://blog.justalfred.com/tag/explaining.html">explaining</a>
/
<a href="https://blog.justalfred.com/tag/tech.html">tech</a>
/
<a href="https://blog.justalfred.com/tag/thinking.html">thinking</a>
</footer><!-- /.post-info --> </div>
</div>
<p>Curiosity: The urge to know more.
I gave a talk last year about the role of curiosity in the practice of data science,
and I share the bits I still like here.
Now, many of us in the field have acknowledged its importance to the point of screening for it in interviews.
It’s obvious to most that healthy curiosity will lead to
serendipitous discoveries and introduce us to new skills and knowledge.
But as per my last post on ambiguity, nothing is black and white.
How does curiosity help a data scientist? Can we be too curious? What are its downsides?</p>
<h1>Science</h1>
<p>To start, let’s consider how curiosity assists data science:</p>
<ol>
<li>The goal of science, including data science, is to create new <strong>knowledge</strong></li>
<li>Science is driven by <strong>hypotheses</strong></li>
<li>More hypotheses of greater variety lead to a more complete <strong>search</strong> of the space of possible knowledge</li>
<li><strong>Creativity</strong> drives the generation of new hypotheses</li>
<li><strong>Curiosity</strong> drives creativity and the testing of those hypotheses</li>
</ol>
<p>Therefore, greater curiosity will lead to more diverse hypotheses, and we will explore them faster.
This leads to a faster accumulation of knowledge.
Without curiosity, we can only follow established procedures for generating and testing hypotheses,
and we will be limited to what we already happen to know.</p>
<p>Curiosity about the task at hand helps mainly in the form of motivation.
That is, it drives the testing of hypotheses.
But the jewels, the creative hypotheses, are found when we explore seemingly unrelated territory.
This is where our minds expand.</p>
<h1>Serendipity</h1>
<p>One of the wonderful features of data science is its interdisciplinarity.
I have seen people apply a wide range of disciplines to their work.
Those who come from adjacent fields such as statistics, <span class="caps">CS</span>,
computational physics, and the like can bring their experience to their practice in a direct way.
But those coming from further afield also bring important strengths.</p>
<p>Some may consider social science an adjacent field, but for some reason, we tend to associate physicists
(though not chemists, despite the significant overlap in their methods)
with data science more eagerly than social scientists.
Yet, most physicists work with the luxury of
controlled experiments, explanatory models, clean data, and simple, inviolable laws.
Social scientists have to deal with
uncertainty, messy data, greater difficulty acquiring data,
unknown sources of error, and the complexity of human experience.
This, to my mind, and speaking as a former physicist, makes them
<a href="https://cacm.acm.org/magazines/2018/3/225484-computational-social-science-computer-science-social-data/fulltext">especially well-equipped</a>
for the demands of most data science, more so than physicists.</p>
<p>More distant, take philosophy.
Ethics and epistemology have been prominent subjects of philosophical discourse for millennia,
and they are absolutely relevant to the practice.
We <em>can</em>, but <em>ought</em> we?
Are our beliefs in our analyses justifiable? Philosophy teaches us how to think and ask questions.
It should be no surprise that fluency here would enhance a data scientist’s approach to problem solving.
Our colleague reading Marx may have a stronger sense of the contingency of labels and categories than we do,
and that may lead them to conduct more thorough exploratory data analysis (<span class="caps">EDA</span>).</p>
<p>Or take art.
Art is about expression.
It often involves arousing particular affects through word, sound, shape, or color.
Storytelling is often a major part of data science.
Experience with art can inform how one presents data via argument, user interface, or data visualization.
You may find Agnes Martin, Josef Albers, or Mark Rothko confounding,
but someone moved by them will probably use color and line especially effectively in their visual storytelling.</p>
<div style="text-align:center">
<figure>
<a href="https://www.flickr.com/photos/jkannenberg/3538952506/in/photostream/">
<img src="https://farm3.staticflickr.com/2266/3538952506_0a828273ba_z_d.jpg" alt="Detail of a work by Martin showing regular, rectilinear pencil lines" width="400" height="300">
</a>
<a href="https://upload.wikimedia.org/wikipedia/en/2/20/Josef_Albers%27s_painting_%27Homage_to_the_Square%27%2C_1965.jpg">
<img src="https://upload.wikimedia.org/wikipedia/en/2/20/Josef_Albers%27s_painting_%27Homage_to_the_Square%27%2C_1965.jpg" alt="A painting by Albers with stark, nested squares of slightly different colors" width="300" height="300">
</a>
<figcaption>Martin, left. Albers, right. If these intrigue you, you just might be a whiz at data viz.</figcaption>
</figure>
</div>
<p>I would love to work with someone experienced in theater acting.
I have a feeling that someone practiced in interpreting the world
through the eyes of a fictional character would ask very different and more pertinent questions
about users or a population than I would.</p>
<p>Whatever your background, hobbies, or momentary fascinations,
your history informs the way you see the world, and therefore the way you do data science.</p>
<h1>Practice</h1>
<p>How do we foster curiosity in practice?
I like to make sure people are checking assumptions, checking their reasoning,
poking for holes in their own and others’ analyses, and nurturing their interests outside work.
For my own problems, I often explain them to others.
We’ve all had the experience of getting not two minutes into an explanation before noticing a gaping hole in it.</p>
<p>I also like to lead with creativity.
Often it’s when we creatively expand a mundane problem into a field of possibilities that
people’s curiosity can perk up.
Aside from brainstorming/<a href="https://www.creativityatwork.com/2011/01/10/brainwriting/">brainwriting</a>,
some good prompts when research just starts or gets stuck are to ask:
“Is there an easier way to do this?”
“What would [Stakeholder] ask?”
“Are we sure we know what we’re optimizing for?”
“What’s the worst that can happen?”
“How will this help [Stakeholder]?”
You get the picture.</p>
<p>And of course, the simplest way to foster curiosity is to have a diverse team to begin with.
(And if all you have is “diversity of thought”, your team’s thoughts are not as diverse as you think.)
We want there to be discord and tension in a team along with the maturity not to see these as signs of dysfunction.
This tension will perk our ears to other ways of thinking, other vantages, unknown unknowns.
It’s akin to adjusting the dial away from exploitation towards exploration,
so that our teams can sway and not break under the vicissitudes of business.</p>
<h1>Risks</h1>
<p>As I hinted, curiosity can go too far.
Especially if we forget to connect things back to our work.
It’s easy for curiosity to lead to distraction.
Even if our wanderings remain related to our task, we might find ourselves down so many rabbit holes that
we forget our actual commitments.
There is always the risk of “analysis paralysis” if
we come up with so many questions and hypotheses that we’re unable to make a decision.
Then we may miss critical windows of opportunity.
To guard against these patterns, I recommend checking in frequently with stakeholders and subject matter experts.
They can often eliminate entire categories of hypotheses based on their needs or experience.
Also, though uncomfortable, meeting more frequently than we can iterate can help
save time by forcing feedback before we go too far down a bad path.</p>
<p>As a practical matter, curious exploration often looks like wanton scribbles.
It’s easy to end up with a huge tangle of uncommented code in a Jupyter notebook
that fails the moment we restart our kernel.
We like the output of one cell, but we can’t get back to the state that generated it.
Communication and reproducibility are key to reliable science,
so we must try to keep our work traceable, even when we think we’re just toying around.
The best solution will depend on a team’s culture of sharing work, code, findings, and documentation.
(A musing: In college I was taught some rules for maintaining a useful lab notebook.
Date every page.
If I write something down wrong, cross it out so I can still read it and note why and when I did so.
Start each experiment by noting what I’m trying to do and expecting.
Stuff like that.
I wonder if a similar set of guidelines for <span class="caps">EDA</span> notebooks would be worth stating.)</p>
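<p>As one possible reading of those lab-notebook rules, here is a minimal Python sketch of a dated, append-only log for exploratory work.
The <code>Entry</code> and <code>Notebook</code> names and their fields are hypothetical, invented for this illustration, and are not part of Jupyter or any library.
The idea is only that each experiment starts with a stated goal and expectation, every entry carries a timestamp, and mistakes are struck through with a reason rather than deleted.</p>

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


def _now():
    # "Date every page": a UTC timestamp with one-second resolution.
    return datetime.now(timezone.utc).isoformat(timespec="seconds")


@dataclass
class Entry:
    # "Start each experiment by noting what I'm trying to do and expecting."
    goal: str
    expectation: str
    created: str = field(default_factory=_now)
    notes: List[str] = field(default_factory=list)
    struck_reason: Optional[str] = None
    struck_at: Optional[str] = None

    def strike(self, reason):
        # "Cross it out so I can still read it": keep the entry,
        # and record why and when it was struck.
        self.struck_reason = reason
        self.struck_at = _now()


class Notebook:
    """Hypothetical append-only log; entries are never removed."""

    def __init__(self):
        self.entries = []

    def start(self, goal, expectation):
        entry = Entry(goal=goal, expectation=expectation)
        self.entries.append(entry)
        return entry

    def render(self):
        lines = []
        for e in self.entries:
            status = "STRUCK" if e.struck_at else "open"
            lines.append(f"[{e.created}] ({status}) goal: {e.goal}")
            lines.append(f"    expecting: {e.expectation}")
            for note in e.notes:
                lines.append(f"    note: {note}")
            if e.struck_at:
                lines.append(f"    struck {e.struck_at}: {e.struck_reason}")
        return "\n".join(lines)


if __name__ == "__main__":
    nb = Notebook()
    entry = nb.start("Check churn by signup month", "Churn is flat across months")
    entry.notes.append("Joined on the wrong key; counts are inflated")
    entry.strike("Bad join, redo with user_id")
    print(nb.render())
```

<p>Whether such a log lives in a notebook cell, a text file, or a team wiki is a matter of team culture; the sketch only shows that the paper-notebook habits translate fairly directly.</p>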
<p>There is also the sin of hunting for low p-values across many different statistical tests over the same data.
I argue that curiosity is valuable for generating more hypotheses, yes.
We need to test these hypotheses, yes.
We have a finite reserve of data, yes.
But if we test every hypothesis on the same data,
then by chance some tests will confess spurious patterns or hide actual ones.
The more we interrogate a data set, the less we can trust the results.
If you must do this, be very careful with the results, and consider an approach to mitigate the risks.
I refer you to your local statistician for further guidance.</p>
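<p>To make the risk concrete, here is a small Python sketch under simplifying assumptions: the tests are independent, every null hypothesis is true, and each p-value is therefore uniform on [0, 1].
The function names are made up for this example.
It computes the chance of at least one spurious "discovery" both analytically and by simulation, then shows the effect of a Bonferroni-corrected cutoff, which is one common mitigation among several.</p>

```python
import random


def chance_of_false_alarm(n_tests, alpha):
    """Chance that at least one of n_tests independent tests of true null
    hypotheses gives a p-value at or below alpha: 1 - (1 - alpha) ** n_tests."""
    return 1 - (1 - alpha) ** n_tests


def simulate_false_alarm_rate(n_tests, alpha, n_trials=10_000, seed=0):
    """Estimate the same chance by simulating many batches of tests."""
    rng = random.Random(seed)
    false_alarms = 0
    for _ in range(n_trials):
        # Assumption: under a true null, a calibrated p-value is uniform on [0, 1].
        smallest_p = min(rng.random() for _ in range(n_tests))
        if alpha >= smallest_p:
            false_alarms += 1
    return false_alarms / n_trials


def bonferroni_threshold(n_tests, alpha):
    """Per-test cutoff that keeps the family-wise error rate at or below alpha."""
    return alpha / n_tests


if __name__ == "__main__":
    n_tests, alpha = 20, 0.05
    corrected = bonferroni_threshold(n_tests, alpha)
    print(f"analytic, uncorrected:  {chance_of_false_alarm(n_tests, alpha):.3f}")
    print(f"simulated, uncorrected: {simulate_false_alarm_rate(n_tests, alpha):.3f}")
    print(f"simulated, Bonferroni:  {simulate_false_alarm_rate(n_tests, corrected):.3f}")
```

<p>With 20 tests at a 0.05 cutoff, the analytic chance of at least one spurious hit is about 0.64.
The Bonferroni cutoff pulls that back to roughly the nominal level, at the cost of statistical power, which is part of why the choice of correction is worth a conversation with that local statistician.</p>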
<h1>The Right Questions</h1>
<p>There are good reasons that data science in particular thrives on curiosity.
I’ve shared some ways to prompt it when you feel you need more.
But curiosity is not an absolute virtue.
It can lead to distraction.
It can motivate sloppiness.
And without the support of powers of synthesis, empathy, some domain knowledge, and grit,
it may actually spin us in spirals away from our goals.
In the end, it comes back to the hardest skill to learn in data science: asking the right questions.</p>
</div>
<!-- /.entry-content -->
</article>
</section>
</div>
<div class="col-sm-3" id="sidebar">
<aside>
<!-- Sidebar -->
<section class="well well-sm">
<ul class="list-group list-group-flush">
<!-- Sidebar/Social -->
<li class="list-group-item">
<h4><i class="fa fa-home fa-lg"></i><span class="icon-label">Social</span></h4>
<ul class="list-group" id="social">
<li class="list-group-item"><a href="https://twitter.com/Alphrabet"><i class="fa fa-twitter-square fa-lg"></i> Twitter</a></li>
<li class="list-group-item"><a href="https://github.com/justalfred"><i class="fa fa-github-square fa-lg"></i> Github</a></li>
</ul>
</li>
<!-- End Sidebar/Social -->
<!-- Sidebar/Tag Cloud -->
<li class="list-group-item">
<a href="https://blog.justalfred.com/tags.html"><h4><i class="fa fa-tags fa-lg"></i><span class="icon-label">Tags</span></h4></a>
<ul class="list-group " id="tags">
<li class="list-group-item tag-1">
<a href="https://blog.justalfred.com/tag/music.html">music</a>
</li>
<li class="list-group-item tag-1">
<a href="https://blog.justalfred.com/tag/living.html">living</a>
</li>
<li class="list-group-item tag-1">
<a href="https://blog.justalfred.com/tag/explaining.html">explaining</a>
</li>
<li class="list-group-item tag-1">
<a href="https://blog.justalfred.com/tag/exploring.html">exploring</a>
</li>
<li class="list-group-item tag-1">
<a href="https://blog.justalfred.com/tag/thinking.html">thinking</a>
</li>
<li class="list-group-item tag-2">
<a href="https://blog.justalfred.com/tag/science.html">science</a>
</li>
<li class="list-group-item tag-2">
<a href="https://blog.justalfred.com/tag/tech.html">tech</a>
</li>
<li class="list-group-item tag-2">
<a href="https://blog.justalfred.com/tag/art.html">art</a>
</li>
<li class="list-group-item tag-4">
<a href="https://blog.justalfred.com/tag/bisr.html">BISR</a>
</li>
</ul>
</li>
<!-- End Sidebar/Tag Cloud -->
<!-- Sidebar/Links -->
<li class="list-group-item">
<h4><i class="fa fa-external-link-square fa-lg"></i><span class="icon-label">Links</span></h4>
<ul class="list-group" id="links">
<li class="list-group-item">
<a href="/archives.html" target="_blank">Archives</a>
</li>
<li class="list-group-item">
<a href="/feeds/all.rss.xml" target="_blank">RSS feed</a>
</li>
</ul>
</li>
<!-- End Sidebar/Links -->
</ul>
</section>
<!-- End Sidebar --> </aside>
</div>
</div>
</div>
<!-- End Content Container -->
<footer>
<div class="container">
<hr>
<div class="row">
<div class="col-xs-10">© 2023 Just Alfred
· Powered by <a href="https://github.com/getpelican/pelican-themes/tree/master/pelican-bootstrap3" target="_blank">pelican-bootstrap3</a>,
<a href="http://docs.getpelican.com/" target="_blank">Pelican</a>,
<a href="http://getbootstrap.com" target="_blank">Bootstrap</a> <p><small> <a rel="license" href="https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en"><img alt="Creative Commons License" style="border-width:0" src="//i.creativecommons.org/l/by-nc-nd/4.0/80x15.png" /></a>
Content
licensed under a <a rel="license" href="https://creativecommons.org/licenses/by-nc-nd/4.0/deed.en">Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License</a>, except where indicated otherwise.
</small></p>
</div>
<div class="col-xs-2"><p class="pull-right"><i class="fa fa-arrow-up"></i> <a href="#">Back to top</a></p></div>
</div>
</div>
</footer>
<script src="https://blog.justalfred.com/theme/js/jquery.min.js"></script>
<!-- Include all compiled plugins (below), or include individual files as needed -->
<script src="https://blog.justalfred.com/theme/js/bootstrap.min.js"></script>
<!-- Enable responsive features in IE8 with Respond.js (https://github.com/scottjehl/Respond) -->
<script src="https://blog.justalfred.com/theme/js/respond.min.js"></script>
</body>
</html>