<!DOCTYPE html>
<html>
<head>
<meta charset="utf-8" />
<meta http-equiv="X-UA-Compatible" content="IE=edge" />
<title>Decision Trees</title>
<meta
name="viewport"
content="width=device-width, initial-scale=1, maximum-scale=1, user-scalable=no"
/>
<meta
name="description"
content="MLU-Explain: Visual Introduction to Decision Trees."
/>
<meta
property="og:image"
content="https://mlu-explain.github.io/assets/ogimages/ogimage-decision-tree.png"
/>
<meta property="og:title" content="Decision Trees" />
<meta
property="og:description"
content="An introduction to Decision Trees, Entropy, and Information Gain."
/>
<link rel="icon" href="./assets/mlu_robot.png" />
<!-- Global site tag (gtag.js) - Google Analytics -->
<script
async
src="https://www.googletagmanager.com/gtag/js?id=G-1FYW57GW3G"
></script>
<script>
window.dataLayer = window.dataLayer || [];
function gtag() {
dataLayer.push(arguments);
}
gtag("js", new Date());
gtag("config", "G-1FYW57GW3G");
</script>
<link rel="stylesheet" href="sass/main.scss" />
<link rel="stylesheet" href="css/katex.min.css" />
</head>
<body>
<div id="scrolly">
<article>
<section data-index="-1" id="title-section">
<section id="title">
<div id="intro-icon">
<a href="https://mlu-explain.github.io/"
><svg
width="50"
height="50"
viewBox="0 0 234 216"
fill="none"
xmlns="http://www.w3.org/2000/svg"
>
<g id="mlu_robot 1" clip-path="url(#clip0)">
<g>
<path
id="Vector"
d="M90.6641 83.1836C96.8828 83.1836 101.941 78.1289 101.941 71.8906V71.8242C101.941 65.5898 96.8945 60.5312 90.6641 60.5312C84.4453 60.5312 79.3828 65.5898 79.3828 71.8242V71.8906C79.3828 78.1289 84.4336 83.1836 90.6641 83.1836Z"
fill="white"
/>
<path
id="Vector_2"
d="M143.305 83.1836C149.523 83.1836 154.586 78.1289 154.586 71.8906V71.8242C154.586 65.5898 149.535 60.5312 143.305 60.5312C137.09 60.5312 132.027 65.5898 132.027 71.8242V71.8906C132.027 78.1289 137.078 83.1836 143.305 83.1836Z"
fill="white"
/>
<path
id="Vector_3"
d="M163.586 159.402H173.609V122.641H163.586V159.402Z"
fill="white"
/>
<path
id="Vector_4"
d="M60.3594 159.402H70.3867V122.641H60.3594V159.402Z"
fill="white"
/>
<g id="Group">
<path
id="Vector_5"
d="M182.16 30.0781H51.8047V10.0234H182.16V30.0781ZM182.16 103.609H51.8047V40.1055H182.16V103.609ZM144.559 168.789H89.4062V113.641H144.559V168.789ZM0 0V10.0234H15.8789V46.7891H25.9023V10.0234H41.7812V113.641H79.3867V178.816H96.9297V215.578H106.957V178.816H127.016V215.578H137.039V178.816H154.586V113.641H192.188V10.0234H233.969V0"
fill="white"
/>
</g>
</g>
</g>
<defs>
<clipPath id="clip0">
<rect width="233.97" height="215.58" fill="white" />
</clipPath>
</defs>
</svg>
</a>
<h2 class="logo">MLU-expl<span id="ai">AI</span>n</h2>
</div>
<h1>Decision Trees<br /></h1>
<p id="subtitle">
The unreasonable power of nested decision rules.
</p>
<p>
By
<a href="https://twitter.com/jdwlbr">Jared Wilber</a>
&
<a href="https://twitter.com/lusantala">Lucía Santamaría</a>
<br /><br /><br /><br />
</p>
</section>
</section>
<section data-index="0" id="intro">
<h2>Let's Build a Decision Tree</h2>
<p>
Let's pretend we're farmers with a new plot of land. Given only the
Diameter and Height of a tree trunk, we must determine if it's an
Apple, Cherry, or Oak tree. To do this, we'll use a Decision Tree.
</p>
</section>
<section data-index="1" id="startsplit">
<h2>Start Splitting</h2>
<p>
Almost every tree with a
<span class="bold">Diameter ≥ 0.45</span> is an Oak tree! Thus,
we can probably assume that any other trees we find in that region
will also be one. <br /><br />This first
<span class="bold">decision node</span> will act as our
<span class="bold">root node</span>. We'll draw a vertical line at
this Diameter and classify everything to its right as Oak (our first
<span class="bold">leaf node</span>), and continue to partition our
remaining data on the left.
</p>
</section>
<section data-index="2" id="moresplit">
<h2>Split Some More</h2>
<p>
We continue along, hoping to split our plot of land in the most
favorable manner. We see that creating a new
<span class="bold">decision node</span> at
<span class="bold">Height ≤ 4.88</span> leads to a nice section
of Cherry trees, so we partition our data there. <br /><br />
Our Decision Tree updates accordingly, adding a new
<span class="bold">leaf node</span> for Cherry.
</p>
</section>
<section data-index="3" id="moremoresplit">
<h2>And Some More</h2>
<p>
After this second split we're left with an area containing many
Apple and some Cherry trees. No problem: a vertical division can be
drawn to separate the Apple trees a bit better.
<br /><br />Once again, our Decision Tree updates accordingly.
</p>
</section>
<section data-index="4" id="moremoremoresplit">
<h2>And Yet Some More</h2>
<p>
The remaining region just needs a further horizontal division and
boom - our job is done! We've obtained an optimal set of nested
decisions.
<br /><br />
That said, some regions still enclose a few misclassified points.
Should we continue splitting, partitioning into smaller sections?
<br /><br />
Hmm...
</p>
</section>
<section data-index="5" id="variance">
<h2>Don't Go Too Deep!</h2>
<p>
If we do, the resulting regions would start becoming increasingly
complex, and our tree would become unreasonably deep. Such a
Decision Tree would learn too much from the noise of the training
examples and not enough generalizable rules.
<br /><br />
Does this sound familiar? It is the well-known tradeoff that we have
explored in our explainer on
<a href="https://mlu-explain.github.io/bias-variance/"
>The Bias Variance Tradeoff</a
>! In this case, going too deep results in a tree that
<span class="bold">overfits</span> our data, so we'll stop here.
<br /><br />
We're done! We can simply pass any new data point's
<span class="bold">Height</span> and
<span class="bold">Diameter</span> values through the newly created
Decision Tree to classify them as either an Apple, Cherry, or Oak
tree!
</p>
</section>
</article>
<div id="right">
<section id="intro-text">
<div id="intro-tree-chart"></div>
<p class="intro-text-mobile">
Decision Trees are
<span class="bold">supervised</span>
machine learning algorithms used for both regression and
classification problems. They're popular for their ease of
interpretation and large range of applications. Decision Trees
consist of a series of <span class="bold">decision nodes</span> on
some dataset's features, and make predictions at
<span class="bold">leaf nodes</span>. <br /><br />
Scroll on to learn more!
</p>
<p class="intro-text-desktop">
Decision Trees are widely used algorithms for
<span class="bold">supervised</span>
machine learning. They're popular for their ease of interpretation
and large range of applications. They work for both regression and
classification problems.
</p>
<p class="intro-text-desktop">
A Decision Tree consists of a series of sequential decisions, or
<span class="bold">decision nodes</span>, on some data set's
features. The resulting flow-like structure is navigated via
conditional control statements, or
<span class="bold">if-then</span> rules, which split each decision
node into two or more subnodes.
<span class="bold">Leaf nodes</span>, also known as terminal nodes,
represent prediction outputs for the model.
</p>
<p class="intro-text-desktop">
To <span class="bold">train</span> a Decision Tree from data means
to figure out the order in which the decisions should be assembled
from the root to the leaves. New data may then be passed from the
top down until reaching a leaf node, representing a prediction for
that data point.
</p>
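<p class="intro-text-desktop">
As a rough sketch of that top-down pass, the snippet below walks a small
tree stored as nested objects. The object layout (feature, threshold, left,
right, label) and the predict helper are assumptions made for illustration,
not the structures used by this page's scripts. The two cutoffs are the ones
quoted in the walkthrough, and the Apple leaf stands in for the deeper splits
whose cutoffs are not given.
</p>

```javascript
// Hypothetical tree shape: a decision node has { feature, threshold, left, right };
// a leaf node has { label }. Rows with row[feature] <= threshold go left.
function predict(node, row) {
  // Walk from the root until we reach a leaf.
  while (node.label === undefined) {
    node = row[node.feature] <= node.threshold ? node.left : node.right;
  }
  return node.label;
}

// A simplified two-level tree (assumption: deeper splits are collapsed into one leaf).
const tree = {
  feature: "diameter",
  threshold: 0.45,
  right: { label: "Oak" },
  left: {
    feature: "height",
    threshold: 4.88,
    left: { label: "Cherry" },
    right: { label: "Apple" },
  },
};

console.log(predict(tree, { diameter: 0.6, height: 10 })); // Oak
console.log(predict(tree, { diameter: 0.3, height: 3 })); // Cherry
console.log(predict(tree, { diameter: 0.3, height: 8 })); // Apple
```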
</section>
<figure>
<div id="chart-wrapper">
<div id="chart"></div>
<div id="chart2"></div>
</div>
</figure>
</div>
</div>
<section id="splits">
<h2 class="center-text sectionheader">Where To Partition?</h2>
<p class="center-text">
We just saw how a Decision Tree operates at a high level: from the top
down, it creates a series of sequential rules that split the data into
well-separated regions for classification. But given the large number of
potential options, how exactly does the algorithm determine where to
partition the data? Before we learn how that works, we need to
understand <span class="bold">Entropy</span>. <br /><br />
Entropy measures the amount of information of some variable or event.
We'll make use of it to identify regions consisting of a large number of
similar (pure) or dissimilar (impure) elements.
</p>
<p class="center-text">
Given a certain set of events that occur with probabilities
<span id="probs-equation"></span>, the total entropy
<span id="H-equation"></span> can be written as the negative sum of the
log-probabilities, each weighted by its probability:
</p>
<span id="entropy-equation"></span>
<p class="center-text">
The quantity has a number of interesting properties:
</p>
<div class="boxed">
<h3>Entropy Properties</h3>
<ol class="numbered-list">
<li>
<span id="H-eq-0-equation"></span> only if all but one of the
<span id="one-prob-equation"></span> are zero, this one having the
value of 1. Thus the entropy vanishes only when there is no
uncertainty in the outcome, meaning that the sample is completely
unsurprising.
</li>
<li>
<span id="H-equation2"></span> is maximum when all the
<span id="one-prob-equation2"></span> are equal. This is the most
uncertain, or 'impure', situation.
</li>
<li>
Any change towards the equalization of the probabilities
<span id="probs-equation2"></span> increases
<span id="H-equation3"></span>.
</li>
</ol>
</div>
<p class="center-text">
The entropy can be used to quantify the
<span class="bold">impurity</span> of a collection of labeled data
points: a node containing multiple classes is impure whereas a node
including only one class is pure.
</p>
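<p class="center-text">
To make the definition concrete, here is a minimal sketch that computes the
entropy of a list of class labels. It assumes a base-2 logarithm, so the
result is measured in bits; the function name is illustrative.
</p>

```javascript
// Shannon entropy (in bits, assuming log base 2) of an array of class labels.
function entropy(labels) {
  const counts = {};
  for (const label of labels) counts[label] = (counts[label] || 0) + 1;
  // Only classes that actually occur are counted, so log2(0) never happens.
  return Object.values(counts).reduce((h, n) => {
    const p = n / labels.length;
    return h - p * Math.log2(p);
  }, 0);
}

console.log(entropy(["Oak", "Oak", "Oak", "Oak"])); // 0 (pure)
console.log(entropy(["Oak", "Oak", "Apple", "Apple"])); // 1 (most impure for two classes)
console.log(entropy(["Oak", "Oak", "Oak", "Apple"]).toFixed(3)); // 0.811
```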
<div id="entropy-chart"></div>
<p class="center-text">
Above, you can compute the entropy of a collection of labeled data
points belonging to two classes, which is typical for
<span class="bold">binary classification</span> problems. Click on the
<span class="bold">Add</span> and
<span class="bold">Remove</span> buttons to modify the composition of
the bubble.
</p>
<p class="center-text">
Did you notice that pure samples have zero entropy whereas impure ones
have larger entropy values? This is what entropy is doing for us:
measuring how pure (or impure) a set of samples is. We'll use it in the
algorithm to train Decision Trees by defining the Information Gain.
</p>
</section>
<section id="informationgain">
<h3 class="center-text subheader">Information Gain</h3>
<p class="center-text">
With the intuition gained with the above animation, we can now describe
the logic to train Decision Trees. As the name implies, information gain
measures the amount of information that we gain. It does so using
entropy.
<span class="bold"
>The idea is to subtract from the entropy of our data before the split
the entropy of each possible partition thereafter</span
>. We then select the split that yields the largest reduction in
entropy, or equivalently, the largest increase in information.
<br /><br />
A classic algorithm that builds Decision Trees by maximizing information gain is
<a class="on-plum" href="https://en.wikipedia.org/wiki/ID3_algorithm"
>ID3</a
>. It's a recursive procedure that starts from the root node of the tree
and iterates top-down on all non-leaf branches in a greedy manner,
calculating at each depth the difference in entropy:
<br /><br />
</p>
<span id="ig-equation"></span>
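<p class="center-text">
The difference in entropy can be sketched in a few lines. The helper names
below are illustrative and a base-2 logarithm is assumed; the entropy after
the split is the average of the children's entropies, weighted by how many
samples each child receives.
</p>

```javascript
// Shannon entropy (in bits, assuming log base 2) of an array of class labels.
function entropy(labels) {
  const counts = {};
  for (const label of labels) counts[label] = (counts[label] || 0) + 1;
  return Object.values(counts).reduce((h, n) => {
    const p = n / labels.length;
    return h - p * Math.log2(p);
  }, 0);
}

// Parent entropy minus the size-weighted average entropy of the children.
function informationGain(parent, children) {
  const weighted = children.reduce(
    (sum, child) => sum + (child.length / parent.length) * entropy(child),
    0
  );
  return entropy(parent) - weighted;
}

const parent = ["Oak", "Oak", "Apple", "Apple"];
console.log(informationGain(parent, [["Oak", "Oak"], ["Apple", "Apple"]])); // 1 (perfect split)
console.log(informationGain(parent, [["Oak", "Apple"], ["Oak", "Apple"]])); // 0 (useless split)
```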
<p class="center-text">
To be specific, the algorithm's steps are as follows:
</p>
<div class="boxed">
<h3>ID3 Algorithm Steps</h3>
<ol class="numbered-list">
<li>
Calculate the entropy associated with every feature of the data set.
</li>
<li>
Partition the data set into subsets using different features and
cutoff values. For each, compute the information gain
<span id="delta-ig-equation"></span> as the difference in entropy
before and after the split using the formula above. For the total
entropy of all children nodes after the split, use the weighted
average taking into account <span id="N-child-equation"></span>,
i.e. how many of the <span id="N-equation"></span> samples end up on
each child branch.
</li>
<li>
Identify the partition that leads to the maximum information gain.
Create a decision node on that feature and split value.
</li>
<li>
When no further splits can be done on a subset, create a leaf node
and label it with the most common class of the data points within it
if doing classification or with the average value if doing
regression.
</li>
<li>
Recurse on all subsets. Recursion stops if after a split all
elements in a child node are of the same type. Additional stopping
conditions may be imposed, such as requiring a minimum number of
samples per leaf to continue splitting, or finishing when the
trained tree has reached a given maximum depth.
</li>
</ol>
</div>
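<p class="center-text">
The steps above can be sketched as a short recursive function. This is a
simplified illustration under stated assumptions rather than the code behind
this page: candidate cutoffs are taken as midpoints between consecutive
observed values, every split is binary, entropy uses a base-2 logarithm, and
the toy rows are invented. All function and field names are hypothetical.
</p>

```javascript
// Shannon entropy (in bits) of an array of class labels.
function entropy(labels) {
  const counts = {};
  for (const label of labels) counts[label] = (counts[label] || 0) + 1;
  return Object.values(counts).reduce((h, n) => {
    const p = n / labels.length;
    return h - p * Math.log2(p);
  }, 0);
}

// Most frequent label; ties go to the label that reached the count first.
function majority(labels) {
  const counts = {};
  let best = labels[0];
  for (const label of labels) {
    counts[label] = (counts[label] || 0) + 1;
    if (counts[label] > counts[best]) best = label;
  }
  return best;
}

// Steps 2 and 3: try every feature and cutoff, keep the largest information gain.
function bestSplit(rows, features) {
  const parentEntropy = entropy(rows.map((r) => r.label));
  let best = null;
  for (const feature of features) {
    const values = [...new Set(rows.map((r) => r[feature]))].sort((a, b) => a - b);
    for (let i = 0; i + 1 < values.length; i++) {
      const threshold = (values[i] + values[i + 1]) / 2;
      const left = rows.filter((r) => r[feature] <= threshold);
      const right = rows.filter((r) => r[feature] > threshold);
      const childEntropy =
        (left.length * entropy(left.map((r) => r.label)) +
          right.length * entropy(right.map((r) => r.label))) /
        rows.length;
      const gain = parentEntropy - childEntropy;
      if (gain > 1e-12 && (best === null || gain > best.gain)) {
        best = { feature, threshold, gain, left, right };
      }
    }
  }
  return best; // null when no split reduces entropy
}

// Steps 4 and 5: recurse until a node is pure or cannot be split further.
function buildTree(rows, features) {
  const labels = rows.map((r) => r.label);
  const split = entropy(labels) === 0 ? null : bestSplit(rows, features);
  if (split === null) return { label: majority(labels) };
  return {
    feature: split.feature,
    threshold: split.threshold,
    left: buildTree(split.left, features),
    right: buildTree(split.right, features),
  };
}

// Invented toy rows, not the article's data set.
const rows = [
  { diameter: 0.2, height: 3, label: "Cherry" },
  { diameter: 0.3, height: 4, label: "Cherry" },
  { diameter: 0.25, height: 7, label: "Apple" },
  { diameter: 0.35, height: 8, label: "Apple" },
  { diameter: 0.6, height: 3.5, label: "Oak" },
  { diameter: 0.7, height: 12, label: "Oak" },
];
const tree = buildTree(rows, ["diameter", "height"]);
console.log(JSON.stringify(tree, null, 2));
```

<p class="center-text">
On these toy rows the root splits on diameter (separating the two Oak rows),
and the remaining rows are then split on height into Cherry and Apple leaves.
</p>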
<p class="center-text">
Of course, reading the steps of an algorithm isn't always the most
intuitive thing. To make things easier to understand, let's revisit how
information gain was used to determine the first
<span class="bold">decision node</span> in our tree.
</p>
<div id="ig-container">
<div id="ig-text">
<!-- <h3>Our First Split</h3> -->
<p id="mobile-off">
Recall our first decision node split on
<span class="bold">Diameter ≤ 0.45</span>. How did we choose this
condition? It was the result of
<span class="bold">maximizing information gain</span>. <br /><br />
Each of the possible splits of the data on its two features (<span
class="bold"
>Diameter</span
>
and <span class="bold">Height</span>) and cutoff values yields a
different value of the information gain. <br /><br />
The line chart displays the different split values for the
<span class="bold">Diameter</span> feature.
<span class="bold">Move the decision boundary yourself</span> to see
how the data points in the top chart are assigned to the left or
right children nodes accordingly. On the bottom you can see the
corresponding entropy values of both children nodes as well as the
total information gain. <br /><br />
The ID3 algorithm will select the split point with the largest
information gain, shown as the peak of the black line in the bottom
chart, with a gain of <span class="bold">0.574</span> at
<span class="bold">Diameter = 0.45</span>.
</p>
</div>
<div id="entropy-scatter-charts">
<div id="entropy-chart-scatter"></div>
<div id="entropy-chart-ig"></div>
</div>
</div>
<p id="mobile-on" class="center-text">
Recall our first decision node split on
<span class="bold">Diameter ≤ 0.45</span>. How did we choose this
condition? It was the result of
<span class="bold">maximizing information gain</span>. <br /><br />
Each of the possible splits of the data on its two features (<span
class="bold"
>Diameter</span
>
and <span class="bold">Height</span>) and cutoff values yields a
different value of the information gain. <br /><br />
The visualization on the right allows you to try different split values for
the <span class="bold">Diameter</span> feature.
<span class="bold">Move the decision boundary yourself</span> to see how
the data points in the top chart are assigned to the left or right
children nodes accordingly. On the bottom you can see the corresponding
entropy values of both children nodes as well as the total information
gain. <br /><br />
The ID3 algorithm will select the split point with the largest
information gain, shown as the peak of the black line in the bottom
chart, with a gain of <span class="bold">0.574</span> at
<span class="bold">Diameter = 0.45</span>.
</p>
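<p class="center-text">
The sweep shown in the charts can be imitated in code: for each candidate
cutoff on a single feature, compute the entropy of both children and the
resulting information gain, then pick the peak. The data and the candidate
cutoffs below are invented for illustration (so the peak gain differs from
the 0.574 reported for the article's data), and the helper names are
hypothetical.
</p>

```javascript
// Shannon entropy (in bits) of an array of class labels.
function entropy(labels) {
  const counts = {};
  for (const label of labels) counts[label] = (counts[label] || 0) + 1;
  return Object.values(counts).reduce((h, n) => {
    const p = n / labels.length;
    return h - p * Math.log2(p);
  }, 0);
}

// For each cutoff, report both child entropies and the information gain.
function gainCurve(values, labels, cutoffs) {
  const parent = entropy(labels);
  return cutoffs.map((cutoff) => {
    const left = labels.filter((_, i) => values[i] <= cutoff);
    const right = labels.filter((_, i) => values[i] > cutoff);
    const leftEntropy = entropy(left);
    const rightEntropy = entropy(right);
    const gain =
      parent - (left.length * leftEntropy + right.length * rightEntropy) / labels.length;
    return { cutoff, leftEntropy, rightEntropy, gain };
  });
}

// Invented diameters and labels, not the article's data set.
const diameters = [0.2, 0.25, 0.3, 0.35, 0.6, 0.7];
const species = ["Cherry", "Apple", "Cherry", "Apple", "Oak", "Oak"];
const curve = gainCurve(diameters, species, [0.225, 0.325, 0.45, 0.65]);

// The chosen split is the cutoff at the peak of the gain curve.
const peak = curve.reduce((a, b) => (b.gain > a.gain ? b : a));
console.log(peak.cutoff); // 0.45
```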
<h3 class="center-text subheader">A Note On Information Measures</h3>
<p class="center-text">
An alternative to the entropy for the construction of Decision Trees is
the
<a
href="https://en.wikipedia.org/wiki/Decision_tree_learning#Gini_impurity"
>Gini impurity</a
>. This quantity is also a measure of information and can be seen as a
variation of Shannon's entropy. Decision trees trained using entropy or
Gini impurity are comparable, and only in a few cases do results differ
considerably. In the case of imbalanced data sets, entropy might be more
prudent. Yet Gini might train faster as it does not make use of
logarithms.
</p>
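<p class="center-text">
For comparison, a minimal sketch of the Gini impurity, defined as one minus
the sum of the squared class proportions. Note that no logarithm is needed.
The function name is illustrative.
</p>

```javascript
// Gini impurity: 1 minus the sum of squared class proportions.
function gini(labels) {
  const counts = {};
  for (const label of labels) counts[label] = (counts[label] || 0) + 1;
  return Object.values(counts).reduce((g, n) => g - (n / labels.length) ** 2, 1);
}

console.log(gini(["Oak", "Oak", "Oak", "Oak"])); // 0 (pure)
console.log(gini(["Oak", "Oak", "Apple", "Apple"])); // 0.5 (most impure for two classes)
```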
</section>
<section id="anotherlook">
<h3 class="center-text subheader">Another Look At Our Decision Tree</h3>
<p class="center-text">
Let's recap what we've learned so far. First, we saw how a Decision Tree
classifies data by repeatedly partitioning the feature space into
regions according to some conditional series of rules. Second, we
learned about entropy, a popular metric used to measure the purity (or
lack thereof) of a given sample of data. Third, we learned how Decision
Trees use entropy in information gain and the ID3 algorithm to determine
the exact conditional series of rules to select. Taken together, the
three sections detail the typical Decision Tree algorithm.
<br /><br />
To reinforce concepts, let's look at our Decision Tree from a slightly
different perspective.
<br /><br />
The tree below maps exactly to the tree we built in the
<span class="bold">Let's Build a Decision Tree</span> section above.
However, instead of showing the partitioned feature space alongside our
tree's structure, let's look at the partitioned data points and their
corresponding entropy at each node itself:
</p>
<div id="entropy-chart-tree"></div>
<br />
<p class="center-text">
From the top down, our sample of data points to classify shrinks as it
gets partitioned to different <span class="bold">decision</span> and
<span class="bold">leaf</span> nodes. In this manner, we could trace the
full path taken by a training data point if we so desired. Note also
that not every leaf node is pure: as discussed previously (and in the
next section), we don't want the structure of our Decision Trees to be
too deep, as such a model likely won't generalize well to unseen data.
</p>
</section>
<section id="pertubations">
<h3 class="center-text subheader">The Problem of Perturbations</h3>
<p class="center-text">
Without question, Decision Trees have a lot of things going for them.
They're simple models that are easy to interpret. They're fast to train
and require minimal data preprocessing. And they handle outliers with
ease. Yet they suffer from a major limitation, and that is their
instability compared with other predictors. They can be
<span class="bold"
>extremely sensitive to small perturbations in the data</span
>: a minor change in the training examples can result in a drastic
change in the structure of the Decision Tree.
</p>
<p class="center-text">
Check for yourself how small random Gaussian perturbations on just 5% of
the training examples create a set of completely different Decision
Trees:
</p>
<br /><br />
<div id="pertubation-wrapper">
<div id="tree-0" class="pertubation-item"></div>
<div id="scatter-0" class="pertubation-item"></div>
<div id="tree-1" class="pertubation-item"></div>
<div id="scatter-1" class="pertubation-item"></div>
<div id="tree-2" class="pertubation-item"></div>
<div id="scatter-2" class="pertubation-item"></div>
<div id="tree-3" class="pertubation-item"></div>
<div id="scatter-3" class="pertubation-item"></div>
<div id="tree-4" class="pertubation-item"></div>
<div id="scatter-4" class="pertubation-item"></div>
<div id="tree-5" class="pertubation-item"></div>
<div id="scatter-5" class="pertubation-item"></div>
<div id="tree-6" class="pertubation-item"></div>
<div id="scatter-6" class="pertubation-item"></div>
<div id="tree-7" class="pertubation-item"></div>
<div id="scatter-7" class="pertubation-item"></div>
</div>
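<p class="center-text">
A perturbation of this kind can be sketched as follows: pick a random 5% of
the rows and add Gaussian noise to their features. The noise scale (sigma),
the function names, and the toy rows are assumptions for illustration; the
exact procedure behind the charts above is not shown in this page.
</p>

```javascript
// Standard normal sample via the Box-Muller transform.
function gaussian(random) {
  const u = 1 - random(); // shift [0, 1) to (0, 1] so Math.log never sees 0
  const v = random();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Return a copy of `rows` in which a random `fraction` of the rows has
// Gaussian noise (standard deviation `sigma`) added to each listed feature.
function perturb(rows, features, { fraction = 0.05, sigma = 0.1, random = Math.random } = {}) {
  const count = Math.round(rows.length * fraction);
  // Partial Fisher-Yates shuffle: the first `count` indices form a random subset.
  const indices = rows.map((_, i) => i);
  for (let i = 0; i < count; i++) {
    const j = i + Math.floor(random() * (indices.length - i));
    [indices[i], indices[j]] = [indices[j], indices[i]];
  }
  const chosen = new Set(indices.slice(0, count));
  return rows.map((row, i) => {
    if (!chosen.has(i)) return row; // untouched rows are reused as-is
    const noisy = { ...row };
    for (const f of features) noisy[f] += sigma * gaussian(random);
    return noisy;
  });
}

// Invented rows, only to show how many examples get perturbed.
const rows = Array.from({ length: 100 }, (_, i) => ({ diameter: i / 100, height: i / 10 }));
const noisy = perturb(rows, ["diameter", "height"]);
const changed = noisy.filter((row, i) => row !== rows[i]).length;
console.log(changed); // 5
```

<p class="center-text">
Training a tree on the original rows and another on the perturbed rows, then
comparing their structures, reproduces the kind of experiment shown above.
</p>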
<h3 class="center-text subheader">Why Is This A Problem?</h3>
<p class="center-text">
In their vanilla form, Decision Trees are unstable.
<br /><br />
If left unchecked, the ID3 algorithm to train Decision Trees will work
endlessly to minimize entropy. It will continue splitting the data until
all leaf nodes are completely pure - that is, consisting of only one
class. Such a process may yield very deep and complex Decision Trees. In
addition, we just saw that Decision Trees are subject to high variance
when exposed to small perturbations of the training data.
<br /><br />Both issues are undesirable, as they lead to predictors that
fail to clearly distinguish between persistent and random patterns in
the data, a problem known as <span class="bold">overfitting</span>. This
is problematic because it means that our model won't perform well when
exposed to new data.
</p>
<p class="center-text">
There are ways to prevent excessive growth of Decision Trees by pruning
them, for instance by constraining their maximum depth, limiting the
number of leaves that can be created, or requiring a minimum number of
items in each leaf.
<br /><br />As for the issue of high variance? Well, unfortunately it's
an intrinsic characteristic when training a single Decision Tree.
</p>
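<p class="center-text">
The growth limits mentioned above are usually checked while the tree is
being built, before a node is split (sometimes called pre-pruning). A
minimal sketch of such a check follows; the parameter names and values are
hypothetical, and sensible settings depend on the data set.
</p>

```javascript
// Hypothetical limits; sensible values depend on the data set.
const limits = { maxDepth: 3, minSamplesLeaf: 5 };

// Decide whether a node at `depth` may be split into children of the given sizes.
function canSplit(depth, leftSize, rightSize, { maxDepth, minSamplesLeaf }) {
  if (depth >= maxDepth) return false; // the tree is already deep enough
  return Math.min(leftSize, rightSize) >= minSamplesLeaf; // no tiny leaves
}

console.log(canSplit(1, 20, 12, limits)); // true
console.log(canSplit(3, 20, 12, limits)); // false (depth limit reached)
console.log(canSplit(1, 20, 2, limits)); // false (right child too small)
```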
</section>
<section id="limitations">
<h3 class="center-text subheader">
The Need to Go Beyond Decision Trees
</h3>
<p class="center-text">
Perhaps ironically, one way to alleviate the instability induced by
perturbations is to introduce an extra layer of randomness in the
training process. In practice this can be achieved by creating
<span class="bold">collections of Decision Trees</span> trained on
slightly different versions of the data set, the combined predictions of
which do not suffer so heavily from high variance. This approach opens
the door to one of the most successful Machine Learning algorithms thus
far: random forests.<br /><br />
Stay tuned for our future article!
</p>
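<p class="center-text">
One common way to obtain such slightly different versions of the data set is
bootstrap resampling, with the trees' predictions combined by majority vote.
The sketch below shows only those two ingredients; it is an assumption about
how the idea could be realized, and the function names are illustrative.
</p>

```javascript
// Draw rows.length rows with replacement (a bootstrap sample).
function bootstrap(rows, random = Math.random) {
  return rows.map(() => rows[Math.floor(random() * rows.length)]);
}

// Combine the class predictions of several trees by majority vote.
// Ties go to the class that reached the winning count first.
function majorityVote(predictions) {
  const counts = {};
  let best = predictions[0];
  for (const p of predictions) {
    counts[p] = (counts[p] || 0) + 1;
    if (counts[p] > counts[best]) best = p;
  }
  return best;
}

console.log(majorityVote(["Oak", "Apple", "Oak"])); // Oak
console.log(bootstrap([1, 2, 3, 4]).length); // 4
```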
</section>
<section id="final">
<h3 class="center-text subheader">The End</h3>
<p class="center-text">
Thanks for reading! We hope that the article is insightful no matter
where you are along your Machine Learning journey, and that you came
away with a better understanding of the Decision Tree algorithm.
<br /><br />
To make things compact, we skipped over some relevant topics, such as
using Decision Trees for regression, end-cut preference in tree models,
and other tree-specific hyperparameters. Check out the resources listed
below to learn more about those topics.
<br /><br />
To learn more about Machine Learning, check out our
<a class="on-end" href="https://aws.amazon.com/machine-learning/mlu/"
>self-paced courses</a
>, our
<a
class="on-end"
href="https://www.youtube.com/channel/UC12LqyqTQYbXatYS9AA7Nuw"
>YouTube videos</a
>, and the
<a class="on-end" href="https://d2l.ai/">Dive into Deep Learning</a>
textbook. If you have any comments or ideas related to
<a class="on-end" href="https://mlu-explain.github.io/"
>MLU-Explain articles</a
>, feel free to reach out directly to
<a class="on-end" href="https://twitter.com/jdwlbr">Jared</a> or
<a class="on-end" href="https://twitter.com/lusantala">Lucía</a>. The
code for this article is available
<a class="on-end" href="https://github.com/aws-samples/aws-mlu-explain"
>here</a
>.
</p>
<p class="center-text">
A special thanks goes out to <span class="bold">Brent Werness</span> for
valuable contributions to this article.
</p>
<br /><br />
<hr />
<br /><br />
<h3 class="center-text subheader reference">References + Open Source</h3>
<p class="center-text">
This article is a product of the following resources + the awesome
people who made (& contributed to) them:
</p>
<p class="resource-item">
<a
class="on-end"
href="https://onlinelibrary.wiley.com/doi/10.1002/j.1538-7305.1948.tb01338.x"
>A Mathematical Theory Of Communication</a
><br />
(Claude E. Shannon, 1948).
</p>
<p class="resource-item">
<a
class="on-end"
href="https://link.springer.com/article/10.1007/BF00116251"
>Induction of decision trees</a
><br />
(John Ross Quinlan, 1986).
</p>
<p class="resource-item">
<a
class="on-end"
href="https://www.dcc.fc.up.pt/~ltorgo/Papers/SECRTs.pdf"
>A Study on End-Cut Preference in Least Squares Regression Trees</a
><br />
(Luis Torgo, 2001).
</p>
<p class="resource-item">
<a
class="on-end"
href="https://link.springer.com/article/10.1007%2Fs10888-011-9188-x"
>The Origins Of The Gini Index: Extracts From Variabilità e Mutabilità
(Corrado Gini, 1912)</a
><br />
(Lidia Ceriani & Paolo Verme, 2012).
</p>
<p class="resource-item">
<a class="on-end" href="https://d3js.org/">D3.js</a><br />(Mike Bostock
& Philippe Rivière)
</p>
<p class="resource-item">
<a class="on-end" href="https://github.com/susielu/d3-annotation"
>d3-annotation</a
><br />(Susie Lu)
</p>
<p class="resource-item">
<a class="on-end" href="https://katex.org/">KaTeX</a> <br />(Emily
Eisenberg & Sophie Alpert)
</p>
<br />
<br />
</section>
<script src="./js/index.js"></script>
<script src="./js/annotatedTree.js"></script>
<script src="./js/katexCalls.js"></script>
</body>
</html>