---
layout: default
---
<div class="header-container jumbotron">
<div class="container">
<h1>Toxic code snippets on Stack Overflow</h1>
<h3>Ragkhitwetsagul C., Krinke J., Paixao M., Bianco G., Oliveto R.</h3>
<p>Our empirical study of online code clones between 72,365 Java code snippets
on Stack Overflow <i class="fa fa-stack-overflow" aria-hidden="true"></i> and 111 Java open source projects reveals toxic code snippets:
100 outdated and 214 potentially license-violating clone pairs.</p>
<p><a class="btn btn-primary btn-lg" href="{{ "/docs/home/" | prepend: site.baseurl }}" role="button">See methodology and findings</a></p>
</div>
</div>
<div class="container">
<div class="row">
<div class="col-md-6">
<h2 class="header-light regular-pad">Online Code Clones</h2>
<blockquote>
<p>
We call code snippets that are copied from software systems to online Q&A
websites (such as Stack Overflow), and vice versa, <b>online code clones</b>.
Online code clones are created in two directions: (1) code is cloned from
a software project to a Q&A website as an example; or (2) code is cloned from a
Q&A website to a software project to obtain a functionality, perform a
particular task, or fix a bug.
</p>
</blockquote>
</div>
<div class="col-md-6 text-center">
<img src="img/online_clones.jpg" alt="" class="img-responsive">
</div>
</div>
<hr>
<div class="row">
<div class="col-md-12">
<h2 class="header-light regular-pad">Toxic Code Snippets</h2>
<blockquote>
<p>Toxic code snippets are code snippets that are harmful to reuse and, in
several cases, are caused by online code cloning.
We found that Stack Overflow code snippets that originate from open source software or online sources can become toxic when they
(1) are outdated or (2) violate their original software
license.</p>
</blockquote>
</div>
</div>
<div class="row">
<div class="col-sm-6">
<h1 class="text-center"><i class="fa fa-calendar-times-o" aria-hidden="true"></i></h1>
<h3 class="text-center">Outdated code</h3>
<p>
Outdated code occurs when a piece of code
has been copied from its origin to another location and the original is later
updated (Xia et al., 2014). Code clone detection is usually used to locate
clone instances so that they can be updated to match their originals.
However, online code clones are more difficult to detect than clones in regular
software projects because of the large search space and the mix of natural and
programming languages in the same post.
</p>
</div>
<div class="col-sm-6">
<h1 class="text-center"><i class="fa fa-balance-scale" aria-hidden="true"></i></h1>
<h3 class="text-center">Licensing violation</h3>
<p>
Code cloning can also affect software
license compatibility. Carelessly cloning
code from one project into another project with a different license may
cause a software licensing violation (German et al., 2009). This can also
happen in the context of online Q&A websites such as Stack
Overflow.
</p>
</div>
</div>
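<p>
As a rough illustration of the outdated-code check described above, the sketch below compares a copied snippet against its original after stripping comments and whitespace. The class and method names are hypothetical, and this is not the clone detection tooling used in the study; it only flags exact token-level drift and would mishandle comment markers inside string literals.
</p>

```java
import java.util.regex.Pattern;

/** Hypothetical helper: checks whether a copied snippet still matches its original. */
final class SnippetDriftCheck {
    private static final Pattern BLOCK_COMMENT = Pattern.compile("/\\*.*?\\*/", Pattern.DOTALL);
    private static final Pattern LINE_COMMENT = Pattern.compile("//[^\\n]*");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    /** Removes comments and all whitespace so that only the code characters are compared. */
    static String normalize(String code) {
        String withoutBlockComments = BLOCK_COMMENT.matcher(code).replaceAll("");
        String withoutLineComments = LINE_COMMENT.matcher(withoutBlockComments).replaceAll("");
        return WHITESPACE.matcher(withoutLineComments).replaceAll("");
    }

    /** Returns true when the copied snippet differs from the original after normalization. */
    static boolean isOutdated(String copied, String original) {
        return !normalize(copied).equals(normalize(original));
    }

    public static void main(String[] args) {
        String copied = "buffer.reset(b2, s2, l2); // parse key2\nkey2.readFields(buffer);";
        String updated = copied + "\nbuffer.reset(null, 0, 0); // clean up reference";
        System.out.println(isOutdated(copied, updated));
    }
}
```

<p>
A real detector has to cope with renamed identifiers, partial copies, and surrounding prose, which is why this kind of exact comparison is only a starting point.
</p>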
<hr>
<div class="row">
<h2 class="header-light regular-pad">Examples of Toxic Code Snippets</h2>
<h3>1. Hadoop's <code class="highlighter-rouge">compare</code> method</h3>
<p>
The first example is an outdated and license-violating online code clone in an
answer to a <a href="https://stackoverflow.com/questions/22262310/hadoop-map-output-key-doesnt-implement-writablecomparable-implementing-rawcom/22315734#22315734">Stack Overflow question</a> about how to implement
<code class="highlighter-rouge">RawComparator</code> in <a href="http://hadoop.apache.org/">Hadoop</a>.
The figure below shows, on the left, a code snippet embedded as part
of the accepted answer. The snippet shows how <a href="http://hadoop.apache.org/">Hadoop</a> implements the
<code class="highlighter-rouge">compare</code> method in its <code class="highlighter-rouge">WritableComparator</code>
class. The code snippet on the right shows another version of the same method,
this time extracted from the latest version (as of October 3, 2017) of
<a href="http://hadoop.apache.org/">Hadoop</a>.
<br /><br />
The two versions are highly similar except for a line
containing <code class="highlighter-rouge">buffer.reset(null,0,0);</code>, which was added on November
21, 2014. The added line cleans up the reference held in the
<code class="highlighter-rouge">buffer</code> variable to avoid excess heap usage
(issue no. <a href="https://issues.apache.org/jira/browse/HADOOP-11323">HADOOP-11323</a>).
</p>
<div class="col-md-6 text-left">
<!-- HTML generated using hilite.me -->
<div class="code"><pre style="line-height: 125%"><span style="color: #888888">/* Code in Stack Overflow post ID 22315734 (no license) */</span>
<span style="color: #008800; font-weight: bold">public</span> <span style="color: #333399; font-weight: bold">int</span> <span style="color: #0066BB; font-weight: bold">compare</span><span style="color: #333333">(</span><span style="color: #333399; font-weight: bold">byte</span><span style="color: #333333">[]</span> b1<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">int</span> s1<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">int</span> l1<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">byte</span><span style="color: #333333">[]</span> b2<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">int</span> s2<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">int</span> l2<span style="color: #333333">)</span> <span style="color: #333333">{</span>
<span style="color: #008800; font-weight: bold">try</span> <span style="color: #333333">{</span>
buffer<span style="color: #333333">.</span><span style="color: #0000CC">reset</span><span style="color: #333333">(</span>b1<span style="color: #333333">,</span> s1<span style="color: #333333">,</span> l1<span style="color: #333333">);</span> <span style="color: #888888">// parse key1</span>
key1<span style="color: #333333">.</span><span style="color: #0000CC">readFields</span><span style="color: #333333">(</span>buffer<span style="color: #333333">);</span>
buffer<span style="color: #333333">.</span><span style="color: #0000CC">reset</span><span style="color: #333333">(</span>b2<span style="color: #333333">,</span> s2<span style="color: #333333">,</span> l2<span style="color: #333333">);</span> <span style="color: #888888">// parse key2</span>
key2<span style="color: #333333">.</span><span style="color: #0000CC">readFields</span><span style="color: #333333">(</span>buffer<span style="color: #333333">);</span>
<span style="color: #333333">}</span> <span style="color: #008800; font-weight: bold">catch</span> <span style="color: #333333">(</span>IOException e<span style="color: #333333">)</span> <span style="color: #333333">{</span>
<span style="color: #008800; font-weight: bold">throw</span> <span style="color: #008800; font-weight: bold">new</span> <span style="color: #0066BB; font-weight: bold">RuntimeException</span><span style="color: #333333">(</span>e<span style="color: #333333">);</span>
<span style="color: #333333">}</span>
<span style="color: #008800; font-weight: bold">return</span> <span style="color: #0066BB; font-weight: bold">compare</span><span style="color: #333333">(</span>key1<span style="color: #333333">,</span> key2<span style="color: #333333">);</span> <span style="color: #888888">// compare them</span>
<span style="color: #333333">}</span>
</pre></div>
</div>
<div class="col-md-6 text-left">
<!-- HTML generated using hilite.me -->
<div class="code">
<pre style="line-height: 125%">
<span style="color: #888888">/* WritableComparator.java (2014-11-21) (Apache v.2.0 license) */</span>
<span style="color: #008800; font-weight: bold">public</span> <span style="color: #333399; font-weight: bold">int</span> <span style="color: #0066BB; font-weight: bold">compare</span><span style="color: #333333">(</span><span style="color: #333399; font-weight: bold">byte</span><span style="color: #333333">[]</span> b1<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">int</span> s1<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">int</span> l1<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">byte</span><span style="color: #333333">[]</span> b2<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">int</span> s2<span style="color: #333333">,</span><span style="color: #333399; font-weight: bold">int</span> l2<span style="color: #333333">)</span> <span style="color: #333333">{</span>
<span style="color: #008800; font-weight: bold">try</span> <span style="color: #333333">{</span>
buffer<span style="color: #333333">.</span><span style="color: #0000CC">reset</span><span style="color: #333333">(</span>b1<span style="color: #333333">,</span> s1<span style="color: #333333">,</span> l1<span style="color: #333333">);</span> <span style="color: #888888">// parse key1</span>
key1<span style="color: #333333">.</span><span style="color: #0000CC">readFields</span><span style="color: #333333">(</span>buffer<span style="color: #333333">);</span>
buffer<span style="color: #333333">.</span><span style="color: #0000CC">reset</span><span style="color: #333333">(</span>b2<span style="color: #333333">,</span> s2<span style="color: #333333">,</span> l2<span style="color: #333333">);</span> <span style="color: #888888">// parse key2</span>
key2<span style="color: #333333">.</span><span style="color: #0000CC">readFields</span><span style="color: #333333">(</span>buffer<span style="color: #333333">);</span>
buffer<span style="color: #333333">.</span><span style="color: #0000CC">reset</span><span style="color: #333333">(</span><span style="color: #008800; font-weight: bold">null</span><span style="color: #333333">,</span> <span style="color: #0000DD; font-weight: bold">0</span><span style="color: #333333">,</span> <span style="color: #0000DD; font-weight: bold">0</span><span style="color: #333333">);</span> <span style="color: #888888">// clean up reference</span>
<span style="color: #333333">}</span> <span style="color: #008800; font-weight: bold">catch</span> <span style="color: #333333">(</span>IOException e<span style="color: #333333">)</span> <span style="color: #333333">{</span>
<span style="color: #008800; font-weight: bold">throw</span> <span style="color: #008800; font-weight: bold">new</span> <span style="color: #0066BB; font-weight: bold">RuntimeException</span><span style="color: #333333">(</span>e<span style="color: #333333">);</span>
<span style="color: #333333">}</span>
<span style="color: #008800; font-weight: bold">return</span> <span style="color: #0066BB; font-weight: bold">compare</span><span style="color: #333333">(</span>key1<span style="color: #333333">,</span> key2<span style="color: #333333">);</span> <span style="color: #888888">// compare them</span>
<span style="color: #333333">}</span>
</pre></div>
</div>
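<p>
To make the effect of the added line concrete, here is a minimal sketch using a simplified stand-in for the buffer. <code class="highlighter-rouge">ReusableBuffer</code> and <code class="highlighter-rouge">BufferResetDemo</code> are hypothetical names and not Hadoop classes; the sketch only assumes that <code class="highlighter-rouge">reset</code> stores the array it is given, so a long-lived comparator keeps the last key's bytes reachable until the reference is cleared.
</p>

```java
/** Simplified, hypothetical stand-in for a reusable input buffer (not Hadoop's own class). */
final class ReusableBuffer {
    private byte[] data;
    private int start;
    private int length;

    /** Points the buffer at the given array; passing null drops the previous reference. */
    void reset(byte[] input, int start, int length) {
        this.data = input;
        this.start = start;
        this.length = length;
    }

    /** True while the buffer still keeps a byte array reachable. */
    boolean holdsReference() {
        return data != null;
    }
}

final class BufferResetDemo {
    /** Mirrors the older snippet: the buffer still references b2 after the comparison. */
    static boolean holdsReferenceWithoutCleanup(ReusableBuffer buffer, byte[] b1, byte[] b2) {
        buffer.reset(b1, 0, b1.length); // parse key1
        buffer.reset(b2, 0, b2.length); // parse key2
        return buffer.holdsReference();
    }

    /** Mirrors the updated method: the reference is released before returning. */
    static boolean holdsReferenceWithCleanup(ReusableBuffer buffer, byte[] b1, byte[] b2) {
        buffer.reset(b1, 0, b1.length); // parse key1
        buffer.reset(b2, 0, b2.length); // parse key2
        buffer.reset(null, 0, 0);       // clean up reference
        return buffer.holdsReference();
    }

    public static void main(String[] args) {
        ReusableBuffer buffer = new ReusableBuffer();
        System.out.println(holdsReferenceWithoutCleanup(buffer, new byte[8], new byte[8]));
        System.out.println(holdsReferenceWithCleanup(buffer, new byte[8], new byte[8]));
    }
}
```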
<p>
Although this change was introduced into the
<code class="highlighter-rouge">compare</code> method several years ago, the code example in the Stack
Overflow post is still unchanged.
In addition, the original code of the
<code class="highlighter-rouge">WritableComparator</code> class in Hadoop is distributed under the <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache license
version 2.0</a>, while its cloned instance on Stack Overflow contains only the
<code class="highlighter-rouge">compare</code> method and omits the license statement at the top of the
file.<br /><br />This raises two potential issues. First, the code snippet may appear to be
under Stack Overflow's <a href="https://creativecommons.org/licenses/by-sa/3.0/">CC BY-SA 3.0</a> license instead of its original Apache
license. Second, if the code snippet is copied and incorporated into another
software project with a conflicting license, a legal issue may arise.
</p>
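<p>
One simple way to notice that a copied snippet has lost its license statement is to look for a license notice in the text. The sketch below is a hypothetical check, not part of the study: the class name and the short marker list are illustrative assumptions, and a real check would need a far more complete set of license texts.
</p>

```java
import java.util.List;
import java.util.Locale;

/** Hypothetical check for a license notice in a piece of source text. */
final class LicenseNoticeCheck {
    // Illustrative markers only: two phrases from the Apache 2.0 file header
    // and the SPDX identifier tag. This list is an assumption, not exhaustive.
    private static final List<String> MARKERS = List.of(
            "licensed under the apache license",
            "licensed to the apache software foundation",
            "spdx-license-identifier:");

    /** Returns true if the text contains any of the known license markers. */
    static boolean hasLicenseNotice(String source) {
        String lower = source.toLowerCase(Locale.ROOT);
        return MARKERS.stream().anyMatch(lower::contains);
    }

    public static void main(String[] args) {
        String withHeader = "/* Licensed under the Apache License, Version 2.0 */\nclass A {}";
        String methodOnly = "public int compare(byte[] b1, int s1, int l1) { return 0; }";
        System.out.println(hasLicenseNotice(withHeader));
        System.out.println(hasLicenseNotice(methodOnly));
    }
}
```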
<h3>2. Hadoop's <code class="highlighter-rouge">humanReadableInt</code> method</h3>
<p>
The second motivating example of a toxic code snippet, with more
disruptive changes than the first, can be found in an answer to a <a href="https://stackoverflow.com/questions/801987">Stack
Overflow question</a> about how to format file sizes in a human-readable form.
The figure below shows, on the left, a code snippet that performs the
task, taken from the <code class="highlighter-rouge">StringUtils</code> class in <a href="http://hadoop.apache.org/">Hadoop</a>.
<br /><br />
The code snippet on the
right shows another version of the same method, this time extracted from
the latest version of <a href="http://hadoop.apache.org/">Hadoop</a>. The two versions
are completely different. The <code class="highlighter-rouge">humanReadableInt</code> method was
rewritten on February 5, 2013 to fix a race condition
(issue no. <a href="https://issues.apache.org/jira/browse/HADOOP-9252">HADOOP-9252</a>).
As in the first example, the cloned code snippet on Stack Overflow does not
include its original <a href="https://www.apache.org/licenses/LICENSE-2.0">Apache v.2.0 license</a>.
</p>
<div class="col-md-6 text-left">
<!-- HTML generated using hilite.me -->
<div class="code"><pre style="line-height: 125%"><span style="color: #888888">/* Code in Stack Overflow post ID 801987 (no license) */</span>
<span style="color: #008800; font-weight: bold">public</span> <span style="color: #008800; font-weight: bold">static</span> String <span style="color: #0066BB; font-weight: bold">humanReadableInt</span><span style="color: #333333">(</span><span style="color: #333399; font-weight: bold">long</span> number<span style="color: #333333">)</span> <span style="color: #333333">{</span>
<span style="color: #333399; font-weight: bold">long</span> absNumber <span style="color: #333333">=</span> Math<span style="color: #333333">.</span><span style="color: #0000CC">abs</span><span style="color: #333333">(</span>number<span style="color: #333333">);</span>
<span style="color: #333399; font-weight: bold">double</span> result <span style="color: #333333">=</span> number<span style="color: #333333">;</span>
String suffix <span style="color: #333333">=</span> <span style="background-color: #fff0f0">""</span><span style="color: #333333">;</span>
<span style="color: #008800; font-weight: bold">if</span> <span style="color: #333333">(</span>absNumber <span style="color: #333333">&lt;</span> <span style="color: #0000DD; font-weight: bold">1024</span><span style="color: #333333">)</span> <span style="color: #333333">{</span>
<span style="color: #888888">// nothing</span>
<span style="color: #333333">}</span> <span style="color: #008800; font-weight: bold">else</span> <span style="color: #008800; font-weight: bold">if</span> <span style="color: #333333">(</span>absNumber <span style="color: #333333">&lt;</span> <span style="color: #0000DD; font-weight: bold">1024</span> <span style="color: #333333">*</span> <span style="color: #0000DD; font-weight: bold">1024</span><span style="color: #333333">)</span> <span style="color: #333333">{</span>
result <span style="color: #333333">=</span> number <span style="color: #333333">/</span> <span style="color: #6600EE; font-weight: bold">1024.0</span><span style="color: #333333">;</span>
suffix <span style="color: #333333">=</span> <span style="background-color: #fff0f0">"k"</span><span style="color: #333333">;</span>
<span style="color: #333333">}</span> <span style="color: #008800; font-weight: bold">else</span> <span style="color: #008800; font-weight: bold">if</span> <span style="color: #333333">(</span>absNumber <span style="color: #333333">&lt;</span> <span style="color: #0000DD; font-weight: bold">1024</span> <span style="color: #333333">*</span> <span style="color: #0000DD; font-weight: bold">1024</span> <span style="color: #333333">*</span> <span style="color: #0000DD; font-weight: bold">1024</span><span style="color: #333333">)</span> <span style="color: #333333">{</span>
result <span style="color: #333333">=</span> number <span style="color: #333333">/</span> <span style="color: #333333">(</span><span style="color: #6600EE; font-weight: bold">1024.0</span> <span style="color: #333333">*</span> <span style="color: #0000DD; font-weight: bold">1024</span><span style="color: #333333">);</span>
suffix <span style="color: #333333">=</span> <span style="background-color: #fff0f0">"m"</span><span style="color: #333333">;</span>
<span style="color: #333333">}</span> <span style="color: #008800; font-weight: bold">else</span> <span style="color: #333333">{</span>
result <span style="color: #333333">=</span> number <span style="color: #333333">/</span> <span style="color: #333333">(</span><span style="color: #6600EE; font-weight: bold">1024.0</span> <span style="color: #333333">*</span> <span style="color: #0000DD; font-weight: bold">1024</span> <span style="color: #333333">*</span> <span style="color: #0000DD; font-weight: bold">1024</span><span style="color: #333333">);</span>
suffix <span style="color: #333333">=</span> <span style="background-color: #fff0f0">"g"</span><span style="color: #333333">;</span>
<span style="color: #333333">}</span>
<span style="color: #008800; font-weight: bold">return</span> oneDecimal<span style="color: #333333">.</span><span style="color: #0000CC">format</span><span style="color: #333333">(</span>result<span style="color: #333333">)</span> <span style="color: #333333">+</span> suffix<span style="color: #333333">;</span>
<span style="color: #333333">}</span>
</pre></div>
</div>
<div class="col-md-6 text-left">
<!-- HTML generated using hilite.me -->
<div class="code">
<pre style="margin: 0; line-height: 125%"><span style="color: #888888">/* StringUtils.java (2013-02-05) (Apache v.2.0 license) */</span>
<span style="color: #008800; font-weight: bold">public</span> <span style="color: #008800; font-weight: bold">static</span> String <span style="color: #0066BB; font-weight: bold">humanReadableInt</span><span style="color: #333333">(</span><span style="color: #333399; font-weight: bold">long</span> number<span style="color: #333333">)</span> <span style="color: #333333">{</span>
<span style="color: #008800; font-weight: bold">return</span> TraditionalBinaryPrefix<span style="color: #333333">.</span><span style="color: #0000CC">long2String</span><span style="color: #333333">(</span>number<span style="color: #333333">,</span><span style="background-color: #fff0f0">""</span><span style="color: #333333">,</span><span style="color: #0000DD; font-weight: bold">1</span><span style="color: #333333">);</span>
<span style="color: #333333">}</span>
</pre></div>
</div>
</div>
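<p>
The older snippet formats its result through a <code class="highlighter-rouge">oneDecimal</code> object whose declaration is not shown. Assuming it is a shared <code class="highlighter-rouge">java.text.DecimalFormat</code> (which is not synchronized), concurrent calls could interfere with each other. The sketch below is one possible way to avoid that by creating the formatter per call. It is not Hadoop's fix, which delegates to <code class="highlighter-rouge">TraditionalBinaryPrefix.long2String</code>, and the class name and the <code class="highlighter-rouge">"0.0"</code> pattern are assumptions.
</p>

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.util.Locale;

/** Hypothetical thread-safe variant of the older snippet; not Hadoop's implementation. */
final class HumanReadableSketch {
    static String humanReadableInt(long number) {
        long absNumber = Math.abs(number);
        double result = number;
        String suffix = "";
        if (absNumber < 1024) {
            // small values are printed without a suffix
        } else if (absNumber < 1024 * 1024) {
            result = number / 1024.0;
            suffix = "k";
        } else if (absNumber < 1024 * 1024 * 1024) {
            result = number / (1024.0 * 1024);
            suffix = "m";
        } else {
            result = number / (1024.0 * 1024 * 1024);
            suffix = "g";
        }
        // A formatter local to this call is never shared between threads.
        // The "0.0" pattern and the fixed locale are assumptions for this sketch.
        DecimalFormat oneDecimal =
                new DecimalFormat("0.0", DecimalFormatSymbols.getInstance(Locale.ROOT));
        return oneDecimal.format(result) + suffix;
    }

    public static void main(String[] args) {
        System.out.println(humanReadableInt(1536)); // prints "1.5k"
    }
}
```

<p>
Creating a formatter on every call trades a small allocation cost for safety; the output format of this sketch is not guaranteed to match the current Hadoop method.
</p>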
</div>
</div>