-
Notifications
You must be signed in to change notification settings - Fork 0
/
Graph500.html
1829 lines (1691 loc) · 63.3 KB
/
Graph500.html
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" lang="en" xml:lang="en">
<head>
<title>Graph 500 Benchmark 1 ("Search")</title>
<!-- 2013-11-19 Tue 10:25 -->
<meta http-equiv="Content-Type" content="text/html;charset=utf-8" />
<meta name="generator" content="Org-mode" />
<meta name="author" content="Graph 500 Steering Committee" />
<style type="text/css">
<!--/*--><![CDATA[/*><!--*/
.title { text-align: center; }
.todo { font-family: monospace; color: red; }
.done { color: green; }
.tag { background-color: #eee; font-family: monospace;
padding: 2px; font-size: 80%; font-weight: normal; }
.timestamp { color: #bebebe; }
.timestamp-kwd { color: #5f9ea0; }
.right { margin-left: auto; margin-right: 0px; text-align: right; }
.left { margin-left: 0px; margin-right: auto; text-align: left; }
.center { margin-left: auto; margin-right: auto; text-align: center; }
.underline { text-decoration: underline; }
#postamble p, #preamble p { font-size: 90%; margin: .2em; }
p.verse { margin-left: 3%; }
pre {
border: 1px solid #ccc;
box-shadow: 3px 3px 3px #eee;
padding: 8pt;
font-family: monospace;
overflow: auto;
margin: 1.2em;
}
pre.src {
position: relative;
overflow: visible;
padding-top: 1.2em;
}
pre.src:before {
display: none;
position: absolute;
background-color: white;
top: -10px;
right: 10px;
padding: 3px;
border: 1px solid black;
}
pre.src:hover:before { display: inline;}
pre.src-sh:before { content: 'sh'; }
pre.src-bash:before { content: 'sh'; }
pre.src-emacs-lisp:before { content: 'Emacs Lisp'; }
pre.src-R:before { content: 'R'; }
pre.src-perl:before { content: 'Perl'; }
pre.src-java:before { content: 'Java'; }
pre.src-sql:before { content: 'SQL'; }
table { border-collapse:collapse; }
td, th { vertical-align:top; }
th.right { text-align: center; }
th.left { text-align: center; }
th.center { text-align: center; }
td.right { text-align: right; }
td.left { text-align: left; }
td.center { text-align: center; }
dt { font-weight: bold; }
.footpara:nth-child(2) { display: inline; }
.footpara { display: block; }
.footdef { margin-bottom: 1em; }
.figure { padding: 1em; }
.figure p { text-align: center; }
.inlinetask {
padding: 10px;
border: 2px solid gray;
margin: 10px;
background: #ffffcc;
}
#org-div-home-and-up
{ text-align: right; font-size: 70%; white-space: nowrap; }
textarea { overflow-x: auto; }
.linenr { font-size: smaller }
.code-highlighted { background-color: #ffff00; }
.org-info-js_info-navigation { border-style: none; }
#org-info-js_console-label
{ font-size: 10px; font-weight: bold; white-space: nowrap; }
.org-info-js_search-highlight
{ background-color: #ffff00; color: #000000; font-weight: bold; }
/*]]>*/-->
</style>
<style>body {margin: 0 auto; max-width: 40em;} table {margin-left:auto; margin-right:auto;}</style>
<style>div.openissue {margin-left: 10%; margin-right: 10%; color: red;}</style>
<script type="text/javascript">
/*
@licstart The following is the entire license notice for the
JavaScript code in this tag.
Copyright (C) 2012 Free Software Foundation, Inc.
The JavaScript code in this tag is free software: you can
redistribute it and/or modify it under the terms of the GNU
General Public License (GNU GPL) as published by the Free Software
Foundation, either version 3 of the License, or (at your option)
any later version. The code is distributed WITHOUT ANY WARRANTY;
without even the implied warranty of MERCHANTABILITY or FITNESS
FOR A PARTICULAR PURPOSE. See the GNU GPL for more details.
As additional permission under GNU GPL version 3 section 7, you
may distribute non-source (e.g., minimized or compacted) forms of
that code without the copy of the GNU GPL normally required by
section 4, provided you include this license notice and a URL
through which recipients can access the Corresponding Source.
@licend The above is the entire license notice
for the JavaScript code in this tag.
*/
<!--/*--><![CDATA[/*><!--*/
function CodeHighlightOn(elem, id)
{
var target = document.getElementById(id);
if(null != target) {
elem.cacheClassElem = elem.className;
elem.cacheClassTarget = target.className;
target.className = "code-highlighted";
elem.className = "code-highlighted";
}
}
function CodeHighlightOff(elem, id)
{
var target = document.getElementById(id);
if(elem.cacheClassElem)
elem.className = elem.cacheClassElem;
if(elem.cacheClassTarget)
target.className = elem.cacheClassTarget;
}
/*]]>*///-->
</script>
<script type="text/javascript" src="http://orgmode.org/mathjax/MathJax.js"></script>
<script type="text/javascript">
<!--/*--><![CDATA[/*><!--*/
MathJax.Hub.Config({
// Only one of the two following lines, depending on user settings
// First allows browser-native MathML display, second forces HTML/CSS
// config: ["MMLorHTML.js"], jax: ["input/TeX"],
jax: ["input/TeX", "output/HTML-CSS"],
extensions: ["tex2jax.js","TeX/AMSmath.js","TeX/AMSsymbols.js",
"TeX/noUndefined.js"],
tex2jax: {
inlineMath: [ ["\\(","\\)"] ],
displayMath: [ ['$$','$$'], ["\\[","\\]"], ["\\begin{displaymath}","\\end{displaymath}"] ],
skipTags: ["script","noscript","style","textarea","pre","code"],
ignoreClass: "tex2jax_ignore",
processEscapes: false,
processEnvironments: true,
preview: "TeX"
},
showProcessingMessages: true,
displayAlign: "center",
displayIndent: "2em",
"HTML-CSS": {
scale: 100,
availableFonts: ["STIX","TeX"],
preferredFont: "TeX",
webFont: "TeX",
imageFont: "TeX",
showMathMenu: true,
},
MMLorHTML: {
prefer: {
MSIE: "MML",
Firefox: "MML",
Opera: "HTML",
other: "HTML"
}
}
});
/*]]>*///-->
</script>
</head>
<body>
<div id="content">
<h1 class="title">Graph 500 Benchmark 1 ("Search")</h1>
<div id="table-of-contents">
<h2>Table of Contents</h2>
<div id="text-table-of-contents">
<ul>
<li><a href="#sec-1">1. Introduction</a>
<ul>
<li><a href="#sec-1-1">1.1. The role of the reference implementation</a></li>
<li><a href="#sec-1-2">1.2. Significant changes in V2.0</a></li>
<li><a href="#sec-1-3">1.3. References</a></li>
</ul>
</li>
<li><a href="#sec-2">2. Overall Benchmark Structure</a></li>
<li><a href="#sec-3">3. Parameter Summary</a></li>
<li><a href="#prng">4. Pseudo-Random Number Generation</a>
<ul>
<li><a href="#sec-4-1">4.1. References</a></li>
</ul>
</li>
<li><a href="#sec-5">5. Edge List Generation</a>
<ul>
<li><a href="#sec-5-1">5.1. Mapping the Edge List onto Distinguished Memory Spaces</a></li>
<li><a href="#sec-5-2">5.2. Permuting Edge List Indices</a></li>
<li><a href="#sec-5-3">5.3. Edge List Entries</a>
<ul>
<li><a href="#sec-5-3-1">5.3.1. Tree Edges</a></li>
<li><a href="#sec-5-3-2">5.3.2. RMAT Edges</a></li>
</ul>
</li>
<li><a href="#sec-5-4">5.4. Scrambling Vertex Numbers</a></li>
<li><a href="#sec-5-5">5.5. References</a></li>
</ul>
</li>
<li><a href="#kernel1">6. Kernel 1 – Graph Construction</a>
<ul>
<li><a href="#sec-6-1">6.1. Description</a></li>
</ul>
</li>
<li><a href="#sampleroot">7. Sampling Initial Vertices</a></li>
<li><a href="#kernel2">8. Kernel 2 – Breadth-First Search</a>
<ul>
<li><a href="#sec-8-1">8.1. Description</a></li>
<li><a href="#sec-8-2">8.2. Kernel 2 Output</a></li>
</ul>
</li>
<li><a href="#kernel3">9. Kernel 3 – Single Source Shortest Paths</a>
<ul>
<li><a href="#sec-9-1">9.1. Description</a></li>
<li><a href="#sec-9-2">9.2. Kernel 3 Output</a></li>
<li><a href="#sec-9-3">9.3. References</a></li>
</ul>
</li>
<li><a href="#sec-10">10. Validation</a></li>
<li><a href="#sec-11">11. Computing and Presenting Performance Information</a>
<ul>
<li><a href="#sec-11-1">11.1. Timing</a></li>
<li><a href="#benchmarkoutput">11.2. Submission Format</a></li>
</ul>
</li>
<li><a href="#evaluation">12. Evaluation Criteria</a>
<ul>
<li><a href="#sec-12-1">12.1. Performance Metric (TEPS)</a></li>
</ul>
</li>
<li><a href="#sec-13">13. Sample Driver</a></li>
</ul>
</div>
</div>
<p>
Contributors: David A. Bader (Georgia Institute of Technology),
Jonathan Berry (Sandia National Laboratories), Simon Kahan (Pacific
Northwest National Laboratory and University of Washington), Richard
Murphy (Micron Technology), E. Jason Riedy (Georgia
Institute of Technology), and Jeremiah Willcock (Indiana University).
</p>
<p>
Version History:
</p>
<dl class="org-dl">
<dt> V0.1 </dt><dd>Draft, created 28 July 2010
</dd>
<dt> V0.2 </dt><dd>Draft, created 29 September 2010
</dd>
<dt> V0.3 </dt><dd>Draft, created 30 September 2010
</dd>
<dt> V1.0 </dt><dd>Created 1 October 2010
</dd>
<dt> V1.1 </dt><dd>Created 3 October 2010
</dd>
<dt> V3.0 </dt><dd>Created XXX 2013
</dd>
</dl>
<p>
Version 0.1 of this document was part of the Graph 500 community
benchmark effort, led by Richard Murphy (then at Sandia National
Laboratories). The intent is that there will be at least three
variants of implementations, on shared memory and threaded systems, on
distributed memory clusters, and on external memory map-reduce
clouds. This specification is for the first of potentially several
benchmark problems. The version number jumped to three to synchronize
with the reference code.
</p>
<div class="openissue">
<p>
One "open issue" remains: What can be precomputed? I have permitted
a constant number of scalars like Δ in Δ-stepping SSSP but
not growing information like the component hierarchy for Thorup's
algorithm. This will permit a handful of vertex degree thresholds.
</p>
</div>
<div id="outline-container-sec-1" class="outline-2">
<h2 id="sec-1"><span class="section-number-2">1</span> Introduction</h2>
<div class="outline-text-2" id="text-1">
<p>
Data-intensive supercomputer applications are an increasingly
important workload, but are ill-suited for platforms designed for 3D
physics simulations. Application performance cannot be improved
without a meaningful benchmark. Graphs are a core part of most
analytics workloads. Backed by a steering committee of 30
international HPC experts from academia, industry, and national
laboratories, this specification establishes a large-scale benchmark
for these applications. It will offer a forum for the community and
provide a rallying point for data-intensive supercomputing
problems. This is the first serious approach to augment the Top 500
with data-intensive applications.
</p>
<p>
The intent of this benchmark problem ("Search") is to develop a
compact application that has multiple analysis techniques (multiple
kernels) accessing a single data structure representing a weighted,
undirected graph. In addition to a kernel to construct the graph from
the input tuple list, there is one additional computational
kernel to operate on the graph.
</p>
<p>
This benchmark includes a scalable, reproducible data generator which
produces edge tuples containing the start vertex and end vertex for each
edge. The first kernel constructs an <i>undirected</i> graph in a format
usable by all subsequent kernels. No subsequent modifications are
permitted to benefit specific kernels. The second kernel performs
multiple breadth-first searches of the graph. The third kernel performs
multiple single-source shortest path computations on the graph. Each
run of the second and third kernel is independent of the others and uses
only the output of the initial construction from the first kernel.
</p>
<p>
All kernels are timed and reported. The ranking used for the official
Graph500 listing are provided in <a href="#evaluation">the section on Evaluation Criteria</a>.
The other data is useful both for explaining the results as well as
how graph searches behave on available platforms.
</p>
</div>
<div id="outline-container-sec-1-1" class="outline-3">
<h3 id="sec-1-1"><span class="section-number-3">1.1</span> The role of the reference implementation</h3>
<div class="outline-text-3" id="text-1-1">
<p>
This benchmark also specifies a reference implementation. This
implementation is not tuned for any particular system or hardware
platform. The reference implementation defines the edge list
generator. The generator can be used separately from the timed
kernels.
</p>
<p>
Submissions are required to include performance of the reference
implementation if the reference implementation runs on their platform.
The performance of the reference implementation gives some indication of
the performance of portable graph search code written by typical
programmers. Submissions are encouraged to include results from
platform-tuned and optimized codes along with the results from the most
applicable reference code. If no reference code applies to a particular
platform, the reference code performance results need not be included,
although the graph data must be generated correctly as with the
reference generator.
</p>
</div>
</div>
<div id="outline-container-sec-1-2" class="outline-3">
<h3 id="sec-1-2"><span class="section-number-3">1.2</span> Significant changes in V2.0</h3>
<div class="outline-text-3" id="text-1-2">
<ul class="org-ul">
<li>Generator:
<ul class="org-ul">
<li>Changed graph generator parameters.
</li>
<li>Begin with a tree to connect all vertices.
</li>
<li>Use a location-based hash for a PRNG. All implementations should
produce identical graphs. The edge list need not be generated
explicitly but may be computed on-the-fly.
</li>
<li>"Permute" edge list locations by index multiplication rather than
a full permutation. This scatters the tree edges around the edge
list without excess data motion.
</li>
</ul>
</li>
<li>All kernels:
<ul class="org-ul">
<li>Reduced number of search roots to eight from 64 because the graph
is fully connected.
</li>
<li>Both search kernels (2 and 3) use a single, unified, and
simplified validation routine.
</li>
</ul>
</li>
<li><a href="#kernel1">Kernel 1, graph construction</a>:
<ul class="org-ul">
<li>Removed restrictions on internal data structure.
</li>
<li>No longer computes the number of vertices.
</li>
</ul>
</li>
<li><a href="#kernel2">Kernel 2, BFS</a>:
<ul class="org-ul">
<li>No significant changes to the specification, but the reference
implementation should be faster.
</li>
</ul>
</li>
<li><a href="#kernel3">Kernel 3, single-source shortest paths</a>:
<ul class="org-ul">
<li><b>New kernel</b>.
</li>
</ul>
</li>
<li>Results:
<ul class="org-ul">
<li>New submission format. Submissions provide sizes and times but do
not need to compute their own statistics.
</li>
<li><b>Require</b> running the reference code if possible as in the Top500 list.
</li>
</ul>
</li>
</ul>
</div>
</div>
<div id="outline-container-sec-1-3" class="outline-3">
<h3 id="sec-1-3"><span class="section-number-3">1.3</span> References</h3>
<div class="outline-text-3" id="text-1-3">
<ul class="org-ul">
<li>D.A. Bader, J. Feo, J. Gilbert, J. Kepner, D. Koester, E. Loh,
K. Madduri, W. Mann, Theresa Meuse, <a href="http://graphanalysis.org/benchmark/index.html">HPCS Scalable Synthetic Compact
Applications #2 Graph Analysis (SSCA#2 v2.2 Specification)</a>, 5
September 2007.
</li>
<li>Richard C. Murphy, Kyle B. Wheeler, Brian W. Barrett, James A. Ang,
"Introducing the Graph 500," Cray User’s Group (CUG), May 5, 2010.
</li>
<li>Richard C. Murphy, Jonathan Berry, William McLendon, Bruce
Hendrickson, Douglas Gregor, Andrew Lumsdaine, "DFS: A Simple to
Write Yet Difficult to Execute Benchmark," IEEE International
Symposium on Workload Characterizations 2006 (IISWC06), San Jose,
CA, 25-27 October 2006.
</li>
</ul>
</div>
</div>
</div>
<div id="outline-container-sec-2" class="outline-2">
<h2 id="sec-2"><span class="section-number-2">2</span> Overall Benchmark Structure</h2>
<div class="outline-text-2" id="text-2">
<p>
The benchmark performs the following steps, where BFS refers to a
breadth-first search and SSSP refers to a single-source shortest path to
be described below:
</p>
<ol class="org-ol">
<li>Generate the random edge list.
</li>
<li>Randomly sample 8 unique initial vertices.
</li>
<li>Construct a graph from the edge list (<b>timed</b>, <a href="#kernel1">Kernel 1</a>).
</li>
<li>For each initial vertex:
<ol class="org-ol">
<li>Compute the BFS parent array (<b>timed</b>, <a href="#kernel2">Kernel 2</a>).
</li>
<li>Validate that the parent array is a correct BFS search tree
for the given search tree.
</li>
</ol>
</li>
<li>For each initial vertex:
<ol class="org-ol">
<li>Compute the SSSP parent array and distances (<b>timed</b>, <a href="#kernel3">Kernel 3</a>).
</li>
<li>Validate that the parent array and distance vector is a correct
SSSP search tree for the given search tree.
</li>
</ol>
</li>
<li>Compute and output performance information.
</li>
</ol>
<p>
Only the sections marked as <b>timed</b> are included in the performance
information. Note that the <a href="#kernel2">Kernel 2</a> and <i>Kernel 3</i> are run in separate
loops and not consecutively off the same initial vertex. All mentions
of "random" refer to the reproducible pseudo-random number generator
included in the <a href="http://www.graph500.org/reference.html">reference implementation</a>. This benchmark is an
artificial system measurement and not a direct representation of actual
applications. Therefore no extra information like optimal parameter
settings may be passed between kernel invocations, although <a href="#kernel1">Kernel 1</a> may
pre-compute reasonable data statistics and parameters used by <b>all</b>
later kernels without further changes.
</p>
</div>
</div>
<div id="outline-container-sec-3" class="outline-2">
<h2 id="sec-3"><span class="section-number-2">3</span> Parameter Summary</h2>
<div class="outline-text-2" id="text-3">
<p>
The benchmark takes only one parameter as input:
</p>
<dl class="org-dl">
<dt> SCALE </dt><dd>The SCALE parameter controls the overall size of the
graph. The generated graph contains 2<sup>SCALE</sup> vertices.
The number of entries in the generated edge list is
2<sup>SCALE</sup> * edgefactor, where edgefactor is an internal
parameter described below.
</dd>
</dl>
<p>
The benchmark also contains internal parameters with required settings
for submission. Experimenting with different setting is useful for
testing and exploration but not permitted for submitted results.
</p>
<dl class="org-dl">
<dt> edgefactor = 16 </dt><dd>The average number of entries in the generated
edge list containing each vertex.
</dd>
<dt> maxweight = 255 </dt><dd>The maximum edge weight in the generated edge
list. Because edges may appear multiple times, this is not the
maximum weight of the edge in the graph.
</dd>
<dt> A = 0.55, B = 0.1 </dt><dd>The parameters A and B control quadrant
probabilities in the RMAT edge generator subject to the
restrictions that 0 ≤ A ≤ 1, 0 ≤ B ≤ 1, and A+2B ≤ 1.
</dd>
<dt> noisefact = 0.1 </dt><dd>The RMAT generator perturbs A and B by a random
quantity weighted by noisefact.
</dd>
<dt> nroots = 16 </dt><dd>The number of search roots used for running Kernels
2 and 3.
</dd>
</dl>
<p>
The rest of the specification uses two parameters for the graph size
rather than repeating the expressions above.
</p>
<dl class="org-dl">
<dt> NV = 2<sup>SCALE</sup> </dt><dd>The number of vertices.
</dd>
<dt> NE = edgefactor * NV </dt><dd>The number of entries in the edge list.
</dd>
</dl>
</div>
</div>
<div id="outline-container-prng" class="outline-2">
<h2 id="prng"><a id="sec-4" name="sec-4"></a><span class="section-number-2">4</span> Pseudo-Random Number Generation</h2>
<div class="outline-text-2" id="text-prng">
<p>
The pseudo-random number generator (PRNG) used in this benchmark,
<code>threefry32x4_10</code> from the Random123 package referenced below,
essentially hashes a location-based counter into four random 32-bit
bitstrings. Each use of the PRNG will provide the mapping from the
use's location to two PRNG parameters. Given two 64-bit integers I and
J, PRNG(I, J) return four 32-bit floating-point numbers.
</p>
<p>
A location-based PRNG guarantees the numbers will be reproducible across
different platforms. We use floating-point numbers to spread bias
across the interval rather than defining how to iterate for rejection
sampling.
</p>
</div>
<div id="outline-container-sec-4-1" class="outline-3">
<h3 id="sec-4-1"><span class="section-number-3">4.1</span> References</h3>
<div class="outline-text-3" id="text-4-1">
<ul class="org-ul">
<li>John K. Salmon, Mark A. Moraes, Ron O. Dror, and David
E. Shaw. 2011. Parallel random numbers: as easy as 1, 2, 3. In
<i>Proceedings of 2011 International Conference for High Performance
Computing, Networking, Storage and Analysis (SC '11)</i>. ACM, New York,
NY, USA. <a href="http://dx.doi.org/10.1145/2063384.2063405">http://dx.doi.org/10.1145/2063384.2063405</a>
</li>
<li>Random123 software distribution:
<a href="http://www.deshawresearch.com/resources_random123.html">http://www.deshawresearch.com/resources_random123.html</a>
</li>
</ul>
</div>
</div>
</div>
<div id="outline-container-sec-5" class="outline-2">
<h2 id="sec-5"><span class="section-number-2">5</span> Edge List Generation</h2>
<div class="outline-text-2" id="text-5">
<p>
The benchmark defines a list of NE undirected edges that represent a
fully connected, undirected graph on NV vertices. The edges are
permuted in a pseudo-random fashion with a computable and invertable
permutation. The list locations will dictate where edge entries appear,
and the list indices (unpermuted locations) will determine the edge kind
and provide the input for the PRNG. Vertex numbers are scrambled to
eliminate generator locality. This list of edges may be generated
explicitly, read from storage, or may be generated on-the-fly within
<a href="#kernel1">Kernel 1</a>. All edge generation must occur before <a href="#kernel2">Kernel 2</a>.
</p>
<p>
Our goal is to provide a natural yet even starting line for all
implementations. If an implementation distinguishes between available
memory spaces, the edge list must suffer a balanced mapping onto those
memory spaces. The edge list also must be reproducible for the same
size across such different platforms yet be sufficiently permuted not
to allow "cheating" by knowledge of edge index.
</p>
<p>
If in doubt, use the reference implementation edge generation routines.
</p>
</div>
<div id="outline-container-sec-5-1" class="outline-3">
<h3 id="sec-5-1"><span class="section-number-3">5.1</span> Mapping the Edge List onto Distinguished Memory Spaces</h3>
<div class="outline-text-3" id="text-5-1">
<p>
The list of NE edges abstractly is a single array of edge entries. Each
edge entry consists of two vertices and a weight. Each vertex is an
integer at least zero and less than NV. The weight is an integer at
least zero and at most maxweight. The edge list must provide at least
48 bits per vertex and eight bits per weight. Each implementation maps
the array onto its representation of distinguished memory spaces in a
balanced manner.
</p>
<p>
An implementation using a single, undistinguished memory space
(e.g. OpenMP, Cilk, Cray XMT) maps the edge array into the global edge
list directly. Entry k of the global array is entry k of the edge
list. While many of these programming systems are implemented on top
of distinguished memory spaces (e.g. NUMA systems), the programming
system itself abstracts the mapping from memory spaces to appear
uniform. The programming system may optimize for the mapping, but the
benchmark code must not include those optimizations unless the code
does count the spaces as distinguished. Programming systems may not
special-case Graph500 code for submitted results.
</p>
<p>
An implementation with separate, distinguished memory spaces (e.g. MPI,
UPC, OpenCL with multiple devices) maps the local arrays into the global
edge list in a contiguous, balanced fashion. Assume all memory spaces
are equally sized and are enumerated with integers starting at zero.
Given NP total memory spaces, let
</p>
<ul class="org-ul">
<li>NE<sub>space</sub>(i) = floor(NE/i) + (i < NE%NP? 1 : 0) be the number of entries in
space i,
</li>
<li>NE<sub>begin</sub>(i) = floor(NE/i) + (i < NE%NP? i : NE%NP) be the first
index stored in space i, and
</li>
<li>NE<sub>end</sub>(i) = NE<sub>begin</sub>(i+1) be one past the last index stored in
space i.
</li>
</ul>
<p>
Then memory space i stores list locations starting with NE<sub>begin</sub>(i) up to
but not including NE<sub>end</sub>(i). This specification does not dictate the
mapping of memory spaces to distinguished memories.
</p>
<p>
This allocation guarantees that no memory space holds more than one
edge more or less than any other memory space. If the memory spaces
are not equally sized, the edge list must be allocated in each
proportional to the memory space's share of the total memory size, and
no memory space of the same size may hold more than one more or less
than one fewer edge than any other space of the same size.
</p>
</div>
</div>
<div id="outline-container-sec-5-2" class="outline-3">
<h3 id="sec-5-2"><span class="section-number-3">5.2</span> Permuting Edge List Indices</h3>
<div class="outline-text-3" id="text-5-2">
<p>
This benchmark permutes edge list indices from the list location k' to
an edge list index k based on the group structure of integers modulo NE.
This section is written more generally than required for result
submission.
</p>
<p>
Given an integer Z constant for a given SCALE and edgefactor that is
relatively prime to the number of edge list entries NE, let
</p>
<ul class="org-ul">
<li>k = Z * k' mod NE, and
</li>
<li>k' = Zinv * k mod NE.
</li>
</ul>
<p>
Here Zinv is the integer inverse of Z modulo NE.
</p>
<p>
For submitted results, Z is the first integer relatively prime to NE
such that Z > floor(3*NE/4). Z and Zinv can be computed in many ways.
One simple way as in Algorithm \ref{alg:compute.perm} below relies on the
Euclidean algorithm for computing the greatest common divisor of a
proposed Z and the given NE.
</p>
<div class="org-src-container">
<label class="org-src-name">Computing the edge list permutation.</label>
<pre class="src src-Octave" id="alg:compute.perm">function [Z, Zinv] = compute_perm (NE)
for Z = (1+floor(3*NE/4)):(NE-1),
[g, Zinv] = gcd (Z, NE);
if 1 == g,
assert (1 == mod (Z * Zinv, NE));
return;
endif
endfor
endfunction
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-5-3" class="outline-3">
<h3 id="sec-5-3"><span class="section-number-3">5.3</span> Edge List Entries</h3>
<div class="outline-text-3" id="text-5-3">
<p>
The first NV-1 unpermuted list entries, those with 0 ≤ k < NV-1,
are <b>tree edges</b> that guarantee the graph is connected. The remaining
unpermuted edges are <b>RMAT edges</b>. The case when NV-1 > NE will
never occur in submitted results and is left unspecified.
</p>
<div class="org-src-container">
<label class="org-src-name">Generating a slice of the edge list.</label>
<pre class="src src-Octave" id="alg:edge.list">function ijw = edge_list (ne_begin, ne_len,
SCALE, NE, maxweight)
NV = 2**SCALE;
ijw = zeros (3, ne_len);
[Z, Zinv] = compute_perm (NE);
for t = 1:ne_len,
## kp = k', location in the global edge list.
kp = ne_begin + t - 1;
k = mod (Zinv * kp, NE);
## Generate four pseudo-random numbers, but use
## only one for the weight.
rnd = PRNG (k, 0);
w = ceil (rnd(1) * maxweight);
if k < NV,
[v1, v2] = tree_edge (k);
else
[v1, v2] = rmat_edge (k, SCALE);
endif
v1 = scramble (v1, SCALE);
v2 = scramble (v2, SCALE);
ijw(:, t) = [v1; v2; w]; # Location kp "globally."
endfor
ijw = ijw.'; # Switch to columns for i, j, w.
endfunction
</pre>
</div>
</div>
<div id="outline-container-sec-5-3-1" class="outline-4">
<h4 id="sec-5-3-1"><span class="section-number-4">5.3.1</span> Tree Edges</h4>
<div class="outline-text-4" id="text-5-3-1">
<p>
Given an unpermuted edge index k, the vertices for index k are
floor(k/2) and k+1. The weight is given by maxweight *
PRNG(NV,k). Algorithm \ref{alg:tree.edge} provides the high-level
implementation of the tree edge generator.
</p>
<div class="org-src-container">
<label class="org-src-name">Tree edge function.</label>
<pre class="src src-Octave" id="alg:tree.edge">function [v1, v2, w] = tree_edge (k)
v1 = floor (k/2);
v2 = k+1;
endfunction
</pre>
</div>
</div>
</div>
<div id="outline-container-sec-5-3-2" class="outline-4">
<h4 id="sec-5-3-2"><span class="section-number-4">5.3.2</span> RMAT Edges</h4>
<div class="outline-text-4" id="text-5-3-2">
<p>
The additional edges come from a RMAT edge generator similar to the
Recursive MATrix (R-MAT) scale-free graph generation algorithm
[Chakrabarti, et al., 2004]. For ease of discussion, the description of
this R-MAT generator uses an adjacency matrix data structure; however,
implementations may use any alternate approach that outputs the
equivalent list of edge tuples. This model recursively sub-divides the
adjacency matrix of the graph into four equal-sized partitions and
distributes edges within these partitions with unequal
probabilities.
</p>
<p>
Each edge chooses one of the four partitions with probabilities A, B, C,
and D, respectively. These probabilities, the initiator parameters, are
provided in Table \ref{tbl:initiator}. For this undirected graph, only
parameters A and B are independent. The parameters are perturbed for
each level as in [Seshadhri, <i>et al</i>., 2011]. Algorithm
\ref{alg:rmat.edge} provides the high-level listing for generating RMAT
edges and shows the mapping from edge index to PRNG arguments.
</p>
<table id="tbl:initiator" border="2" cellspacing="0" cellpadding="6" rules="groups" frame="hsides">
<caption align="above"><span class="table-number">Table 1:</span> Initiator parameters for the RMAT graph generator</caption>
<colgroup>
<col class="left" />
<col class="left" />
</colgroup>
<tbody>
<tr>
<td class="left">A = 0.55</td>
<td class="left">B = 0.1</td>
</tr>
<tr>
<td class="left">C = B = 0.1</td>
<td class="left">D = 1-(A+B+C) = 0.25</td>
</tr>
</tbody>
</table>
<p>
The RMAT generator is parallel down to the bit level. The
location-based PRNG guarantees any parallelization produces the same
result up to differences in floating-point arithmetic. All
IEEE-754-conforming platforms should produce identical results; this
benchmark is to be run with the default rounding direction.
</p>
<p>
The PRNG takes two arguments and returns four pseudo-random numbers.
For edge index <i>k</i>, the first PRNG argument always is <i>k</i>. The weight
is generated by using zero for the second argument. For each bit level
<i>s</i> from 0 to SCALE-1 in least- to most-significant bit order, the
second argument in the PRNG is 1+floor(<i>s</i> / 2). Given four returned
pseudo-random values labeled 0 through 3, bit level <i>s</i> uses two values.
The first value, labeled 2*(<i>s</i> % 2), provides the parameter
perturbation. The second, labeled 1+2*(<i>s</i> % 2), provides the quadrant.
The example code in Algorithm \ref{alg:rmat.edge} and the reference
implementation implement this more efficiently by generating all the
2*SCALE pseudo-random numbers into one column-major array.
</p>
<div class="org-src-container">
<label class="org-src-name">RMAT edge function.</label>
<pre class="src src-Octave" id="alg:rmat.edge">function [v1, v2, w] = rmat_edge (k, SCALE)
## Set initiator probabilities.
[A, B] = deal (0.55, 0.1);
## Noise factor for perturbing the initiator.
noisefact = 0.1;
## Collect all the PRNG outputs used by the
## vertices at k. Each call returns four
## pseudo-random numbers used alternately for
## the parameter perturbation and the quadrant
## over *two* scales. If SCALE is odd, this
## will waste two generated numbers.
rnd = zeros (1, 2*(SCALE + mod (SCALE, 2)));
for scl=0:2:(SCALE-1),
idx = 1 + ((4*floor ((scl+1)/2)):(4*floor ((scl+2)/2)-1));
rnd(idx) = PRNG (k, 1+floor (scl/2));
endfor
rnd = reshape (rnd(1:2*SCALE), 2, SCALE);
rnd = rnd.'; # Silly optimization for
# column-major ordering.
mu = noisefact * (2 * rnd(:, 1) - 1);
As = A * (1 - 2 * mu / (1 - 2*B));
Bs = B * (1 + mu);
## Cast the darts into quadrants using the
## perturbed parameters.
scl = 2.^(0:SCALE-1).';
v1 = sum ((rnd(:, 2) >= As + Bs) .* scl);
v2 = sum ((or (and (rnd(:, 2) >= As,
rnd(:, 2) < As + Bs),
rnd(:, 2) >= As + 2*Bs)) .* scl);
endfunction
</pre>
</div>
</div>
</div>
</div>
<div id="outline-container-sec-5-4" class="outline-3">
<h3 id="sec-5-4"><span class="section-number-3">5.4</span> Scrambling Vertex Numbers</h3>
<div class="outline-text-3" id="text-5-4">
<p>
To remove vertex numbering locality, vertex numbers are scrambled. The
scrambled numbers remain in the range [0, 2<sup>SCALE</sup>). The exact
scrambling algorithm is provided in the reference code. The scrambling
uses two 64-bit seed values derived from PRNG(-1, -1).
</p>
</div>
</div>
<div id="outline-container-sec-5-5" class="outline-3">
<h3 id="sec-5-5"><span class="section-number-3">5.5</span> References</h3>
<div class="outline-text-3" id="text-5-5">
<ul class="org-ul">
<li>D. Chakrabarti, Y. Zhan, and C. Faloutsos, R-MAT: A recursive model
for graph mining, SIAM Data Mining 2004.
</li>
<li>C. Seshadhri, A. Pinar, and T.G. Kolda, "An In-depth Study of
Stochastic Kronecker Graphs," 2011 IEEE 11th International
Conference on Data Mining (ICDM), pp.587-596, 11-14 Dec. 2011 doi:
10.1109/ICDM.2011.23. Pre-print at <a href="http://arxiv.org/abs/1102.5046">http://arxiv.org/abs/1102.5046</a> .
</li>
</ul>
</div>
</div>
</div>
<div id="outline-container-kernel1" class="outline-2">
<h2 id="kernel1"><a id="sec-6" name="sec-6"></a><span class="section-number-2">6</span> Kernel 1 – Graph Construction</h2>
<div class="outline-text-2" id="text-kernel1">
</div>
<div id="outline-container-sec-6-1" class="outline-3">
<h3 id="sec-6-1"><span class="section-number-3">6.1</span> Description</h3>
<div class="outline-text-3" id="text-6-1">
<p>
The first kernel may transform the edge list to any data structures
(held in internal or external memory) that are used <b>unmodified</b> for the
remaining kernels. For instance, <a href="#kernel1">Kernel 1</a> may construct a (sparse) graph
from a list of tuples; each tuple contains endpoint vertex identifiers
for an edge, and a weight that represents data assigned to the edge.
</p>
<p>
The graph may be represented in any manner, but it must not be modified
by or between subsequent kernels. A constant number of scalars (not
proportional to the graph size) may be collected during construction for
later use. The general graph structure must not include information
proportional to the graph size for algorithms implementing <a href="#kernel2">Kernel 2</a> or
<a href="#kernel3">Kernel 3</a> like short-cut edges or spanning trees. Using <a href="#kernel3">Kernel 3</a> as an
example, pre-computing the Δ for Δ-stepping algorithms is
permitted, but pre-computing the component hierarchies for Thorup's
algorithm is not.
</p>
<p>
There are various internal memory representations for sparse graphs,
including (but not limited to) sparse matrices and (multi-level) linked
lists. For the purposes of this application, the kernel is provided only
the total number of vertices, the edge list, and the edge list's size.
Further information must be computed within this kernel. Algorithm
\ref{alg:kernel.1} provides a high-level sample implementation of
<a href="#kernel1">Kernel 1</a>.
</p>
<p>
The process of constructing the graph data structure (in internal or
external memory) from the set of tuples must be timed and is reported in
the <a href="#benchmarkoutput">output</a>.
</p>
<div class="org-src-container">
<label class="org-src-name">High-level implementation of Kernel 1</label>
<pre class="src src-Octave" id="alg:kernel.1">function G = kernel_1 (ijw)
## Compute a sparse adjacency matrix representation
## of the graph with edges from ij.
## Remove self-edges.
ijw(ijw(:, 1) == ijw(:, 2), :) = [];
## Adjust away from zero labels.
ijw(:, [1 2]) = ijw(:, [1 2]) + 1;
## Find the maximum label for sizing.
N = max (max (ijw(:, [1 2])));
## Order into a single triangle.
mask = ijw(:, 1) < ijw(:, 2);
ijw(mask, [1 2]) = ijw(mask, [2 1]);
## Create the matrix, ensuring it is square.