-
Notifications
You must be signed in to change notification settings - Fork 3
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
Ghosh, Sayan
committed
Jun 6, 2020
0 parents
commit ce3e69c
Showing
7 changed files
with
3,452 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,29 @@ | ||
BSD 3-Clause License | ||
|
||
Copyright (c) 2019, Battelle Memorial Institute | ||
All rights reserved. | ||
|
||
Redistribution and use in source and binary forms, with or without | ||
modification, are permitted provided that the following conditions are met: | ||
|
||
1. Redistributions of source code must retain the above copyright notice, this | ||
list of conditions and the following disclaimer. | ||
|
||
2. Redistributions in binary form must reproduce the above copyright notice, | ||
this list of conditions and the following disclaimer in the documentation | ||
and/or other materials provided with the distribution. | ||
|
||
3. Neither the name of the copyright holder nor the names of its | ||
contributors may be used to endorse or promote products derived from | ||
this software without specific prior written permission. | ||
|
||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" | ||
AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE | ||
IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE | ||
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE | ||
FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL | ||
DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR | ||
SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER | ||
CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, | ||
OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE | ||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,36 @@ | ||
CXX = mpicxx | ||
# use -xmic-avx512 instead of -xHost for Intel Xeon Phi platforms | ||
OPTFLAGS = -O3 -xHost -DPRINT_DIST_STATS -DPRINT_EXTRA_NEDGES | ||
# -DPRINT_EXTRA_NEDGES prints extra edges when -p <> is passed to | ||
# add extra edges randomly on a generated graph | ||
# use export ASAN_OPTIONS=verbosity=1 to check ASAN output | ||
SNTFLAGS = -std=c++11 -fsanitize=address -O1 -fno-omit-frame-pointer | ||
CXXFLAGS = -std=c++11 -g $(OPTFLAGS) | ||
|
||
ENABLE_DUMPI_TRACE=0 | ||
ENABLE_SCOREP_TRACE=0 | ||
|
||
ifeq ($(ENABLE_DUMPI_TRACE),1) | ||
TRACERPATH = $(HOME)/builds/sst-dumpi/lib | ||
LDFLAGS = -L$(TRACERPATH) -ldumpi | ||
else ifeq ($(ENABLE_SCOREP_TRACE),1) | ||
SCOREP_INSTALL_PATH = /usr/common/software/scorep/6.0/intel | ||
INCLUDE = -I$(SCOREP_INSTALL_PATH)/include -I$(SCOREP_INSTALL_PATH)/include/scorep -DSCOREP_USER_ENABLE | ||
LDAPP = $(SCOREP_INSTALL_PATH)/bin/scorep --user --nocompiler --noopenmp --nopomp --nocuda --noopenacc --noopencl --nomemory | ||
endif | ||
|
||
OBJ = main.o | ||
TARGET = neve | ||
|
||
all: $(TARGET) | ||
|
||
%.o: %.cpp | ||
$(CXX) $(INCLUDE) $(CXXFLAGS) -c -o $@ $^ | ||
|
||
$(TARGET): $(OBJ) | ||
$(LDAPP) $(CXX) -o $@ $^ | ||
|
||
.PHONY: clean | ||
|
||
clean: | ||
rm -rf *~ *.dSYM nc.vg.* $(OBJ) $(TARGET) |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,133 @@ | ||
*-*-*-*-*-*-*-*-* | ||
| | | | | | | | | | ||
*-*-*-neve-*-*-*- | ||
| | | | | | | | | | ||
*-*-*-*-*-*-*-*-* | ||
|
||
***** | ||
About | ||
***** | ||
|
||
`neve` is a microbenchmark that distributes a graph (real-world or synthetic) across | ||
processes and then performs variable-sized message exchanges (equivalent to the number | ||
of ghost vertices shared between processes) between process neighbors and reports | ||
bandwidth/latency. Further, neve allows options to shrink a real-world graph (by a | ||
percent of ghost vertices shared across processes), study performance of message exchanges | ||
for a specific process neighborhood, analyze the impact of an edge-balanced real-world | ||
graph distribution, and generate a graph in-memory. neve reports bandwidth/latency, and | ||
follows the best practices of existing MPI microbenchmarks. | ||
|
||
We require the total number of processes to be a power of 2 and total number of vertices | ||
to be perfectly divisible by the number of processes when parallel RGG generation options | ||
are used. This constraint does not apply to real world graphs passed to neve. | ||
|
||
neve has an in-memory random geometric graph generator (just like our other mini-application, | ||
miniVite[?]) that can be used for weak-scaling analysis. An n-D random geometric graph (RGG) | ||
is generated by randomly placing N vertices in an n-D space and connecting pairs of vertices | ||
whose Euclidean distance is less than or equal to d. We only consider 2D RGGs contained within | ||
a unit square, [0,1]^2. We distribute the domain such that each process receives N/p vertices | ||
(where p is the total number of processes). Each process owns (1 * 1/p) portion of the unit | ||
square and d is computed as (please refer to Section 4 of miniVite paper[?] for details): | ||
|
||
d = (dc + dt)/2; | ||
where, dc = sqrt(ln(N) / pi*N); dt = sqrt(2.0736 / pi*N) | ||
|
||
Therefore, the number of vertices (N) passed during execution on p processes must satisfy | ||
the condition -- 1/p > d. Unlike miniVite, the edge weights of the generated graph is always | ||
1.0, and not the Euclidean distance between vertices (because we only perform communication in | ||
this microbenchmark, and no computation, so edge weights are irrelevant). | ||
|
||
[?] Ghosh, Sayan, Mahantesh Halappanavar, Antonio Tumeo, Ananth Kalyanaraman, and | ||
Assefaw H. Gebremedhin. "miniVite: A Graph Analytics Benchmarking Tool for Massively | ||
Parallel Systems." In 2018 IEEE/ACM Performance Modeling, Benchmarking and Simulation of | ||
High Performance Computer Systems (PMBS), pp. 51-56. IEEE, 2018. | ||
|
||
Please note, the default distribution of graph generated from the in-built random | ||
geometric graph generator causes a process to only communicate with its two | ||
immediate neighbors. If you want to increase the communication intensity for | ||
generated graphs, please use the "-p" option to specify an extra percentage of edges | ||
that will be generated, linking random vertices. As a side-effect, this option | ||
significantly increases the time required to generate the graph, therefore low values | ||
are preferred. The max number of edges that can be added randomly must be less than | ||
equal to INT_MAX, at present we don't handle cases in which "-p <percent>" resolves to | ||
extra edges more than INT_MAX. | ||
|
||
We also allow users to pass any real world graph as input. However, we expect an input graph | ||
to be in a certain binary format, which we have observed to be more efficient than reading | ||
ASCII format files. The code for binary conversion (from a variety of common graph formats) | ||
is packaged separately with another software called Vite, which is our distributed-memory | ||
implementation of graph community detection. Please follow instructions in Vite README for | ||
binary file conversion. Vite could be downloaded from (please don't use the past PNNL/PNL | ||
link to download Vite, the following GitHub link is the correct one): | ||
https://github.com/Exa-Graph/vite | ||
|
||
Please contact the following for any queries or support: | ||
|
||
Sayan Ghosh, PNNL (sayan dot ghosh at pnnl dot gov) | ||
Mahantesh Halappanavar, PNNL (hala at pnnl dot gov) | ||
|
||
******* | ||
Compile | ||
******* | ||
|
||
neve is a C++ header-only library and requires an MPI implementation. It uses MPI Send/Recv and | ||
collectives (for synthetic graph generation). Please update the Makefile with compiler flags and | ||
use a C++11 compliant compiler of your choice. Invoke `make clean; make` after setting paths | ||
to MPI for generating the binary. Use `mpirun` or `mpiexec` or `srun` to execute the code with | ||
specific runtime arguments mentioned in the next section. | ||
|
||
Pass -DPRINT_DIST_STATS while building for printing distributed graph characteristics. | ||
|
||
***************** | ||
Execution options | ||
***************** | ||
E.g.: | ||
mpiexec -n 2 bin/./neve -f karate.bin -w -t 0 -z 5 | ||
mpiexec -n 4 bin/./neve -f karate.bin -w -t 0 -s 2 -g 5 | ||
mpiexec -n 2 bin/./neve -l -n 100 -w | ||
mpiexec -n 2 bin/./neve -n 100 -t 0 | ||
mpiexec -n 2 bin/./neve -p 2 -n 100 -w | ||
|
||
Possible options (can be combined): | ||
|
||
1. -f <bin-file> : Specify input binary file after this argument. | ||
2. -b : Only valid for real-world inputs. Attempts to distribute approximately | ||
equal number of edges among processes. Irregular number of vertices | ||
owned by a particular process. Increases the distributed graph creation | ||
time due to serial overheads, but may improve overall execution time. | ||
3. -n <vertices> : Only valid for synthetically generated inputs. Pass total number of | ||
vertices of the generated graph. | ||
4. -l : Use distributed LCG for randomly choosing edges. If this option | ||
is not used, we will use C++ random number generator (using | ||
std::default_random_engine). | ||
5. -p <percent> : Only valid for synthetically generated inputs. Specify percent of overall | ||
edges to be randomly generated between processes. | ||
6. -r <nranks> : This is used to control the number of aggregators in MPI I/O and is | ||
meaningful when an input binary graph file is passed with option "-f". | ||
naggr := (nranks > 1) ? (nprocs/nranks) : nranks; | ||
7 -w : Report Bandwidth in MB/s[*]. | ||
8. -t <0|1|2> : Report Latency in microseconds[*]. Option '0' uses nonblocking Send/Recv, | ||
Option '1' uses MPI_Neighbor_alltoall and Option '2' uses MPI_Neighbor_allgather. | ||
9. -x <bytes> : Maximum data exchange size (in bytes). | ||
10. -m <bytes> : Minimum data exchange size (in bytes). | ||
11. -s <PE> : Analyze a single process neighborhood. Performs bidirectional message | ||
exchanges between the process neighbors of a particular PE passed by the | ||
user. This transforms the neighbor subgraph of a particular PE as a fully | ||
connected graph. | ||
12. -g <count> : Specify maximum number of ghosts shared between process neighbors of a | ||
particular PE. This option is only valid when -s <PE> is passed (see #11). | ||
13. -z <percent> : Select a percentage of ghost vertices of a real-world graph distributed | ||
across processes as actual ghosts. For e.g., lets assume that a graph is | ||
distributed across 4 processes, and PE#0 has two neighbors, PE#1 and PE#3 | ||
with whom it shares a 100 ghost vertices each. In such a configuration, | ||
the b/w test would perform variable-sized message exchanges a 100 times | ||
between PE#0 and {PE#1, PE#3}. The -z <percent> option can limit the | ||
number of message exchanges by selecting a percentage of the actual the number | ||
of ghosts, but maintaining the overall structure of the original graph. | ||
Hence, this option can help 'shrink' a graph. Only valid for b/w test, because | ||
latency test will just perform message transfers between process neighborhoods. | ||
|
||
[*]Note: Unless -w or -t <...> is passed, the code will just load the graph and create the graph data | ||
structure. This is deliberate, in case we want to measure the overhead of loading a graph and | ||
time only the file I/O part (like measuring the impact of different #aggregators through the | ||
-r <nranks> option). |
Oops, something went wrong.