FrontOrder

[PPoPP' 25] POSTER: Frontier-guidedGraph Reordering

What is it?

FrontOrder is a graph reordering method, which improves graph locality to reduce the cache misses in graph processing.

Unlike the previous reordering which depends on static characteristics in graph, FrontOrder firstly profiles the traces of BFS samplings (primarily the concurrent frontiers) to construct the locality metric between vertices, and then utilizes an unsupervised ML method (K-means at present) to excavate the clusters of high locality to guide graph reordering. Furthermore, since the learned clusters of vertices are more likely to be co-activated, FrontOrder also fine-tunes the load balance in parallel graph processing with the clusters.

For more details, please refer to our paper.

Getting Started

0. Dependencies

At the minimum, FrontOrder depends on the following software:

make>=4.1
g++>=7.5.0 (compliant with C++11 standard and OpenMP is required)
Boost (>=1.58.0)

A possible command to satisfy all these dependencies:

apt-get install build-essentials libboost-dev libomp-dev

1. Compilation

To compile FrontOrder, run the following on root directory:

make

2. Data Preparation

FrontOrder takes graph in Compressed Sparse Row (CSR) format as both the input and output. In this way, after downloading or generating the graph in edgelist format (i.e. src_vertex dst_vertex format), we should convert the data into CSR format firstly for prepration.

2.1 Download Graph or Generate Graph

First of all, create a directory to preserve all the graph data.

ROOT_DIR=`pwd`
TOOL_DIR=$ROOT_DIR/tools
DATA_DIR=$ROOT_DIR/dataset
mkdir -p dataset && cd dataset

To obtain the graph data, you can download real-world graph data in edgelist format. For instance, to download soc-LiveJournal graph from SNAP large network collection:

wget http://snap.stanford.edu/data/soc-LiveJournal1.txt.gz
gzip -d soc-LiveJournal1.txt.gz

Or you could generate RMAT graph using PaRMAT. To generate a RMAT graph (named R20.el) with 1M vertices and 16M edges using 8 threads, you can run:

cd ${TOOL_DIR}/PaRMAT/Release
make
./PaRMAT -nEdges 16777216 -nVertices 1048576 -output R20.el -threads 8 -noDuplicateEdges
mv R20.el $DATA_DIR

2.2 Convert Edgelist into CSR format

We provide tools to support conversion from edgelist into CSR. For example, to convert RMAT graph R20.el, you could run:

cd ${TOOL_DIR}/el2csr
make
./el2csr ${DATA_DIR}/R20.el ${DATA_DIR}/R20.csr

3. Graph Reordering

The workflow of FrontOrder could be sliced into three stages: sampling, embedding and reordering.

3.1 Sampling BFS Frontiers

We execute BFS from several random starters to obtain the frontier distribution. Since we take GPOP as one of targeted system, we modified GPOP to sample the frontiers of BFS.

To sample frontiers for R20 graph from 15 random BFS requests, run:

cd $ROOT_DIR/system/GPOP/bfs
make
./bfs ${DATA_DIR}/R20.csr -rounds 15 # -rounds could define the number of BFS requests

Then the sampled frontiers of BFS execution will be preserved at ${DATA_DIR}/sample directory.

It is worth noticing that the frontier distribution is independent to underlying systems. GPOP is used as an example. You can modify your graph processing system to collect the BFS frontiers as you wish.

3.2 Embedding

Then we embed the sampled frontier history into feature space:

cd $TOOL_DIR
python embedding.py --data R20 --feat_dim=10 --data_dir=${DATA_DIR}

You can input python embedding.py --help to see the meaning of these parameters.

3.3 Reordering using FrontOrder

Please check the recipes before executing FrontOrder, a possible file tree when using R20.el as an example is shown as follows:

$DATA_DIR
|-- feature
|   |-- R20_dim_10.feat
|-- R20.csr
|-- R20.el
|-- sample
|   |-- R20.bfs.0
|   ...
|   |-- R20.bfs.9

To run FrontOrder:

./FrontOrder -d ${DATA_DIR}/R20.csr -o ${DATA_DIR}/R20-his.csr -r ${DATA_DIR}/R20-his.map \
-a 2 -s 128 -t 8 -f 10 -k 20 -i ${DATA_DIR}/feature/R20_dim_10.feat

The parameters in FrontOrder is illustrated as follows:

    ./FrontOrder
    -d [ --data ]            input data path (.csr)
    -o [ --output ]          output file path (.csr)
    -r [ --output_map]       output mapping file (vertex mapping of reordering)
    -a [ --algorithm ](=0)   reordering algorithm (Including FrontOrder and some baselines)
    -t [ --thread ]  (=20)   threads (number of threads)
    -s [ --size ]  (=1024)   partition size (in KB, normally=L2 Cache/2)
    -f [ --feat ]  (=10)     feature size (*FrontOrder only)
    -k [ --kv ] (=20)        K value for K-means (*FrontOrder Only)
    -i [ --input_feat ]      input feature file (*FrontOrder Only)

    [ --algorithm ]
    FrontOrder_without_balance = 0,(*Out proposed method for single-thread)
    FrontOrder = 1, (*Our proposed method, default)
    FrontOrder_pcpm = 2, (*Out proposed method, designed for PCPM)
    random = 3, 
    sort = 4, 
    fbc = 5, 
    hc = 6, 
    dbg = 7, 
    corder = 8,

Evaluation

After generating the reordered CSR graph, we can test the graph processing performance under different CSR format.

Running Graph Algorithms

We take GPOP as example, firstly we change directory to GPOP:

cd ${ROOT_DIR}/system/GPOP

Taking BFS as an example:

cd BFS

Notice: please uncomment the SAMPLE=1 line. And we run make clean && make, then we run bfs in GPOP using both FrontOrder and original graph format:

./bfs ../../../dataset/R20.csr -s <start_vertex> -t <thread_num> -rounds <round_num> # original graph 
./bfs ../../../dataset/R20-his.csr -s <start_vertex> -t <thread_num> -rounds <round_num> -map ../../../dataset/R20-his.map # original graph

Benchmark Summary

We evaluate the effect of FrontOrder on several representative graph processing systems, including Ligra (in-memory, vertex-centric), GPOP (in-memory, partition-centric), and GridGraph (storage-based out-of-core system).

Testbed

Item	Configurations
CPU	2 x Intel Xeon Platinum 8163 [email protected], each with 24 Cores
Threads	Up to 96 threads(With hyperthread)
Cache	32KB L1d-Cache, 32KB L1i-Cache 1MB L2 Cache, 32MB LLC
Memory	512GB DDR4 DIMM
OS	Ubunut 16.04.7 LTS

Evaluation Results

We compared the performance of FrontOrder with large number of existing graph reordering methods, including sort-by-degree, Degre-based Grouping (DBG), Hub Clustering (HC), Frequency-based Clustering(FBC), Corder(CO), rabbit(RBT) and Gorder(GO). We show the average MTEPS of running BFS algorithm on 6 different graphs when using baseline (without reordering), SOTA (the best performance in existing method) and FrontOrder.

Methods	Ligra	GPOP	GridGraph
baseline	5210 MTEPS	5266 MTEPS	272 MTEPS
SOTA	5807 MTEPS(CO)	5856 MTEPS(RBT)	309 MTEPS(DBG)
FrontOrder	7149 MTEPS	7262 MTEPS	401 MTEPS

As we can see, FrontOrder could improve the graph processing efficiency on different graph processing systems, and can beat the most effective existing reordering methods.

Please refer to our paper to see the more details about the evaluation.

Future Work

Customize graph partition for heterogeneous computing architecture (On-going)
XXX

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
img		img
system		system
tools		tools
toy		toy
.gitignore		.gitignore
LICENSE		LICENSE
README.md		README.md
global.hpp		global.hpp
graph.hpp		graph.hpp
main.cpp		main.cpp
makefile		makefile
order.hpp		order.hpp
parser.hpp		parser.hpp
reorder_graphs.sh		reorder_graphs.sh
run_gpop.sh		run_gpop.sh
run_ligra.sh		run_ligra.sh
sort.hpp		sort.hpp
vec2d.hpp		vec2d.hpp

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

FrontOrder

What is it?

Getting Started

0. Dependencies

1. Compilation

2. Data Preparation

2.1 Download Graph or Generate Graph

2.2 Convert Edgelist into CSR format

3. Graph Reordering

3.1 Sampling BFS Frontiers

3.2 Embedding

3.3 Reordering using FrontOrder

Evaluation

Running Graph Algorithms

Benchmark Summary

Testbed

Evaluation Results

Future Work

About

Releases

Packages

Languages

License

RecoderChris/FrontOrder

Folders and files

Latest commit

History

Repository files navigation

FrontOrder

What is it?

Getting Started

0. Dependencies

1. Compilation

2. Data Preparation

2.1 Download Graph or Generate Graph

2.2 Convert Edgelist into CSR format

3. Graph Reordering

3.1 Sampling BFS Frontiers

3.2 Embedding

3.3 Reordering using FrontOrder

Evaluation

Running Graph Algorithms

Benchmark Summary

Testbed

Evaluation Results

Future Work

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages