Gangda Deng, Ömer Faruk Akgül, Hongkuan Zhou, Hanqing Zeng, Yinglong Xia, Jianbo Li, Viktor Prasanna
Contact: Gangda Deng ([email protected])
The directory structure of this project is as follows:
.
├── baseline <- Multi-machine baselines, implemented
│ │ in PyTorch only
│ ├── python_ppr
│ │ ├── data.py <- Data preprocessing
│ │ ├── main.py <- Multi-machine ppr baseline
│ │ └── single_machine.py <- Single-machine ppr experiments
│ └── python_randomwalk
├── data_generation
│ └── gen_engine_data.py
├── engine <- C++ graph engine backend
│ ├── Graph.h
│ ├── PPR.h
│ ├── global.h <- Data type definitions, should be aligned
│ │ with definitions in frontend/graph.py
│ └── ...
├── frontend <- Python graph engine frontend
│ ├── graph.py <- Data type definitions and wrapper classes
│ ├── ppr.py <- Distributed ppr workflow
│ ├── random_walk.py <- Distributed random_walk workflow
│ ├── run_ppr.py <- Distributed ppr entrance
│ ├── run_rw.py <- Distributed random_walk entrance
│ └── ...
├── shadow_gnn <- ShaDow-GNN example
│ ├── run_shadow.py
│ └── ...
├── test
│ ├── communication_libs <- Tests for torch.rpc and torch.gloo
│ └── ...
├── run_ppr.sh <- Graph engine experiments
├── run_shadow_preproc.sh <- Ego-graphs preprocessing for ShaDow-GNN
└── run_shadow_train.sh <- ShaDow-GNN training script
First, compile the C++ graph engine and place the built extension under the frontend directory:

python engine/setup.py build_ext --build-lib=frontend
Before running the graph engine on each machine, we need to generate the graph shards first:
python data_generation/preprocess_ogb.py --data ogbn-products --unweighted --input_path <path-to-dgl-dataset> --output_path <path-to-preprocessed-data>
python data_generation/gen_engine_data.py --data ogbn-products --num_partition 4 --input_path <path-to-preprocessed-data> --output_path <path-to-engine-data>
These two commands download and preprocess the dataset in DGL format, then convert it into our graph engine's partitioned format.
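For reference, obtaining an OGB dataset in DGL format typically looks like the sketch below; the actual logic lives in data_generation/preprocess_ogb.py, and the root path here is just a placeholder.

```python
# Illustration only: fetching an OGB dataset in DGL format.
# The real preprocessing lives in data_generation/preprocess_ogb.py.
from ogb.nodeproppred import DglNodePropPredDataset

dataset = DglNodePropPredDataset(name='ogbn-products', root='<path-to-dgl-dataset>')
graph, labels = dataset[0]  # a DGLGraph and its node-label tensor
print(graph.num_nodes(), graph.num_edges())
```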
Griphin currently supports high-performance distributed Random Walk and Personalized PageRank (PPR) computation with a PyTorch Tensor interface.
- Distributed random walk

  python frontend/run_rw.py --dataset ogbn-products --data_path <path-to-engine-data> --num_roots 8192 --walk_length 15 --num_machines 4

  Key arguments:
  - --version 2: version 2 is the latest and best-performing version
  - --num_threads 1: currently, setting the number of threads to 1 gives the best results
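To make the walk semantics concrete, here is a minimal single-machine sketch of what the distributed walker computes: one uniform random walk per root node, returned as a (num_roots, walk_length + 1) tensor. The CSR layout (indptr, indices) and the function itself are illustrative assumptions, not the engine's actual API.

```python
# Minimal sketch of uniform random walks over a CSR graph.
# The (indptr, indices) layout is assumed for illustration; this is not
# the engine's actual API.
import torch

def random_walk(indptr, indices, roots, walk_length):
    walks = [roots]
    cur = roots
    for _ in range(walk_length):
        start = indptr[cur]
        deg = indptr[cur + 1] - start
        # pick one outgoing edge per walker; walkers at dead ends stay put
        offset = (torch.rand(cur.numel()) * deg.clamp(min=1)).long()
        cand = indices[(start + offset).clamp(max=indices.numel() - 1)]
        cur = torch.where(deg > 0, cand, cur)
        walks.append(cur)
    return torch.stack(walks, dim=1)  # shape: (num_roots, walk_length + 1)
```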
- Distributed PPR (Pseudocode)

  python frontend/run_ppr.py --dataset ogbn-products --data_path <path-to-engine-data> --inference_out_path <path-to-ppr-data> --alpha 0.462 --epsilon 1e-6 --k 150

  Key arguments:
  - --inference_out_path your_path: output path for PPR results; full-graph SSPPR inference is performed if this argument is provided
  - --version overlap: overlap is the best-performing version, which overlaps the remote and local operations; cpp_batch presents a clearer breakdown by avoiding overlapping
  - --num_machines 4: number of machines (simulated as processes on a single machine)
  - --num_processes 1: number of processes used for SSPPR computation per machine. Note that the computation for different source nodes is embarrassingly parallel; with sufficient bandwidth, throughput is almost linearly proportional to the number of processes
  - --num_threads 1: number of threads used in the C++ push kernel (implemented with parallel-hashmap)
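For readers unfamiliar with these parameters: in the usual forward-push formulation of approximate single-source PPR (Andersen et al., 2006), alpha is the restart probability, epsilon the per-node residual threshold, and k the number of top-ranked nodes kept. The pure-Python sketch below illustrates that algorithm, with plain dicts standing in for the engine's C++ parallel hash maps; it is not the actual implementation.

```python
# Forward-push approximate single-source PPR (Andersen et al., 2006).
# Pure-Python sketch: `adj` maps a node to its out-neighbor list, and dicts
# stand in for the engine's parallel hash maps. Not the C++ implementation.
def forward_push_ppr(adj, source, alpha=0.462, epsilon=1e-6, k=150):
    p = {}                 # accumulated PPR estimates
    r = {source: 1.0}      # residual probability mass still to propagate
    frontier = [source]
    while frontier:
        u = frontier.pop()
        res = r.pop(u, 0.0)
        p[u] = p.get(u, 0.0) + alpha * res        # retain alpha of the mass
        if not adj[u]:                            # dangling node: remaining
            continue                              # mass is simply dropped
        push = (1.0 - alpha) * res / len(adj[u])  # spread the rest uniformly
        for v in adj[u]:
            r[v] = r.get(v, 0.0) + push
            # re-activate v once its residual is large relative to its degree
            if r[v] >= epsilon * max(len(adj[v]), 1) and v not in frontier:
                frontier.append(v)
    # keep the k largest estimates: the nodes most relevant to `source`
    return sorted(p.items(), key=lambda kv: -kv[1])[:k]
```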
We provide baseline demos, implemented with PyG operators, for comparison.
- Distributed random walk
python baseline/python_randomwalk/main.py
- Single Machine PPR
python baseline/python_ppr/single_machine.py --max_degree 1000 --drop_coe 0.9 --num_source 100
- Distributed PPR
python baseline/python_ppr/main.py
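For intuition about what the baselines compute, single-source PPR can also be obtained by plain power iteration, pi <- alpha * e_s + (1 - alpha) * P^T pi. The sketch below uses only core PyTorch rather than the repo's PyG operators, and is a readable reference, not the baseline's actual code.

```python
# Power-iteration reference for single-source PPR in plain PyTorch.
# Not the repo's PyG baseline; shown only to pin down the computed quantity.
import torch

def ppr_power_iteration(edge_index, num_nodes, source, alpha=0.462, iters=100):
    row, col = edge_index                      # directed edges row -> col
    deg = torch.zeros(num_nodes).index_add_(0, row, torch.ones(row.numel()))
    e = torch.zeros(num_nodes)
    e[source] = 1.0
    pi = e.clone()
    for _ in range(iters):
        msg = pi[row] / deg[row].clamp(min=1)  # split each node's mass evenly
        spread = torch.zeros(num_nodes).index_add_(0, col, msg)
        pi = alpha * e + (1.0 - alpha) * spread
    return pi                                  # pi[v] ~= PPR of v w.r.t. source
```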
In this repo, we also provide an implementation of ShaDow-GAT with our PPR subgraph sampler.
We achieve 0.8297 ± 0.0041 test accuracy on the ogbn-products dataset, reproducing the best reported result of ShaDow-GNN.
For a quick start, modify the necessary parts of the following shell scripts and run them:
sh run_shadow_preproc.sh
sh run_shadow_train.sh
To eliminate data loading time before training, we load the subgraph data into shared memory once and then run multiple trials without reloading it:
python shadow_gnn/load_shm.py --datas_file "${FILE_PATH}/${DATASET}_egograph_datas.pt"
sh run_shadow_train.sh
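The shared-memory step builds on a standard PyTorch mechanism: calling share_memory_() on a tensor moves its storage into shared memory, so processes spawned while the holder is alive can read it without loading another copy from disk. The snippet below is a simplified sketch of this idea under an assumed file layout; the real logic is in shadow_gnn/load_shm.py.

```python
# Simplified sketch of the shared-memory trick; see shadow_gnn/load_shm.py
# for the real logic. The structure of the saved .pt file is an assumption.
import torch

datas = torch.load('<FILE_PATH>/<DATASET>_egograph_datas.pt')
for item in datas:
    if torch.is_tensor(item):
        # move the tensor's storage into shared memory so that training
        # processes spawned afterwards can use it without reloading
        item.share_memory_()
```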
Griphin is MIT licensed, as found in the LICENSE file.
@inproceedings{deng2023efficient,
title={An Efficient Distributed Graph Engine for Deep Learning on Graphs},
author={Deng, Gangda and Akg{\"u}l, {\"O}mer Faruk and Zhou, Hongkuan and Zeng, Hanqing and Xia, Yinglong and Li, Jianbo and Prasanna, Viktor},
booktitle={Proceedings of the SC'23 Workshops of The International Conference on High Performance Computing, Network, Storage, and Analysis},
pages={922--931},
year={2023}
}