Awesome
Artifact for Understanding and Bridging the Gaps in Current GNN Performance Optimizations
Getting started
Environment
- Python 3.7
- CUDA 10.1
- libcusparse.so.10
- libcurand.so.10
- libcublas.so.10
- Python packages
- DGL 0.4.3post2: pip3 install dgl-cu101==0.4.3post2
- Pytorch 1.6.0: pip3 install torch==1.6.0+cu101 torchvision==0.7.0+cu101 -f https://download.pytorch.org/whl/torch_stable.html
- torch_scatter 2.0.5: pip3 install torch-scatter==2.0.5+cu101 -f https://pytorch-geometric.com/whl/torch-1.6.0.html
- datasketch 1.5.1: pip3 install datasketch==1.5.1
- matplotlib # for figure generation
- seaborn # for figure generation
Hardware
- Tesla V100-PCIE-32GB
Get the code
git clone git@github.com:xxcclong/GNN-Computing.git
Setting up
cd artifact
mkdir build && cd build
cmake ..
make -j16
cp fig7.out ../Figure7/
cp fig8.out ../Figure8/
cp fig9.out ../Figure9/
cp fig10a.out ../Figure10/
cp fig10b.out ../Figure10/
cp fig11.out ../Figure11/
Data preparation
# get the compressed data
# put them into artifact/data/
wget -O data.zip https://cloud.tsinghua.edu.cn/f/2eebc696ce054681a6a4/?dl=1
# or download from onedrive: https://1drv.ms/u/s!Apc72a8BNm47f8k2kJEwBTdB-_o?e=JZ5zPd
# or download from dropbox: https://www.dropbox.com/s/d75okzxgy1uwyqk/data.zip?dl=0
unzip data.zip
After it, the file structure should be as follows
.
|-- CMakeLists.txt
|-- Figure10
|-- Figure11
|-- Figure7
|-- Figure8
|-- Figure9
|-- README.md
|-- data
|-- data_pyg
|-- include
`-- src
For every dataset (taking arxiv
for example), we have
- arxiv.config: with some graph information, such as the number of nodes and edges.
- arxiv.graph: two lines, the first line has the poiters of the range of the neighbors, the second line has the neighbor indexes. (similar to CSR format)
- arxiv.reorder_thres_0.2: the preprocessed reorder file, containing the number of
0
tonum_v - 1
, indicating the new node order of the graph.
Reproduce
Figure 7
cd Figure7
./run.sh
python3 draw_fig7.py # get fig7.pdf
P.S.
- PyG will expand the on-node tensor to edges, as a result, it will lead to out of memory. So you can find there are "RuntimeError: CUDA out of memory." during the test. However, the script can continue running, and the "out of memory" will be shown as "out of support" in the generated figure.
- The generated figure will not have breaks, so it looks unsimilar with the one in paper, but them have similar numbers.
- We can use
python3 dgl_prof.py --model sagelstm --gpu 0 --syn-name datasetname
to run GraphSAGE-LSTM using DGL. But due to its implementation, the CPU scheduling time is too much. So we re-implement it using CUDA, with negligible CPU overhead, for a fair comparison.
Figure 8
cd Figure8
./run.sh
python3 draw_fig8.py # get fig8.pdf
Figure 9
cd Figure9
./run.sh
python3 draw_fig9.py # get fig9.pdf
Figure 10
cd Figure10
./run.sh
python3 draw_fig10a.py # get fig10a.pdf
python3 draw_fig10b.py # get fig10b.pdf
Figure 11
cd Figure11
./run.sh
python3 draw_fig11.py # get fig11.pdf
Preprocessing graph
cd script
python3 cluster2.py arxiv # can replace arxiv with other dataset names
# the reorder file will be in data/arxiv_new_reorder_thres_0.2
Publication
Kezhao Huang, Jidong Zhai, Zhen Zheng, Youngmin Yi, and Xipeng Shen. 2021. Understanding and Bridging the Gaps in Current GNN Performance Optimizations. In Proceedings of PPoPP ’21: 26rd ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming, Virtual Event, Republic of Korea, February 27–March 3, 2021 (PPoPP'21), 14 pages. https://doi.org/10.1145/3437801.3441585
Contact
If meet some problems, feel free to send E-mail to hkz20@mails.tsinghua.edu.cn
and xxcclong@gmail.com
, we will reply as soon as possible.