GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks
Guyue Huang*, Guohao Dai, Yu Wang and Huazhong Yang
News
- The code for the SC20 publication's Artifact Evaluation (AE) is archived in the branch sc20_AE.
- The code for the ACM SRC publication (an upgraded SpMM kernel) is released as part of the dgSPARSE project. Code link: https://github.com/dgSPARSE/dgSPARSE-Library/tree/main/src/ge-spmm
- Future updates will be released in the dgSPARSE project.
Collaborative Projects
- dgSPARSE (Deep Graph Sparse Library) collects GPU sparse routines for HPC and GNN systems, developed in the NICS-EFC research lab, including:
  - SpMM
  - SDDMM
  - Edge softmax
  - ...
- CogDL is a flexible and efficient graph-learning framework that uses GE-SpMM to accelerate GNN algorithms.
Citations
@inproceedings{9355302,
  author={Huang, Guyue and Dai, Guohao and Wang, Yu and Yang, Huazhong},
  booktitle={SC20: International Conference for High Performance Computing, Networking, Storage and Analysis},
  title={GE-SpMM: General-Purpose Sparse Matrix-Matrix Multiplication on GPUs for Graph Neural Networks},
  year={2020},
  pages={1-12},
  doi={10.1109/SC41405.2020.00076}
}
@misc{huang2021efficient,
  title={Efficient Sparse Matrix Kernels based on Adaptive Workload-Balancing and Parallel-Reduction},
  author={Guyue Huang and Guohao Dai and Yu Wang and Yufei Ding and Yuan Xie},
  year={2021},
  eprint={2106.16064},
  archivePrefix={arXiv},
  primaryClass={cs.DC}
}
Abstract
GE-SpMM is a fast CSR-based CUDA kernel for sparse-dense matrix multiplication (SpMM), designed to accelerate GNN applications.
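For reference, SpMM here means multiplying a sparse matrix (e.g., a graph adjacency matrix stored in CSR format) by a dense feature matrix. The sketch below only illustrates that computation with SciPy on the CPU; it is not the GE-SpMM API.

```python
# Illustration of the SpMM computation (CPU reference, not the GE-SpMM API):
# C = A @ B, with A sparse in CSR format and B dense.
import numpy as np
import scipy.sparse as sp

n, f = 1000, 128                                  # nodes, feature width (arbitrary)
A = sp.random(n, n, density=0.01, format="csr")   # sparse adjacency matrix (CSR)
B = np.random.rand(n, f).astype(np.float32)       # dense node-feature matrix

C = A @ B                                         # the product GE-SpMM computes on GPU
print(C.shape)                                    # (1000, 128)
```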
Get started
git clone --recursive https://github.com/hgyhungry/ge-spmm.git
Kernel performance
Prerequisites
CUDA toolkit 10.1
Compilation
source compile.sh
The script should also build the baseline implementation in ./merge-spmm.
Download dataset
cd data
source download_SNAP.sh
Run tests
source run_test.sh
Gunrock baseline
When cloning this repo, pass the --recursive flag to automatically pull the Gunrock submodule.
cd gunrock-test
cp -r app/spmm ./gunrock/gunrock/app/
cp -r examples/spmm ./gunrock/examples
cp CMakeLists.txt ./gunrock/examples
mkdir build && cd build
cmake .. && make spmm -j8
Run tests
cd $(this-repo)/gunrock-test/gunrock/
cp examples/spmm/test.sh .
source test.sh
Results are written to gr_test.txt
DGL integration
Prerequisites
CUDA toolkit 10.1
PyTorch 1.4
GE-SpMM can be integrated into DGL. When cloning this repo, pass the --recursive flag to automatically pull the DGL submodule. First, build DGL from source; instructions are also available in the DGL tutorial on building from source.
cd $(this-repo)/dgl-custom/dgl
mkdir build
cd build
cmake -DUSE_CUDA=ON ..
make -j8
cd ../python
python setup.py install --user
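As a quick sanity check (a minimal sketch, not part of the repo's scripts), confirm that the locally built DGL is the copy Python imports:

```python
# Verify the local DGL build is the one Python picks up.
import dgl
print(dgl.__version__)
print(dgl.__file__)  # should point to the locally built package, not a pip install
```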
Run example code.
cd $(this-repo)/dgl-custom/benchmark
cd gcn
python gcn_dgl.py --gpu=0 --dataset=pubmed --n-hidden=128 --n-layers=1
cd ../sage
python sage_dgl.py --gpu=0 --dataset=pubmed --n-hidden=32 --n-layers=2 --aggregator-type=pool
Integrate GE-SpMM into DGL
cd $(this-repo)/dgl-custom/
cp *.cu ./dgl/src/kernel/cuda/
# rebuild dgl
cd dgl/build
make -j8
cd ../python
python setup.py install --user
Then you can run the same tests again and compare the PyTorch profiling reports before and after the integration (e.g., with PyTorch's autograd profiler, as sketched below).
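A minimal sketch of collecting such a report with the autograd profiler follows; `model` and `features` stand in for the GCN and input features built by the example scripts and are not names from this repo.

```python
# Sketch: profile one forward pass and inspect where CUDA time is spent.
# `model` and `features` are placeholders for objects built by the example scripts.
import torch

with torch.autograd.profiler.profile(use_cuda=True) as prof:
    logits = model(features)      # one GCN forward pass
torch.cuda.synchronize()
print(prof.key_averages().table(sort_by="cuda_time_total"))
```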
PyTorch extension
Prerequisites
CUDA toolkit 10.1
PyTorch 1.4
We also wrap GE-SpMM as a PyTorch custom op. The operator is compiled just-in-time (JIT) and can be called from Python code (a minimal sketch of the loading mechanism is shown below). We use it to replace the MessagePassing propagate step provided by PyG and measure the performance gain.
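The sketch below shows how such a JIT-compiled op can be loaded with torch.utils.cpp_extension; the file and function names (spmm_op.cpp, spmm_kernel.cu, csr_spmm) are placeholders, see gcn_custom.py for the actual sources and entry points used in this repo.

```python
# Sketch: JIT-compile and load a custom SpMM op from CUDA/C++ sources.
# File and function names are placeholders; see gcn_custom.py for the real ones.
import torch
from torch.utils.cpp_extension import load

spmm = load(name="gespmm",
            sources=["spmm_op.cpp", "spmm_kernel.cu"],
            verbose=True)  # compiled on first use, cached afterwards

# out = spmm.csr_spmm(rowptr, colind, values, dense_features)
```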
Build PyG baseline
pip install torch-scatter==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-sparse==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-cluster==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
pip install torch-spline-conv==latest+${CUDA} -f https://pytorch-geometric.com/whl/torch-1.4.0.html
cd $(this-repo)/pytorch-custom/pytorch_geometric
python setup.py install --user
Run tests
cd $(this-repo)/pytorch-custom
# The first time gcn_custom.py runs, the CUDA source is JIT-compiled into a shared library.
# On subsequent runs the compilation is skipped and PyTorch loads the cached library directly.
python gcn_custom.py --n-hidden=32
python gcn_pyg.py --n-hidden=32
python gcn_custom_2layers.py --n-hidden=32
python gcn_pyg_2layers.py --n-hidden=32