<h1> GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner </h1>

Implementation for the WWW'23 paper: GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner.

<img src="asserts/overview.png">

The predecessor of this work, GraphMAE: Self-Supervised Masked Graph Autoencoders, can be found here.

<h3> ❗ Update </h3>

[2023-04-19] We have made checkpoints of pre-trained models on different datasets available - feel free to download them from Google Drive.
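Once a checkpoint file has been downloaded, you can quickly inspect its contents before fine-tuning. The snippet below is a minimal sketch: the file name checkpoint.pt is a placeholder for whatever you downloaded, and the assumption that the file is a plain PyTorch state dict is ours, not stated elsewhere in this README.

```python
# Hedged sketch: inspect a downloaded checkpoint before loading it into a model.
# "checkpoint.pt" is a placeholder name; we assume a standard PyTorch state dict.
import torch

state_dict = torch.load("checkpoint.pt", map_location="cpu")
for name, value in state_dict.items():
    # Print each parameter name and its shape (or the value type if it is not a tensor).
    shape = tuple(value.shape) if hasattr(value, "shape") else type(value).__name__
    print(name, shape)
```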

<h2> Dependencies </h2>

<h2> Quick Start </h2>

For a quick start, you can run the following scripts:

Node classification

sh run_minibatch.sh <dataset_name> <gpu_id> # for mini batch node classification
# example: sh run_minibatch.sh ogbn-arxiv 0
sh run_fullbatch.sh <dataset_name> <gpu_id> # for full batch node classification
# example: sh run_fullbatch.sh cora 0

# Or you could run the code manually:
# for mini batch node classification
python main_large.py --dataset ogbn-arxiv --encoder gat --decoder gat --seed 0 --device 0
# for full batch node classification
python main_full_batch.py --dataset cora --encoder gat --decoder gat --seed 0 --device 0

Supported datasets:

- ogbn-arxiv
- ogbn-products
- ogbn-papers100M
- mag-scholar-f
- cora
- citeseer
- pubmed

Run the provided scripts, or add --use_cfg to the command, to reproduce the reported results.

For large-scale graphs: before starting mini-batch training, you need to generate local clusters if you want to use local clustering for training. By default, the program loads datasets from ./data and saves the generated local clusters to ./lc_ego_graphs. To generate local clusters, first install localclustering and then run the following command:

python ./datasets/localclustering.py --dataset <your_dataset> --data_dir <path_to_data>

We also provide pre-generated local clusters, which can be downloaded here and placed in ./lc_ego_graphs.

<h2> Datasets </h2>

During execution, the OGB datasets and the small-scale datasets (Cora, Citeseer, and PubMed) are downloaded automatically (a loader sketch is given at the end of this section). For the MAG-SCHOLAR dataset, you can download the raw data from here or use our processed version, which can be found here; the four feature files have to be merged into a single feature_f.npy (a merge sketch follows the file listing below). Once you have the dataset, place it in the ./data/mag_scholar_f folder. The folder should contain the following files:

- mag_scholar_f
|--- edge_index_f.npy
|--- split_idx_f.pt
|--- feature_f.npy
|--- label_f.npy
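
As a reference for the merging step mentioned above, here is a minimal sketch of combining the four downloaded feature files into a single feature_f.npy. The shard file names are placeholders (this README does not specify them), so adjust them to match the files you actually downloaded, and make sure the concatenation axis is the node dimension.

```python
# Hedged sketch: merge the four downloaded feature shards into feature_f.npy.
# The shard file names below are placeholders -- rename them to match your download.
import numpy as np

shard_paths = [
    "./data/mag_scholar_f/feature_f_part0.npy",
    "./data/mag_scholar_f/feature_f_part1.npy",
    "./data/mag_scholar_f/feature_f_part2.npy",
    "./data/mag_scholar_f/feature_f_part3.npy",
]

# Stack the shards along the node dimension (axis 0) and save the merged matrix.
features = np.concatenate([np.load(p) for p in shard_paths], axis=0)
print("merged feature matrix:", features.shape)
np.save("./data/mag_scholar_f/feature_f.npy", features)
```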

We will provide SAINTSampler as a baseline soon.
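
For reference, the automatic download of the OGB datasets mentioned at the start of this section typically goes through the ogb package. The snippet below is an illustrative sketch, not necessarily the exact loader this repository uses, showing how ogbn-arxiv would be fetched into ./data:

```python
# Illustrative sketch of an OGB download (not necessarily the loader used by this repo).
from ogb.nodeproppred import DglNodePropPredDataset

# Downloads ogbn-arxiv into ./data on first use and caches it for later runs.
dataset = DglNodePropPredDataset(name="ogbn-arxiv", root="./data")
graph, labels = dataset[0]
split_idx = dataset.get_idx_split()  # train/valid/test node indices
print(graph, labels.shape, {k: v.shape for k, v in split_idx.items()})
```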

<h2> Experimental Results </h2>

Experimental results of node classification on large-scale datasets (Accuracy, %):

| Method | Ogbn-arxiv | Ogbn-products | Mag-Scholar-F | Ogbn-papers100M |
|---|---|---|---|---|
| MLP | 55.50±0.23 | 61.06±0.08 | 39.11±0.21 | 47.24±0.31 |
| SGC | 66.92±0.08 | 74.87±0.25 | 54.68±0.23 | 63.29±0.19 |
| Random-Init | 68.14±0.02 | 74.04±0.06 | 56.57±0.03 | 61.55±0.12 |
| CCA-SSG | 68.57±0.02 | 75.27±0.05 | 51.55±0.03 | 55.67±0.15 |
| GRACE | 69.34±0.01 | 79.47±0.59 | 57.39±0.02 | 61.21±0.12 |
| BGRL | 70.51±0.03 | 78.59±0.02 | 57.57±0.01 | 62.18±0.15 |
| GGD | - | 75.70±0.40 | - | 63.50±0.50 |
| GraphMAE | 71.03±0.02 | 78.89±0.01 | 58.75±0.03 | 62.54±0.09 |
| GraphMAE2 | 71.89±0.03 | 81.59±0.02 | 59.24±0.01 | 64.89±0.04 |
<h1> Citing </h1>

If you find this work helpful to your research, please consider citing our paper:

@inproceedings{hou2023graphmae2,
  title={GraphMAE2: A Decoding-Enhanced Masked Self-Supervised Graph Learner},
  author={Zhenyu Hou and Yufei He and Yukuo Cen and Xiao Liu and Yuxiao Dong and Evgeny Kharlamov and Jie Tang},
  booktitle={Proceedings of the ACM Web Conference 2023 (WWW'23)},
  year={2023}
}