
NodeFormer: A Graph Transformer with Linear Complexity

The official implementation for "NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification", accepted to NeurIPS 2022 as a spotlight presentation.

Related materials: [paper], [slides], [blog Chinese | English], [video Chinese | English], [tutorial]

Two follow-up works: DIFFormer (ICLR 2023 spotlight), a physics-inspired scalable Transformer, and SGFormer (NeurIPS 2023), a simplified Transformer that scales to billion-scale graphs.

What's new

This work takes an initial step toward Transformer-style graph encoder networks for large node classification graphs. The model, dubbed NodeFormer, serves as an alternative to common Graph Neural Networks, in particular for encoding the nodes of an input graph into embeddings in latent space.

The highlights of NodeFormer

NodeFormer is a pioneering Transformer model for node classification on large graphs: it scales all-pair message passing with efficient latent structure learning to million-level nodes.

Structure of the Codes

The key module of NodeFormer is the kernelized (Gumbel-)Softmax message passing, which achieves all-pair message passing on a latent graph in each layer with $O(N)$ complexity. nodeformer.py implements our model.
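For readers who want the gist of this module without tracing the full implementation, below is a minimal, self-contained sketch of kernelized softmax attention with positive random features (Performer-style). The function and variable names are illustrative and this is not the repository's exact code; in particular, the Gumbel variant (kernelized_gumbel_softmax() in nodeformer.py) additionally injects Gumbel noise to sample a differentiable latent structure and is omitted here.

```python
import math
import torch

def softmax_kernel_features(x, proj, eps=1e-4):
    # Positive random features for the softmax kernel (Performer-style):
    # phi(x) = exp(x W^T - ||x||^2 / 2) / sqrt(m), so that phi(q) . phi(k)
    # is an unbiased estimate of exp(q . k) when W has i.i.d. Gaussian rows.
    m = proj.shape[0]
    x_proj = x @ proj.t()                                # [N, H, m]
    sq_norm = (x ** 2).sum(dim=-1, keepdim=True) / 2     # [N, H, 1]
    return torch.exp(x_proj - sq_norm) / math.sqrt(m) + eps

def kernelized_softmax_attention(q, k, v, num_features=64, tau=0.25):
    # q, k, v: [N, H, d]. All-pair attention in O(N): the N x N attention
    # matrix is never materialized.
    d = q.shape[-1]
    scale = (d ** 0.25) * (tau ** 0.5)                   # temperature tau on the attention logits
    proj = torch.randn(num_features, d, device=q.device)
    q_f = softmax_kernel_features(q / scale, proj)       # [N, H, m]
    k_f = softmax_kernel_features(k / scale, proj)       # [N, H, m]
    kv = torch.einsum('nhm,nhd->hmd', k_f, v)            # aggregate keys/values once: [H, m, d]
    z = torch.einsum('nhm,hm->nh', q_f, k_f.sum(dim=0))  # softmax normalizer: [N, H]
    return torch.einsum('nhm,hmd->nhd', q_f, kv) / z.unsqueeze(-1)

# Toy usage: 1000 nodes, 4 heads, 32 channels per head.
q, k, v = (torch.randn(1000, 4, 32) for _ in range(3))
out = kernelized_softmax_attention(q, k, v)              # [1000, 4, 32]
```

The trick is that the dense attention matrix never appears: keys and values are aggregated once into an [H, m, d] summary, so the cost is linear in the number of nodes.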

The other files are described below:

Datasets

We provide easy access to the datasets used in this work via Google Drive. The folder also contains other commonly used graph datasets, except the large-scale graphs OGBN-Proteins and Amazon2M, which can be downloaded automatically with our codes. See here for how to get the datasets ready for running our codes.

The information and sources of the datasets are summarized below.

Key results

| Dataset | Split | Metric | Result | Hyper-parameters / Checkpoints |
|---|---|---|---|---|
| Cora | random 50%/25%/25% | Accuracy | 88.80 (0.26) | train script |
| CiteSeer | random 50%/25%/25% | Accuracy | 76.33 (0.59) | train script |
| Deezer | random 50%/25%/25% | ROC-AUC | 71.24 (0.32) | train script |
| Actor | random 50%/25%/25% | Accuracy | 35.31 (0.89) | train script |
| OGBN-Proteins | public split | ROC-AUC | 77.45 (1.15) | train script, checkpoint, test script |
| Amazon2M | random 50%/25%/25% | Accuracy | 87.85 (0.24) | train script, checkpoint and fixed splits, test script |
| Mini-ImageNet (kNN, k=5) | random 50%/25%/25% | Accuracy | 86.77 (0.45) | train script |
| Mini-ImageNet (no graph) | random 50%/25%/25% | Accuracy | 87.46 (0.36) | train script |
| 20News-Group (kNN, k=5) | random 50%/25%/25% | Accuracy | 66.01 (1.18) | train script |
| 20News-Group (no graph) | random 50%/25%/25% | Accuracy | 64.71 (1.33) | train script |
| Cora | 20 nodes per class for train | Accuracy | 83.4 (0.2) | train script |
| CiteSeer | 20 nodes per class for train | Accuracy | 73.0 (0.3) | train script |
| Pubmed | 20 nodes per class for train | Accuracy | 81.5 (0.4) | train script |
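For the Mini-ImageNet and 20News-Group rows, the input graph is an artificial kNN graph (k = 5) built from raw node features. Below is a minimal sketch of how such a graph can be constructed with scikit-learn; it illustrates the setup and is not necessarily the exact preprocessing used in this repository.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

def build_knn_edge_index(features, k=5):
    # Connect each node to its k nearest neighbors (Euclidean distance)
    # and return a [2, E] edge_index in COO format (PyG-style).
    adj = kneighbors_graph(features, n_neighbors=k, mode='connectivity', include_self=False)
    rows, cols = adj.nonzero()
    return np.stack([rows, cols], axis=0)

# Toy usage: 100 nodes with 16-dimensional features.
x = np.random.randn(100, 16)
edge_index = build_knn_edge_index(x, k=5)   # shape [2, 100 * 5]
```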

How to run our codes?

  1. Install the required packages according to requirements.txt.

  2. Create a folder ../data and download the datasets from here (for the large graph datasets OGBN-Proteins and Amazon2M, the data will be downloaded automatically).

  3. To run the training and evaluation on the eight datasets we used, one can use the scripts in run.sh (the large-graph command uses mini-batch training over nodes; a sketch of this idea is given after this list):

# node classification on small datasets
python main.py --dataset cora --rand_split --metric acc --method nodeformer --lr 0.001 \
--weight_decay 5e-3 --num_layers 2 --hidden_channels 32 --num_heads 4 --rb_order 2 --rb_trans sigmoid \
--lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel --runs 5 --epochs 1000 --device 0

# node classification on large datasets
python main-batch.py --dataset ogbn-proteins --metric rocauc --method nodeformer --lr 1e-2 \
--weight_decay 0. --num_layers 3 --hidden_channels 64 --num_heads 1 --rb_order 1 --rb_trans identity \
--lamda 0.1 --M 50 --K 5 --use_bn --use_residual --use_gumbel --use_act --use_jk --batch_size 10000 \
--runs 5 --epochs 1000 --eval_step 9 --device 0

# image and text datasets
python main.py --dataset mini --metric acc --rand_split --method nodeformer --lr 0.001 \
--weight_decay 5e-3 --num_layers 2 --hidden_channels 128 --num_heads 6 \
--rb_order 2 --rb_trans sigmoid --lamda 1.0 --M 30 --K 10 --use_bn --use_residual --use_gumbel \
--runs 5 --epochs 300 --device 0

  4. We also provide the checkpoints of NodeFormer on two large datasets, OGBN-Proteins and Amazon2M. One can download the trained models into ../model/ and run the scripts in run_test_large_graph.sh to reproduce the results.
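For the large-graph setting, main-batch.py trains on node mini-batches (--batch_size 10000 in the run.sh command above). Below is a rough, self-contained sketch of the idea, under the assumption that each epoch randomly partitions the nodes into batches and runs all-pair attention only within each batch; the names are illustrative, and the actual pipeline (including how the input graph edges are handled per batch) is in main-batch.py.

```python
import torch
import torch.nn as nn

def train_one_epoch(model, x, y, batch_size, optimizer, criterion):
    # Randomly partition the N nodes into mini-batches; each forward pass
    # only attends over the nodes in the current batch.
    perm = torch.randperm(x.shape[0])
    for i in range(0, x.shape[0], batch_size):
        idx = perm[i:i + batch_size]
        optimizer.zero_grad()
        loss = criterion(model(x[idx]), y[idx])
        loss.backward()
        optimizer.step()

# Toy usage with a stand-in model (NodeFormer itself also consumes the graph structure).
model = nn.Sequential(nn.Linear(16, 64), nn.ReLU(), nn.Linear(64, 3))
x, y = torch.randn(5000, 16), torch.randint(0, 3, (5000,))
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
train_one_epoch(model, x, y, batch_size=1000, optimizer=opt, criterion=nn.CrossEntropyLoss())
```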

Potential Applications and More Usability

NodeFormer can in principle be applied to solve three families of tasks:

Our work takes an initial step toward building a scalable graph Transformer for node classification, and we believe there is ample room for further research and development in future works. One can also use our implementations kernelized_softmax() and kernelized_gumbel_softmax() for related projects concerning, e.g., structure learning and communication, where scalability matters.

Citation

If you find our codes useful, please consider citing our work:

      @inproceedings{wu2022nodeformer,
        title = {NodeFormer: A Scalable Graph Structure Learning Transformer for Node Classification},
        author = {Qitian Wu and Wentao Zhao and Zenan Li and David Wipf and Junchi Yan},
        booktitle = {Advances in Neural Information Processing Systems (NeurIPS)},
        year = {2022}
      }

ACK

We acknowledge the implementation of the softmax kernel

https://github.com/lucidrains/performer-pytorch

and the training pipeline for GNN node classification

https://github.com/CUAI/Non-Homophily-Benchmarks