<!-- PROJECT LOGO --> <br /> <div align="center"> <a href="https://github.com/sheldonresearch/ProG"> <img height="150" src="Logo.jpg?sanitize=true" /> </a> </div> <h3 align="center">🌟ProG: A Unified Python Library for Graph Prompting🌟</h3> <div align="center">

| Quick Start | Paper | Media Coverage | Call For Contribution |

</div>

🌟ProG🌟 (Prompt Graph) is a library built upon PyTorch for single-task or multi-task prompting of pre-trained Graph Neural Networks (GNNs). It supports several graph workflows for node-level and graph-level tasks: supervised learning, pre-training followed by prompting, and pre-training followed by fine-tuning. The library grew out of our KDD23 paper All in One, which received the Best Research Paper Award, the first such award for Hong Kong and Mainland China.

<div align="center">

Click to See A Full List of Our Works in Graph Prompts

</div> <h3 align="left">🌟Acknowledgement</h3> <div align="left"> </div>

Development Progress:

<br> <div align="left">

</div> <details close> <summary>History News</summary> </details> <br>

Installation

PyPI

From ProG 1.0 onwards, you can install ProG from PyPI. Simply run

pip install prompt-graph

Or you can git clone our repository directly.

Environment Setup

Before you begin, please make sure that you have Anaconda or Miniconda installed on your system. This guide assumes that you have a CUDA-enabled GPU.

# Create and activate a new Conda environment named 'ProG'
conda create -n ProG
conda activate ProG

# Install PyTorch with CUDA 11.7 support
# If you use a different CUDA version, please refer to the PyTorch website for the appropriate versions.
conda install numpy
conda install pytorch==2.0.1 pytorch-cuda=11.7 -c pytorch -c nvidia

# Install additional dependencies
pip install torch_geometric pandas torchmetrics Deprecated 

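In addition, you can either use our pre-trained GNNs directly or pre-train the GNN you want with our pretrain module. To use the pretrain module, also install torch_cluster:

pip install torch_cluster  -f https://data.pyg.org/whl/torch-2.3.0+cu121.html

The URL above targets torch 2.3.0 with CUDA 12.1. Replace it with the entry that matches your torch and CUDA versions; the available combinations are listed at https://data.pyg.org/whl/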
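The wheel index URL in the install command encodes the torch version and the CUDA tag. As a minimal sketch, the URL for another combination could be built as follows. The helper name `pyg_wheel_index` is hypothetical and not part of ProG or PyG, and the function does not check that the combination actually exists on https://data.pyg.org/whl/.

```python
def pyg_wheel_index(torch_version, cuda_tag):
    """Build the PyG wheel index URL for a torch version and CUDA tag.

    Hypothetical helper: it only mirrors the URL pattern of the install
    command and does not verify that the wheel index exists.
    """
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


if __name__ == "__main__":
    # Same combination as in the install command
    print(pyg_wheel_index("2.3.0", "cu121"))
```

The resulting string can be passed to `pip install torch_cluster -f <url>`.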

Quick Start

The Architecture of ProG is shown as follows:

<img height="350" src="/ProG_pipeline.jpg?sanitize=true" />

First, download Experiment.zip (126 MB) from OneDrive (https://1drv.ms/u/s!ArZGDth_ySjPjkW2n-zsF3_GGvC1?e=rEnBA7). Unzipping it gives you our pre-trained models for each dataset, the induced graphs, and the sample data for the few-shot setting. Please make sure the unzipped folder is named /Experiment. If the download link is unavailable, please email us at barristanzi666@gmail.com.

Warning! Dataset providers may update a dataset, which can cause compatibility issues with the pre-trained models we provide. Such issues have been reported for ENZYMES and BZR.

We therefore recommend pre-training your model yourself.

unzip Experiment.zip

We provide scripts with hyper-parameter settings to reproduce the experimental results.

With Customized Hyperparameters

For a downstream task, you can obtain experimental results with the parameters you choose, for example:

python downstream_task.py --pre_train_model_path './Experiment/pre_trained_model/Cora/Edgepred_Gprompt.GCN.128hidden_dim.pth' --task NodeTask --dataset_name 'Cora' --gnn_type 'GCN' --prompt_type 'GPF-plus' --shot_num 1 --hid_dim 128 --num_layer 2  --lr 0.02 --decay 2e-6 --seed 42 --device 0
python downstream_task.py --pre_train_model_path './Experiment/pre_trained_model/BZR/DGI.GCN.128hidden_dim.pth' --task GraphTask --dataset_name 'BZR' --gnn_type 'GCN' --prompt_type 'All-in-one' --shot_num 1 --hid_dim 128 --num_layer 2  --lr 0.02 --decay 2e-6 --seed 42 --device 1
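To launch several such runs from Python, the command line can be assembled from a dictionary of options. The sketch below reuses the flags and values of the first example and varies only `--shot_num`. The helper `build_command` and the shot values 1, 3, 5 are illustrative assumptions, not part of ProG.

```python
import shlex


def build_command(script, **options):
    """Turn keyword options into an argv list, e.g. ['python', script, '--lr', '0.02']."""
    argv = ["python", script]
    for name, value in options.items():
        argv += [f"--{name}", str(value)]
    return argv


# Flags and values copied from the first example; only shot_num varies.
base = dict(
    pre_train_model_path="./Experiment/pre_trained_model/Cora/Edgepred_Gprompt.GCN.128hidden_dim.pth",
    task="NodeTask", dataset_name="Cora", gnn_type="GCN", prompt_type="GPF-plus",
    hid_dim=128, num_layer=2, lr=0.02, decay=2e-6, seed=42, device=0,
)
commands = [build_command("downstream_task.py", shot_num=k, **base) for k in (1, 3, 5)]

if __name__ == "__main__":
    for argv in commands:
        # Print a shell-safe command line; use subprocess.run(argv) to execute it
        print(shlex.join(argv))
```

Building an argv list instead of a single string avoids quoting problems when paths contain spaces.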

With Optimal Hyperparameters through Random Search

Perform a random search of hyperparameters for the GCN model on the Cora dataset. (NodeTask)

python bench.py --pre_train_model_path './Experiment/pre_trained_model/Cora/GraphCL.GCN.128hidden_dim.pth' --task NodeTask --dataset_name 'Cora' --gnn_type 'GCN' --prompt_type 'GPF-plus' --shot_num 1 --hid_dim 128 --num_layer 2 --seed 42 --device 0
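Random search draws hyperparameter combinations at random and keeps the best one. The sketch below only illustrates the idea; it is not the implementation in bench.py, and the search space, the toy objective, and the function names are assumptions made for this example.

```python
import random


def random_search(objective, space, num_trials=20, seed=42):
    """Sample num_trials random configurations and return the best (params, score)."""
    rng = random.Random(seed)  # local RNG so repeated calls give the same result
    best_params, best_score = None, float("-inf")
    for _ in range(num_trials):
        # Pick one candidate value per hyperparameter
        params = {name: rng.choice(values) for name, values in space.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Illustrative search space (assumed, not taken from bench.py)
space = {"lr": [0.1, 0.02, 0.01, 0.001], "decay": [1e-4, 1e-5, 2e-6]}


def toy_objective(params):
    """Stand-in for 'train with these params and return validation accuracy'."""
    return -abs(params["lr"] - 0.02) - abs(params["decay"] - 2e-6)


best_params, best_score = random_search(toy_objective, space)
```

In a real run, the objective would train and evaluate the prompt model for each sampled configuration.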
<details> <summary ><strong>Table of The Following Contents</strong></summary> <ol> <li> <a href="#supportive-list">Supportive List</a> </li> <li> <a href="#pre-train-your-gnn-model">Pre-train your GNN model</a> </li> <li> <a href="#downstream-tasks">Downstream Tasks</a> </li> <li><a href="#datasets">Datasets</a></li> <li><a href="#prompt-class">Prompt Class</a></li> <li><a href="#environment-setup">Environment Setup</a></li> <li><a href="#todo-list">TODO List</a></li> </ol> </details>

With the Default Few-shot Samples

To reproduce the benchmark results with the same train/test sample split, run unzip node.zip -d './Experiment/sample_data'. If you skip this step, the code splits the dataset automatically.
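A k-shot split keeps `shot_num` labelled samples per class for training. The following is a minimal sketch of that idea, not ProG's own splitting code. The function name `few_shot_split` and the choice to use all remaining samples for testing are assumptions.

```python
import random
from collections import defaultdict


def few_shot_split(labels, shot_num, seed=42):
    """Return (train_idx, test_idx) with shot_num training samples per class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    train_idx = []
    for label in sorted(by_class):
        indices = by_class[label]
        if len(indices) < shot_num:
            raise ValueError(f"class {label} has fewer than {shot_num} samples")
        # Draw shot_num distinct samples of this class
        train_idx.extend(rng.sample(indices, shot_num))

    train_set = set(train_idx)
    # Assumption: every sample not used for training goes to the test set
    test_idx = [i for i in range(len(labels)) if i not in train_set]
    return sorted(train_idx), test_idx


labels = [0, 0, 0, 1, 1, 1, 2, 2, 2]
train_idx, test_idx = few_shot_split(labels, shot_num=1)
```

Fixing the seed makes the split reproducible, which matters when comparing prompt methods on the same few-shot samples.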

Supportive List

Currently supported graph prompt approaches (continuously updated):

Currently supported graph pre-training strategies (continuously updated):

Currently supported graph backbone models (continuously updated):

Beyond the above graph backbones, you can also seamlessly integrate nearly all graph models implemented in PyG.

**Click [here] for more details on these graph prompts, pre-training strategies, and graph backbones.**

Pre-train your GNN model

The pretrain module of ProG (prompt_graph.pretrain) provides pre-training classes, including Edgepred_GPPT, Edgepred_Gprompt, GraphCL, and SimGRACE. You can pre-train a model by running pre_train.py with the parameters you want, or simply unzip Experiment.zip to get our pre-trained models.

unzip Experiment.zip

In the pre-training phase, you can run pre_train.py with the parameters you want:

python pre_train.py --task Edgepred_Gprompt --dataset_name 'PubMed' --gnn_type 'GCN' --hid_dim 128 --num_layer 2 --epochs 1000 --seed 42 --device 0

# The package is imported as prompt_graph, so its submodules are imported under that name
from prompt_graph.pretrain import Edgepred_GPPT, Edgepred_Gprompt, GraphCL, SimGRACE, NodePrePrompt, GraphPrePrompt, DGI, GraphMAE
from prompt_graph.utils import seed_everything, mkdir, get_args
from prompt_graph.data import load4node, load4graph

args = get_args()
seed_everything(args.seed)


if args.pretrain_task == 'SimGRACE':
    pt = SimGRACE(dataset_name = args.dataset_name, gnn_type = args.gnn_type, hid_dim = args.hid_dim, gln = args.num_layer, num_epoch=args.epochs, device=args.device)
if args.pretrain_task == 'GraphCL':
    pt = GraphCL(dataset_name = args.dataset_name, gnn_type = args.gnn_type, hid_dim = args.hid_dim, gln = args.num_layer, num_epoch=args.epochs, device=args.device)
if args.pretrain_task == 'Edgepred_GPPT':
    pt = Edgepred_GPPT(dataset_name = args.dataset_name, gnn_type = args.gnn_type, hid_dim = args.hid_dim, gln = args.num_layer, num_epoch=args.epochs, device=args.device)
if args.pretrain_task == 'Edgepred_Gprompt':
    pt = Edgepred_Gprompt(dataset_name = args.dataset_name, gnn_type = args.gnn_type, hid_dim = args.hid_dim, gln = args.num_layer, num_epoch=args.epochs, device=args.device)
if args.pretrain_task == 'DGI':
    pt = DGI(dataset_name = args.dataset_name, gnn_type = args.gnn_type, hid_dim = args.hid_dim, gln = args.num_layer, num_epoch=args.epochs, device=args.device)
if args.pretrain_task == 'NodeMultiGprompt':
    nonlinearity = 'prelu'
    pt = NodePrePrompt(args.dataset_name, args.hid_dim, nonlinearity, 0.9, 0.9, 0.1, 0.001, 1, 0.3, args.device)
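if args.pretrain_task == 'GraphMultiGprompt':
    nonlinearity = 'prelu'
    # graph_list, input_dim and out_dim are not defined in this snippet; load them for args.dataset_name first
    pt = GraphPrePrompt(graph_list, input_dim, out_dim, args.dataset_name, args.hid_dim, nonlinearity, 0.9, 0.9, 0.1, 1, 0.3, 0.1, args.device)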
if args.pretrain_task == 'GraphMAE':
    pt = GraphMAE(dataset_name = args.dataset_name, gnn_type = args.gnn_type, hid_dim = args.hid_dim, gln = args.num_layer, num_epoch=args.epochs, device=args.device,
                  mask_rate=0.75, drop_edge_rate=0.0, replace_rate=0.1, loss_fn='sce', alpha_l=2)
pt.pretrain()

Load Data

Before running a downstream task, we need to load the necessary data. Some prompt types take induced graphs as input; for these, call the function load_induced_graph and pass its result to the tasker.

import os
import pickle

# Note: split_induced_graphs is called below and must be imported as well.

def load_induced_graph(dataset_name, data, device):
    folder_path = './Experiment/induced_graph/' + dataset_name
    if not os.path.exists(folder_path):
        os.makedirs(folder_path)

    file_path = folder_path + '/induced_graph_min100_max300.pkl'
    if os.path.exists(file_path):
        # Reuse the cached induced graphs
        with open(file_path, 'rb') as f:
            print('loading induced graph...')
            graphs_list = pickle.load(f)
            print('Done!!!')
    else:
        # Build the induced graphs (expected to write file_path), then load them
        print('Begin split_induced_graphs.')
        split_induced_graphs(data, folder_path, device, smallest_size=100, largest_size=300)
        with open(file_path, 'rb') as f:
            graphs_list = pickle.load(f)
    graphs_list = [graph.to(device) for graph in graphs_list]
    return graphs_list


args = get_args()
seed_everything(args.seed)

print('dataset_name', args.dataset_name)
if args.downstream_task == 'NodeTask':
    data, input_dim, output_dim = load4node(args.dataset_name)   
    data = data.to(args.device)
    if args.prompt_type in ['Gprompt', 'All-in-one', 'GPF', 'GPF-plus']:
        graphs_list = load_induced_graph(args.dataset_name, data, args.device) 
    else:
        graphs_list = None 
         

if args.downstream_task == 'GraphTask':
    input_dim, output_dim, dataset = load4graph(args.dataset_name)
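The function load_induced_graph follows a load-or-build caching pattern: read the pickle file if it exists, otherwise compute the result and store it. A generic, self-contained sketch of that pattern is shown below. The helper `load_or_build` and the stand-in data are assumptions for illustration and are not part of ProG.

```python
import os
import pickle
import tempfile


def load_or_build(file_path, build_fn):
    """Load a pickled object from file_path, or build and cache it first."""
    if os.path.exists(file_path):
        with open(file_path, "rb") as f:
            return pickle.load(f)
    result = build_fn()
    directory = os.path.dirname(file_path)
    if directory:
        os.makedirs(directory, exist_ok=True)
    with open(file_path, "wb") as f:
        pickle.dump(result, f)
    return result


calls = []


def expensive_build():
    calls.append(1)  # record each real computation
    return [[0, 1, 2], [3, 4]]  # stand-in for a list of induced graphs


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "induced_graph", "cache.pkl")
    first = load_or_build(path, expensive_build)   # computes and writes the cache
    second = load_or_build(path, expensive_build)  # reads the cache
```

Only unpickle files you created yourself or otherwise trust, since loading a pickle can execute arbitrary code.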

Downstream Tasks

In downstream_task.py, we designed two tasks (node classification and graph classification). Here are some examples.

# The package is imported as prompt_graph, so its submodules are imported under that name
from prompt_graph.tasker import NodeTask, LinkTask, GraphTask

if args.downstream_task == 'GraphTask':
    input_dim, output_dim, dataset = load4graph(args.dataset_name)

if args.downstream_task == 'NodeTask':
    tasker = NodeTask(pre_train_model_path = args.pre_train_model_path, 
                    dataset_name = args.dataset_name, num_layer = args.num_layer,
                    gnn_type = args.gnn_type, hid_dim = args.hid_dim, prompt_type = args.prompt_type,
                    epochs = args.epochs, shot_num = args.shot_num, device=args.device, lr = args.lr, wd = args.decay,
                    batch_size = args.batch_size, data = data, input_dim = input_dim, output_dim = output_dim, graphs_list = graphs_list)


if args.downstream_task == 'GraphTask':
    tasker = GraphTask(pre_train_model_path = args.pre_train_model_path, 
                    dataset_name = args.dataset_name, num_layer = args.num_layer, gnn_type = args.gnn_type, hid_dim = args.hid_dim, prompt_type = args.prompt_type, epochs = args.epochs,
                    shot_num = args.shot_num, device=args.device, lr = args.lr, wd = args.decay,
                    batch_size = args.batch_size, dataset = dataset, input_dim = input_dim, output_dim = output_dim)

_, test_acc, std_test_acc, f1, std_f1, roc, std_roc, _, _= tasker.run()

Kindly note that the comparison uses the same pre-trained .pth checkpoint. The absolute performance values do not mean much, because the final results may vary with the pre-training state. It is more informative to look at the performance relative to other pre-training paradigms.

Bench Random Search

In our bench

Datasets

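| Dataset | Graphs | Avg. nodes | Avg. edges | Features | Node classes | Task (N / G) | Category |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Cora | 1 | 2,708 | 5,429 | 1,433 | 7 | N | Homophilic |
| Pubmed | 1 | 19,717 | 88,648 | 500 | 3 | N | Homophilic |
| CiteSeer | 1 | 3,327 | 9,104 | 3,703 | 6 | N | Homophilic |
| Actor | 1 | 7,600 | 30,019 | 932 | 5 | N | Heterophilic |
| Wisconsin | 1 | 251 | 515 | 1,703 | 5 | N | Heterophilic |
| Texas | 1 | 183 | 325 | 1,703 | 5 | N | Heterophilic |
| ogbn-arxiv | 1 | 169,343 | 1,166,243 | 128 | 40 | N | Homophilic & Large scale |

| Dataset | Graphs | Avg. nodes | Avg. edges | Features | Graph classes | Task (N / G) | Domain |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MUTAG | 188 | 17.9 | 19.8 | 7 | 2 | G | small molecule |
| IMDB-BINARY | 1,000 | 19.8 | 96.53 | 0 | 2 | G | social network |
| COLLAB | 5,000 | 74.5 | 2457.8 | 0 | 3 | G | social network |
| PROTEINS | 1,113 | 39.1 | 72.8 | 3 | 2 | G | proteins |
| ENZYMES | 600 | 32.6 | 62.1 | 18 | 6 | G | proteins |
| DD | 1,178 | 284.1 | 715.7 | 89 | 2 | G | proteins |
| COX2 | 467 | 41.2 | 43.5 | 3 | 2 | G | small molecule |
| BZR | 405 | 35.8 | 38.4 | 3 | 2 | G | small molecule |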

TODO List

Note <span style="color:blue"> Current experimental datasets: Node/Edge:Cora/Citeseer/Pubmed; Graph:MUTAG</span>


<a name="paper"></a>

<h3 align="center">🌹Please Cite Our Work If Helpful:</h3> <p align="center"><strong>Thanks! / 谢谢! / ありがとう! / merci! / 감사! / Danke! / спасибо! / gracias! ...</strong></p>
@inproceedings{sun2023all,
  title={All in One: Multi-Task Prompting for Graph Neural Networks},
  author={Sun, Xiangguo and Cheng, Hong and Li, Jia and Liu, Bo and Guan, Jihong},
  booktitle={Proceedings of the 26th ACM SIGKDD international conference on knowledge discovery \& data mining (KDD'23)},
  year={2023},
  pages = {2120--2131},
  location = {Long Beach, CA, USA},
  isbn = {9798400701030},
  url = {https://doi.org/10.1145/3580305.3599256},
  doi = {10.1145/3580305.3599256}
}

@article{wang2024does,
      title={Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis}, 
      author={Qunzhong Wang and Xiangguo Sun and Hong Cheng},
      year={2024},
      journal = {arXiv preprint arXiv:2410.01635},
      url={https://arxiv.org/abs/2410.01635}
}


@article{zi2024prog,
      title={ProG: A Graph Prompt Learning Benchmark}, 
      author={Chenyi Zi and Haihong Zhao and Xiangguo Sun and Yiqing Lin and Hong Cheng and Jia Li},
      year={2024},
      journal = {the Thirty-Eighth Advances in Neural Information Processing Systems (NeurIPS 2024)},
      volume={},
      pages={}
}


@article{sun2023graph,
  title = {Graph Prompt Learning: A Comprehensive Survey and Beyond},
  author = {Sun, Xiangguo and Zhang, Jiawen and Wu, Xixi and Cheng, Hong and Xiong, Yun and Li, Jia},
  year = {2023},
  journal = {arXiv:2311.16534},
  eprint = {2311.16534},
  archiveprefix = {arxiv}
}

@article{zhang2024adaptive,
  title={Adaptive Coordinators and Prompts on Heterogeneous Graphs for Cross-Domain Recommendations},
  author={Hengyu Zhang and Chunxu Shen and Xiangguo Sun and Jie Tan and Yu Rong and Chengzhi Piao and Hong Cheng and Lingling Yi},
  journal={arXiv preprint arXiv:2410.11719},
  year={2024}
}

@inproceedings{li2024graph,
  title={Graph Intelligence with Large Language Models and Prompt Learning},
  author={Li, Jia and Sun, Xiangguo and Li, Yuhan and Li, Zhixun and Cheng, Hong and Yu, Jeffrey Xu},
  booktitle={Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining},
  pages={6545--6554},
  year={2024}
}

@inproceedings{zhao2024all,
      title={All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining}, 
      author={Haihong Zhao and Aochuan Chen and Xiangguo Sun and Hong Cheng and Jia Li},
      year={2024},
      booktitle={Proceedings of the 27th ACM SIGKDD international conference on knowledge discovery \& data mining (KDD'24)}
}


@inproceedings{gao2024protein,
  title={Protein Multimer Structure Prediction via {PPI}-guided Prompt Learning},
  author={Ziqi Gao and Xiangguo Sun and Zijing Liu and Yu Li and Hong Cheng and Jia Li},
  booktitle={The Twelfth International Conference on Learning Representations (ICLR)},
  year={2024},
  url={https://openreview.net/forum?id=OHpvivXrQr}
}


@article{chen2024prompt,
      title={Prompt Learning on Temporal Interaction Graphs}, 
      author={Xi Chen and Siwei Zhang and Yun Xiong and Xixi Wu and Jiawei Zhang and Xiangguo Sun and Yao Zhang and Yinglong Zhao and Yulin Kang},
      year={2024},
      eprint={2402.06326},
      archivePrefix={arXiv},
      journal = {arXiv:2402.06326}
}

@article{jin2024urban,
  title={Urban Region Pre-training and Prompting: A Graph-based Approach},
  author={Jin, Jiahui and Song, Yifan and Kan, Dong and Zhu, Haojia and Sun, Xiangguo and Li, Zhicheng and Sun, Xigang and Zhang, Jinghui},
  journal={arXiv preprint arXiv:2408.05920},
  year={2024}
}

@article{li2024survey,
      title={A Survey of Graph Meets Large Language Model: Progress and Future Directions}, 
      author={Yuhan Li and Zhixun Li and Peisong Wang and Jia Li and Xiangguo Sun and Hong Cheng and Jeffrey Xu Yu},
      year={2024},
      eprint={2311.12399},
      archivePrefix={arXiv},
      journal = {arXiv:2311.12399}
}


@article{wang2024ddiprompt,
  title={DDIPrompt: Drug-Drug Interaction Event Prediction based on Graph Prompt Learning},
  author={Wang, Yingying and Xiong, Yun and Wu, Xixi and Sun, Xiangguo and Zhang, Jiawei},
  journal={arXiv preprint arXiv:2402.11472},
  year={2024}
}


<br> <div name="our-work" align="center">

🌟A Full List of Our Works on Graph Prompts🌟

(* equal contribution † corresponding author)

</div>
  1. Qunzhong Wang*, Xiangguo Sun*†, Hong Cheng. Does Graph Prompt Work? A Data Operation Perspective with Theoretical Analysis. arXiv. Paper
  2. Xiangguo Sun, Hong Cheng, Jia Li, Bo Liu, Jihong Guan. All in One: Multi-Task Prompting for Graph Neural Networks. SIGKDD 23. Paper
  3. Haihong Zhao*, Aochuan Chen*, Xiangguo Sun*†, Hong Cheng, Jia Li†. All in One and One for All: A Simple yet Effective Method towards Cross-domain Graph Pretraining. SIGKDD 24. Paper
  4. Xi Chen, Siwei Zhang, Yun Xiong, Xixi Wu, Jiawei Zhang, Xiangguo Sun, Yao Zhang, Feng Zhao, Yulin Kang. Prompt Learning on Temporal Interaction Graphs. arXiv. Paper
  5. Chenyi Zi*, Haihong Zhao*, Xiangguo Sun†, Yiqing Lin, Hong Cheng, Jia Li. ProG: A Graph Prompt Learning Benchmark. NeurIPS 2024. Paper
  6. Xiangguo Sun, Jiawen Zhang, Xixi Wu, Hong Cheng, Yun Xiong, Jia Li. Graph Prompt Learning: A Comprehensive Survey and Beyond. arXiv. Paper
  7. Jia Li, Xiangguo Sun, Yuhan Li, Zhixun Li, Hong Cheng, Jeffrey Xu Yu. Graph Intelligence with Large Language Models and Prompt Learning. SIGKDD 24. Paper
  8. Yuhan Li*, Zhixun Li*, Peisong Wang*, Jia Li†, Xiangguo Sun, Hong Cheng, Jeffrey Xu Yu. A Survey of Graph Meets Large Language Model: Progress and Future Directions. IJCAI 2024. Paper
  9. Hengyu Zhang*, Chunxu Shen*, Xiangguo Sun†, Jie Tan, Yu Rong, Chengzhi Piao, Hong Cheng, Lingling Yi. Adaptive Coordinators and Prompts on Heterogeneous Graphs for Cross-Domain Recommendations. arXiv. Paper
  10. Ziqi Gao, Xiangguo Sun, Zijing Liu, Yu Li, Hong Cheng, Jia Li†. Protein Multimer Structure Prediction via PPI-guided Prompt Learning. ICLR 2024. Paper
  11. Jiahui Jin, Yifan Song, Dong Kan, Haojia Zhu, Xiangguo Sun, Zhicheng Li, Xigang Sun, Jinghui Zhang. Urban Region Pre-training and Prompting: A Graph-based Approach. arXiv. Paper
  12. Yingying Wang, Yun Xiong, Xixi Wu, Xiangguo Sun, Jiawei Zhang. DDIPrompt: Drug-Drug Interaction Event Prediction based on Graph Prompt Learning. CIKM 2024. Paper

Media Coverage

Media Reports

Online Discussion

Other research papers released by us


Call for Contributors!

Once you are invited as a contributor, you will be asked to follow these steps:

When you have finished all these steps, I will get a notification and approve merging your branch into main. Once that is done, I will delete your branch; for your next contribution, repeat the steps above.

A widely tested main branch will then be merged into the stable branch, and a new version will be released based on the stable branch.