Graph Condensation Benchmark (GC-Bench)

GC-Bench is an open and unified benchmark for Graph Condensation (GC) based on PyTorch and PyTorch Geometric. We benchmark 12 state-of-the-art graph condensation algorithms on node-level and graph-level tasks and analyze their performance on 12 distinct graph datasets.

Overview of GC-Bench

<p align="center"> <img src="figs/GC-bench.png" width="100%" class="center" alt="pipeline"/> </p>

GC-Bench is a comprehensive Graph Condensation Benchmark designed to systematically analyze the performance of graph condensation methods in various scenarios. It examines the effectiveness, transferability, and complexity of graph condensation by evaluating 12 state-of-the-art graph condensation algorithms on both node-level and graph-level tasks across 12 diverse graph datasets.

Getting Started

To get started with GC-Bench, please follow the instructions below:

  1. Installation

    git clone https://github.com/RingBDStack/GC-Bench.git
    cd GC-Bench
    pip install -r requirements.txt
    conda env create -f environment.yml
    
    
  2. Download Datasets

    Download the node classification and graph classification datasets and store them in the specified directory. By default, this is the data directory, but you can customize it by changing the data_dir parameter in your configuration. The project structure should look like the following:

    GC-Bench
       ├── data
       │   ├── cora
       │   ├── citeseer
       │   └── ...
       ├── DM
       └── ...
    

    Alternatively, you can leverage PyG to download and manage these datasets directly, eliminating the need to manually place them in the data directory.
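
The PyG route mentioned above could look roughly like the sketch below. It uses PyG's `Planetoid` dataset class for Cora and CiteSeer; the helper names (`dataset_root`, `load_planetoid`) are hypothetical, and the assumption that each dataset lives in a lowercase subfolder of `data` is inferred from the directory tree above, not confirmed against the GC-Bench loaders.

```python
from pathlib import Path

# Canonical PyG names for the two Planetoid datasets shown in the tree above.
PLANETOID_NAMES = {"cora": "Cora", "citeseer": "CiteSeer"}


def dataset_root(data_dir: str, name: str) -> Path:
    """Return the per-dataset folder, e.g. data/cora (assumed layout)."""
    return Path(data_dir) / name.lower()


def load_planetoid(name: str, data_dir: str = "data"):
    """Download (if needed) and load a Planetoid dataset through PyG."""
    key = name.lower()
    if key not in PLANETOID_NAMES:
        raise ValueError(f"unsupported dataset in this sketch: {name!r}")
    # Imported here so the path helpers stay usable without PyG installed.
    from torch_geometric.datasets import Planetoid

    return Planetoid(root=str(dataset_root(data_dir, key)), name=PLANETOID_NAMES[key])
```

Calling `load_planetoid("cora")[0]` would then return the single graph object; PyG writes its `raw` and `processed` folders under the chosen root on first use.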

Condense Graph Datasets

Each family of graph condensation methods (gradient matching, distribution matching, kernel ridge regression, etc.) can be run from its corresponding directory.

For example, to run the Distribution Matching (DM) method, use the following command:

python DM/main.py --dataset=citeseer --epochs=2000 --gpu_id=0 --lr_adj=0.001 --lr_feat=0.01 --lr_model=0.1 --method=GCDM --nlayers=2 --outer=10 --reduction_rate=1 --save=1 --seed=1 --transductive=1

To run the Gradient Matching (GM) method for node classification, use the following command:

python GM/main_nc.py --dataset cora --transductive=1 --nlayers=2 --sgc=1 --lr_feat=1e-4 --lr_adj=1e-4 --r=0.5 --seed=1 --epoch=600 --save=1

To run the Gradient Matching (GM) method for graph classification, use the following command:

python GM/main_gc.py --dataset ogbg-molhiv --init real --nconvs=3 --dis=mse --lr_adj=0.01 --lr_feat=0.01 --epochs=1000 --eval_init=1 --net_norm=none --pool=mean --seed=1 --ipc=5 --save=1

Parameters can also be set in configuration files. To run experiments using a configuration file, use the following command:

python GM/main_nc.py --config config_DosCond --section DBLP-r0.250

This command will run the corresponding experiments with the parameters specified in the configuration file. The provided configuration files contain the parameters used to obtain the results presented in our benchmark.
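
To illustrate how a `--config`/`--section` pair can feed parameters into a script, here is a minimal sketch that assumes an INI-style file with one section per experiment. The file format, the sample values, and the helper names are assumptions made for illustration; the actual GC-Bench configuration files and loader may differ.

```python
import argparse
import configparser

# Hypothetical file content; keys reuse flag names from the commands above,
# values are placeholders and not the benchmark's settings.
SAMPLE_CONFIG = """
[DBLP-r0.250]
lr_adj = 0.01
lr_feat = 0.01
epochs = 1000
"""


def section_defaults(config_text: str, section: str) -> dict:
    """Read one section of an INI-style config as a dict of strings."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    if not parser.has_section(section):
        raise KeyError(f"section not found: {section}")
    return dict(parser[section])


def build_parser(defaults: dict) -> argparse.ArgumentParser:
    """Build a parser whose defaults come from the chosen config section."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--lr_adj", type=float, default=1e-4)
    parser.add_argument("--lr_feat", type=float, default=1e-4)
    parser.add_argument("--epochs", type=int, default=600)
    # String defaults are converted with each argument's `type` at parse time,
    # and explicit command-line flags still override them.
    parser.set_defaults(**defaults)
    return parser


if __name__ == "__main__":
    args = build_parser(section_defaults(SAMPLE_CONFIG, "DBLP-r0.250")).parse_args()
    print(args)
```

With this layout, values from the section act as defaults, so a flag passed on the command line takes precedence over the file.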

Evaluate Condensed Graphs

For evaluation on different architectures, you can simply run:

python baselines/test_nc.py --method ${method} --dataset cora --gpu_id=0 --r=0.5 --nruns=5

Replace ${method} with the specific condensation method you used.

For evaluation on different tasks, you can simply run:

python evaluator/test_other_tasks.py --method ${method} --dataset cora --gpu_id=0 --r=0.5 --seed=1 --nruns=5 --task=LP

Again, replace ${method} with the condensation method you used. The --task parameter selects the downstream task: LP for link prediction, AD for anomaly detection, etc.
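
When several condensed graphs need to be evaluated, the architecture evaluation command above can be wrapped in a small driver. The sketch below only assembles the command lines from the flags shown in this section and prints them by default. The helper names are hypothetical, and the method strings (`GCond`, `DosCond`) are assumed to match what `--method` accepts.

```python
import subprocess
import sys


def eval_command(method, dataset="cora", r=0.5, nruns=5, gpu_id=0):
    """Build the argument list for one run of baselines/test_nc.py."""
    return [
        sys.executable, "baselines/test_nc.py",
        "--method", method,
        "--dataset", dataset,
        f"--gpu_id={gpu_id}",
        f"--r={r}",
        f"--nruns={nruns}",
    ]


def run_sweep(methods, dry_run=True, **kwargs):
    """Print one command per method; execute them only if dry_run is False."""
    commands = [eval_command(method, **kwargs) for method in methods]
    for command in commands:
        print(" ".join(command))
        if not dry_run:
            # check=True stops the sweep at the first failing evaluation.
            subprocess.run(command, check=True)
    return commands


if __name__ == "__main__":
    run_sweep(["GCond", "DosCond"])
```

Run it from the repository root so the relative script path resolves, and pass `dry_run=False` once the printed commands look right.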

Algorithm References

Summary of Graph Condensation (GC) algorithms. We also provide public access to the official algorithm implementations. "KRR" is short for Kernel Ridge Regression, "CTC" for Computation Tree Compression, "GNN" for Graph Neural Network, "GNTK" for Graph Neural Tangent Kernel, and "SD" for Spectral Decomposition. In the Downstream Task column, "NC" is short for node classification, "LP" for link prediction, "AD" for anomaly detection, and "GC" for graph classification (not graph condensation).

| Method | Initialization | Backbone Model | Downstream Task | Paper | Code | Venue |
|---|---|---|---|---|---|---|
| Random | - | - | - | - | - | - |
| Herding | - | - | - | Herding Dynamical Weights to Learn | code | ICML, 2009 |
| K-Center | - | - | - | Active learning for convolutional neural networks: A core-set approach | code | ICLR, 2018 |
| GCond | Random Sample | GNN | NC | Graph Condensation for Graph Neural Networks | code | ICLR, 2021 |
| DosCond | Random Sample | GNN | NC, GC | Condensing Graphs via One-Step Gradient Matching | code | SIGKDD, 2022 |
| SGDD | Random Sample | GNN | NC, LP, AD | Does Graph Distillation See Like Vision Dataset Counterpart? | code | NeurIPS, 2023 |
| GCDM | Random Sample | GNN | NC | Graph Condensation via Receptive Field Distribution Matching | - | arXiv, 2022 |
| DM | Random Sample | GNN | NC | CaT: Balanced Continual Graph Learning with Graph Condensation | - | ICDM, 2023 |
| SFGC | K-Center | GNN | NC | Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data | code | NeurIPS, 2023 |
| GEOM | K-Center | GNN | NC | Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching | code | ICML, 2024 |
| KiDD | Random Sample | GNTK | GC | Kernel Ridge Regression-Based Graph Dataset Distillation | code | SIGKDD, 2023 |
| Mirage | - | GNN | GC | Mirage: Model-Agnostic Graph Distillation for Graph Classification | code | ICLR, 2024 |