
Graph Condensation Benchmark (GC-Bench)

GC-Bench is an open and unified benchmark for Graph Condensation (GC) built on PyTorch and PyTorch Geometric. We implement 12 state-of-the-art graph condensation algorithms for node-level and graph-level tasks and analyze their performance on 12 distinct graph datasets.

Overview of GC-Bench

GC-Bench is a comprehensive Graph Condensation Benchmark designed to systematically analyze the performance of graph condensation methods across a range of scenarios. It examines the effectiveness, transferability, and complexity of graph condensation. We evaluate 12 state-of-the-art graph condensation algorithms on both node-level and graph-level tasks across 12 diverse graph datasets. By benchmarking these GC algorithms, we make the following contributions:

Comprehensive Benchmark. GC-Bench systematically integrates 12 representative and competitive GC methods for both node-level and graph-level tasks through a unified condensation and evaluation pipeline, providing a thorough analysis of effectiveness, transferability, and efficiency.
Multi-faceted Evaluation and Analysis. We conduct a detailed evaluation of GC methods, examining their effectiveness, efficiency, and complexity. This comprehensive analysis uncovers the strengths and limitations of current GC algorithms, offering valuable insights for future research.
Open-sourced Benchmark Library. GC-Bench is open-sourced and easy to extend with new methods and datasets. This facilitates further exploration and encourages reproducible research, helping to advance the field of graph condensation.

Getting Started

To get started with GC-Bench, please follow the instructions below:

Installation

```
git clone https://github.com/RingBDStack/GC-Bench.git
cd GC-Bench
pip install -r requirements.txt
conda env create -f environment.yml
```

Download Datasets

Download the node classification and graph classification datasets and place them in the data directory, which is the default location; you can change it with the data_dir parameter in your configuration. The project structure should then look like this:
```
GC-Bench
   ├── data
   │   ├── cora
   │   ├── citeseer
   │   └── ...
   ├── DM
   └── ...
```
Alternatively, you can leverage PyG to download and manage these datasets directly, eliminating the need to manually place them in the data directory.
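A minimal sketch of the PyG route, assuming torch_geometric is installed. `Planetoid` and `TUDataset` are real PyG dataset classes that download into `<root>/<Name>/` on first use; the name table and the helpers `resolve_pyg_source` and `load_pyg_dataset` are hypothetical, and whether GC-Bench's own loaders accept this folder layout unchanged is an assumption worth checking.

```python
# Hypothetical name table: lowercase dataset name -> (PyG class, PyG dataset name).
# Planetoid and TUDataset are real torch_geometric classes; the entries are illustrative.
PYG_SOURCES = {
    "cora": ("Planetoid", "Cora"),
    "citeseer": ("Planetoid", "CiteSeer"),
    "pubmed": ("Planetoid", "PubMed"),
    "mutag": ("TUDataset", "MUTAG"),
}


def resolve_pyg_source(name):
    """Return (PyG class name, PyG dataset name) for a dataset name, case-insensitively."""
    try:
        return PYG_SOURCES[name.lower()]
    except KeyError:
        raise ValueError(f"no PyG source registered for {name!r}") from None


def load_pyg_dataset(name, root="data"):
    """Download (on first use) and load a dataset under `root` via PyG."""
    # Imported lazily so the lookup above works even where PyG is not installed.
    from torch_geometric.datasets import Planetoid, TUDataset

    cls_name, pyg_name = resolve_pyg_source(name)
    cls = {"Planetoid": Planetoid, "TUDataset": TUDataset}[cls_name]
    return cls(root=root, name=pyg_name)
```

For example, `load_pyg_dataset("citeseer")` would fetch CiteSeer into `data/CiteSeer`. Keeping the PyG import inside the function means the name lookup can be tested without a GPU environment.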

Condense Graph Datasets

Graph condensation methods are grouped by family (gradient matching, distribution matching, kernel ridge regression, etc.), and each family is run from its corresponding directory, such as DM or GM.

For example, to run the Distribution Matching (DM) method, use the following command:

```
python DM/main.py --dataset=citeseer --epochs=2000 --gpu_id=0 --lr_adj=0.001 --lr_feat=0.01 --lr_model=0.1 --method=GCDM --nlayers=2 --outer=10 --reduction_rate=1 --save=1 --seed=1 --transductive=1
```
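Distribution matching learns a synthetic graph whose node embeddings are distributed like the real ones. The NumPy sketch below is a simplified illustration, not the code in DM/main.py: a randomly initialized, one-layer GCN-style propagation embeds both graphs, and the loss sums, over classes, the squared distance between real and synthetic mean embeddings. The helper names, the single linear layer, and the toy inputs are assumptions made for brevity.

```python
import numpy as np


def normalized_adjacency(adj):
    """Symmetrically normalize A + I, as in a GCN layer."""
    a = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def dm_loss(adj_real, x_real, y_real, adj_syn, x_syn, y_syn, hidden=16, seed=0):
    """Class-wise mean-embedding matching under one random embedding (simplified sketch).

    Assumption: a single linear propagation layer stands in for the real GNN.
    Every class in y_syn must also appear in y_real.
    """
    rng = np.random.default_rng(seed)
    # A freshly sampled random weight matrix shared by both graphs.
    w = rng.standard_normal((x_real.shape[1], hidden))
    h_real = normalized_adjacency(adj_real) @ x_real @ w
    h_syn = normalized_adjacency(adj_syn) @ x_syn @ w
    loss = 0.0
    for c in np.unique(y_syn):
        # Squared distance between the class means of real and synthetic embeddings.
        diff = h_real[y_real == c].mean(axis=0) - h_syn[y_syn == c].mean(axis=0)
        loss += float(diff @ diff)
    return loss
```

In a full method this loss would be minimized with respect to the synthetic features and, where it is learned, the synthetic adjacency (compare `--lr_feat` and `--lr_adj` in the command), typically resampling the random weights at each step.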

To run the Gradient Matching (GM) method for node classification, use the following command:

```
python GM/main_nc.py --dataset cora --transductive=1 --nlayers=2 --sgc=1 --lr_feat=1e-4 --lr_adj=1e-4 --r=0.5 --seed=1 --epoch=600 --save=1
```
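Both node-level commands size the condensed graph with a ratio (`--reduction_rate` for DM, `--r` for GM). The sketch below assumes GC-Bench follows the convention of the original GCond code: in the transductive setting the budget is a fraction of the labeled training nodes, each class gets its proportional share (at least one node), and the most frequent class absorbs the rounding remainder. The function name is hypothetical; check the method's own code before relying on exact counts.

```python
from collections import Counter


def allocate_synthetic_nodes(train_labels, ratio):
    """Per-class synthetic node counts for a reduction ratio (GCond-style convention, assumed)."""
    # Classes ordered from least to most frequent in the training labels.
    counts = sorted(Counter(train_labels).items(), key=lambda item: item[1])
    budget = int(len(train_labels) * ratio)
    allocation = {}
    for i, (label, num) in enumerate(counts):
        if i == len(counts) - 1:
            # The most frequent class takes whatever remains of the total budget.
            allocation[label] = budget - sum(allocation.values())
        else:
            # Every other class gets its proportional share, but at least one node.
            allocation[label] = max(int(num * ratio), 1)
    return allocation
```

Under this convention, Cora's standard public split (20 labeled nodes in each of its 7 classes) with `--r=0.5` yields 10 synthetic nodes per class, 70 in total.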

To run the Gradient Matching (GM) method for graph classification, use the following command:

```
python GM/main_gc.py --dataset ogbg-molhiv --init real --nconvs=3 --dis=mse --lr_adj=0.01 --lr_feat=0.01 --epochs=1000 --eval_init=1 --net_norm=none --pool=mean --seed=1 --ipc=5 --save=1
```
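Gradient-matching methods update the synthetic data so that a model's parameter gradients on it resemble those on the real data. The NumPy sketch below replaces the GNN with a linear softmax classifier on precomputed features (for instance SGC-propagated ones, compare `--sgc=1`) so the gradient stays analytic. It offers two distances: summed squared error, which is presumably what `--dis=mse` selects, and a per-class cosine distance in the spirit of GCond's default. Function names and the classifier are assumptions; the real methods also match class by class over many sampled initializations and training steps.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def weight_gradient(features, labels, weights):
    """Gradient of mean cross-entropy w.r.t. a linear softmax classifier's weights."""
    n, num_classes = features.shape[0], weights.shape[1]
    one_hot = np.eye(num_classes)[labels]
    return features.T @ (softmax(features @ weights) - one_hot) / n


def gradient_distance(g_real, g_syn, dis="mse"):
    """Distance between real and synthetic gradients (both shaped features x classes)."""
    if dis == "mse":
        return float(((g_real - g_syn) ** 2).sum())
    if dis == "cos":
        # 1 - cosine similarity for each output class (column), summed over classes.
        num = (g_real * g_syn).sum(axis=0)
        den = np.linalg.norm(g_real, axis=0) * np.linalg.norm(g_syn, axis=0) + 1e-6
        return float((1.0 - num / den).sum())
    raise ValueError(f"unknown distance {dis!r}")


def gm_loss(x_real, y_real, x_syn, y_syn, num_classes, dis="mse", seed=0):
    """Matching loss at one randomly initialized classifier (simplified sketch)."""
    rng = np.random.default_rng(seed)
    w = 0.01 * rng.standard_normal((x_real.shape[1], num_classes))
    return gradient_distance(
        weight_gradient(x_real, y_real, w), weight_gradient(x_syn, y_syn, w), dis
    )
```

The cosine variant ignores gradient magnitude and compares only directions, whereas squared error penalizes both; which works better is an empirical question the benchmark's configurations settle per method.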

Parameters can also be set in configuration files. To run experiments using a configuration file, use the following command:

```
python GM/main_nc.py --config config_DosCond --section DBLP-r0.250
```

This command will run the corresponding experiments with the parameters specified in the configuration file. The provided configuration files contain the parameters used to obtain the results presented in our benchmark.
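The section name `DBLP-r0.250` suggests one INI-style section per dataset and ratio. Assuming that layout, the standard library's `configparser` can load a section as argparse defaults while explicit command-line flags still take precedence. The config text, key names, values, and the `parse_with_config` helper are hypothetical (the keys reuse flags from the commands above, and the values are illustrative rather than GC-Bench's tuned settings); for brevity the sketch takes the file text directly instead of a path passed via `--config`.

```python
import argparse
import configparser

# Hypothetical config contents; values are illustrative, not the benchmark's settings.
EXAMPLE_CONFIG = """
[DBLP-r0.250]
dataset = DBLP
r = 0.25
lr_feat = 1e-4
lr_adj = 1e-4
"""


def parse_with_config(argv, config_text=EXAMPLE_CONFIG):
    parser = argparse.ArgumentParser()
    parser.add_argument("--section", default=None)
    parser.add_argument("--dataset", default="cora")
    parser.add_argument("--r", type=float, default=0.5)
    parser.add_argument("--lr_feat", type=float, default=1e-4)
    parser.add_argument("--lr_adj", type=float, default=1e-4)
    # First pass: find out which section (if any) was requested.
    known, _ = parser.parse_known_args(argv)
    if known.section:
        cfg = configparser.ConfigParser()
        cfg.read_string(config_text)
        # Section values become new defaults; argparse applies `type` to string defaults.
        parser.set_defaults(**dict(cfg[known.section]))
    # Second pass: flags given explicitly on the command line override the section.
    return parser.parse_args(argv)
```

Layering the config as defaults, rather than overwriting parsed arguments, keeps a one-off override such as `--r 0.5` working on top of a stored section.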

Evaluate Condensed Graphs

To evaluate a condensed graph on different architectures, run:

```
python baselines/test_nc.py --method ${method} --dataset cora --gpu_id=0 --r=0.5 --nruns=5
```

Replace ${method} with the condensation method you used. To evaluate on other downstream tasks, run:

```
python evaluator/test_other_tasks.py --method ${method} --dataset cora --gpu_id=0 --r=0.5 --seed=1 --nruns=5 --task=LP
```

As before, replace ${method} with your condensation method. Set --task to LP for link prediction, AD for anomaly detection, etc.
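To sweep several methods and tasks in one go, the evaluation commands can be generated from Python and run with `subprocess`. The method identifiers below are hypothetical placeholders for whatever values `${method}` accepts, and only flags that appear in the evaluation commands are used; the helper names are likewise illustrative.

```python
import subprocess
import sys

# Placeholder identifiers (assumption): substitute the values ${method} actually accepts.
METHODS = ["gcond", "doscond", "sgdd"]
TASKS = ["LP", "AD"]


def architecture_cmd(method, dataset="cora", r=0.5, nruns=5, gpu_id=0):
    """Command for cross-architecture evaluation of one condensed graph."""
    return [sys.executable, "baselines/test_nc.py", "--method", method,
            "--dataset", dataset, f"--gpu_id={gpu_id}", f"--r={r}", f"--nruns={nruns}"]


def task_cmd(method, task, dataset="cora", r=0.5, seed=1, nruns=5, gpu_id=0):
    """Command for evaluating one condensed graph on another downstream task."""
    return [sys.executable, "evaluator/test_other_tasks.py", "--method", method,
            "--dataset", dataset, f"--gpu_id={gpu_id}", f"--r={r}", f"--seed={seed}",
            f"--nruns={nruns}", f"--task={task}"]


def all_commands(methods=METHODS, tasks=TASKS):
    cmds = [architecture_cmd(m) for m in methods]
    cmds += [task_cmd(m, t) for m in methods for t in tasks]
    return cmds


def run_all(dry_run=True):
    for cmd in all_commands():
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing evaluation.
            subprocess.run(cmd, check=True)
```

Run it from the repository root; `run_all()` only prints the commands, and `run_all(dry_run=False)` executes them.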

Algorithm References

Summary of graph condensation (GC) algorithms. We also link to the official implementations where they are available. Abbreviations: KRR, kernel ridge regression; CTC, computation tree compression; GNN, graph neural network; GNTK, graph neural tangent kernel; SD, spectral decomposition; NC, node classification; LP, link prediction; AD, anomaly detection. In the Downstream Task column, GC stands for graph classification.

Method	Initialization	Backbone Model	Downstream Task	Paper	Code	Venue
Random	—	—	—	—	—	—
Herding	—	—	—	Herding Dynamical Weights to Learn	code	ICML, 2009
K-Center	—	—	—	Active learning for convolutional neural networks: A core-set approach	code	ICLR, 2018
GCond	Random Sample	GNN	NC	Graph Condensation for Graph Neural Networks	code	ICLR, 2022
DosCond	Random Sample	GNN	NC, GC	Condensing Graphs via One-Step Gradient Matching	code	SIGKDD, 2022
SGDD	Random Sample	GNN	NC, LP, AD	Does Graph Distillation See Like Vision Dataset Counterpart?	code	NeurIPS, 2023
GCDM	Random Sample	GNN	NC	Graph Condensation via Receptive Field Distribution Matching	—	arXiv, 2022
DM	Random Sample	GNN	NC	CaT: Balanced Continual Graph Learning with Graph Condensation	—	ICDM, 2023
SFGC	K-Center	GNN	NC	Structure-free Graph Condensation: From Large-scale Graphs to Condensed Graph-free Data	code	NeurIPS, 2023
GEOM	K-Center	GNN	NC	Navigating Complexity: Toward Lossless Graph Condensation via Expanding Window Matching	code	ICML, 2024
KiDD	Random Sample	GNTK	GC	Kernel Ridge Regression-Based Graph Dataset Distillation	code	SIGKDD, 2023
Mirage	—	GNN	GC	Mirage: Model-Agnostic Graph Distillation for Graph Classification	code	ICLR, 2024
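K-Center appears in the table both as a coreset baseline and as the initialization used by SFGC and GEOM. Below is a minimal NumPy sketch of greedy k-center selection within one class, assuming node embeddings have already been computed (for example by a GNN trained on the full graph). Starting from the point closest to the class mean is one common choice and an assumption here, not necessarily what GC-Bench does.

```python
import numpy as np


def k_center_greedy(embeddings, k):
    """Select k row indices that greedily minimize the maximum distance to the selected set."""
    embeddings = np.asarray(embeddings, dtype=float)
    # Assumption: seed the selection with the point closest to the mean embedding.
    first = int(np.argmin(np.linalg.norm(embeddings - embeddings.mean(axis=0), axis=1)))
    selected = [first]
    # Distance from every point to its nearest selected center so far.
    min_dist = np.linalg.norm(embeddings - embeddings[first], axis=1)
    while len(selected) < min(k, len(embeddings)):
        nxt = int(np.argmax(min_dist))  # the point farthest from all current centers
        selected.append(nxt)
        min_dist = np.minimum(min_dist, np.linalg.norm(embeddings - embeddings[nxt], axis=1))
    return selected
```

For condensation, one would run this per class with `k` set by the reduction budget and concatenate the selected nodes; farthest-first selection spreads the picks over each class's embedding space instead of clustering them near the mean.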