GraphSlim

Documentation | Benchmark Paper | Benchmark Scripts | Survey Paper | Paper Collection | Web Interface

Online Demo

Features

GraphSlim is a PyTorch library for graph reduction. It takes a graph in PyG format as input and outputs a reduced graph that preserves the properties or performance of the original graph.

Guidance

Prepare Environments

CUDA and PyTorch

See the PyTorch previous versions page for installation commands. We tested this repo with torch 1.13.1 and torch 2.1.2 with CUDA 12.4.

Install from requirements

Choose requirements_torch1+.txt (for torch 1.*) or requirements.txt (for torch 2.*), depending on your PyTorch version.

<!--# Download Datasets For cora, citeseer, flickr and reddit (reddit2 in pyg), the pyg code will directly download them. For arxiv, we use the datasets provided by [GraphSAINT](https://github.com/GraphSAINT/GraphSAINT). Our code will automatically download all datasets. The default path of datasets is `../../data`.-->

Install from pip

# choose one version from https://data.pyg.org/whl/ based on your environment
pip install torch_scatter torch_sparse -f https://data.pyg.org/whl/torch-${TORCH}+${CUDA}.html
pip install graphslim
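
The `${TORCH}` and `${CUDA}` placeholders above stand for the installed PyTorch version and its CUDA tag. As a minimal sketch (the helper name `pyg_wheel_index` is hypothetical and not part of GraphSlim), the index URL can be assembled from version strings such as those reported by `torch.__version__` and `torch.version.cuda`. The helper only formats the string; whether a wheel page exists for a given combination still has to be checked at https://data.pyg.org/whl/.

```python
from typing import Optional


def pyg_wheel_index(torch_version: str, cuda_version: Optional[str]) -> str:
    """Build the wheel index URL passed to `pip install ... -f <url>`.

    torch_version: e.g. "2.1.2" or "2.1.2+cu121" (any "+..." suffix is dropped).
    cuda_version:  e.g. "12.1", or None for a CPU-only build.
    """
    base = torch_version.split("+")[0]
    cuda_tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{cuda_tag}.html"


if __name__ == "__main__":
    # Illustrative values; with torch installed, pass torch.__version__
    # and torch.version.cuda instead.
    print(pyg_wheel_index("2.1.2", "12.1"))
```

The printed URL can then be used as the `-f` argument of the `pip install torch_scatter torch_sparse` command.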

Recommended way to install torch_sparse and torch_scatter

It is usually faster and easier to install them from .whl files. See install_torch_sparse.sh for details.

Examples

python examples/train_coreset.py
python examples/train_coarsen.py
python examples/train_gcond.py

See more examples in Benchmark Scripts.

Use As Project

cd graphslim
python train_all.py -xxx xx

Run python configs.py --help to list all command-line options.

Options:
  -D, --dataset TEXT              [default: cora]
  -G, --gpu_id INTEGER            gpu id start from 0, -1 means cpu  [default:
                                  0]
  --setting [trans|ind]           transductive or inductive setting
  --split TEXT                    only support public split now, do not change
                                  it  [default: fixed]
  --run_reduction INTEGER         repeat times of reduction  [default: 3]
  --run_eval INTEGER              repeat times of final evaluations  [default:
                                  10]
  --run_inter_eval INTEGER        repeat times of intermediate evaluations
                                  [default: 5]
  --eval_interval INTEGER         [default: 100]
  -H, --hidden INTEGER            [default: 256]
  --eval_epochs, --ee INTEGER     [default: 300]
  --eval_model, --em [GCN|GAT|SGC|APPNP|Cheby|GraphSage|GAT|SGFormer]
                                  [default: GCN]
  --condense_model [GCN|GAT|SGC|APPNP|Cheby|GraphSage|GAT]
                                  [default: SGC]
  -E, --epochs INTEGER            number of reduction epochs  [default: 1000]
  --lr FLOAT                      [default: 0.01]
  --weight_decay, --wd INTEGER    [default: 0]
  --pre_norm BOOLEAN              pre-normalize features, forced true for
                                  arxiv, flickr and reddit  [default: True]
  --outer_loop INTEGER            [default: 10]
  --inner_loop INTEGER            [default: 1]
  -R, --reduction_rate FLOAT      -1 means use representative reduction rate;
                                  reduction rate of training set, defined as
                                  (number of nodes in small graph)/(number of
                                  nodes in original graph)  [default: -1.0]
  -S, --seed INTEGER              Random seed  [default: 1]
  --nlayers INTEGER               number of GNN layers of condensed model
                                  [default: 2]
  -V, --verbose
  --init [variation_neighborhoods|variation_edges|variation_cliques|heavy_edge|algebraic_JC|affinity_GS|kron|vng|clustering|averaging|cent_d|cent_p|kcenter|herding|random]
                                  features initialization methods
  -M, --method [variation_neighborhoods|variation_edges|variation_cliques|heavy_edge|algebraic_JC|affinity_GS|kron|vng|clustering|averaging|gcond|doscond|gcondx|doscondx|sfgc|msgc|disco|sgdd|gcsntk|geom|cent_d|cent_p|kcenter|herding|random]
                                  [default: kcenter]
  --activation [sigmoid|tanh|relu|linear|softplus|leakyrelu|relu6|elu]
                                  activation function when do NAS  [default:
                                  relu]
  -A, --attack [random_adj|metattack|random_feat]
                                  corruption method
  -P, --ptb_r FLOAT               perturbation rate for corruptions  [default:
                                  0.25]
  --aggpreprocess                 use aggregation for coreset methods
  --dis_metric TEXT               distance metric for all condensation
                                  methods,ours means metric used in GCond
                                  paper  [default: ours]
  --lr_adj FLOAT                  [default: 0.0001]
  --lr_feat FLOAT                 [default: 0.0001]
  --threshold INTEGER             sparsificaiton threshold before evaluation
                                  [default: 0]
  --dropout FLOAT                 [default: 0.0]
  --ntrans INTEGER                number of transformations in SGC and APPNP
                                  [default: 1]
  --with_bn
  --no_buff                       skip the buffer generation and use existing
                                  in geom,sfgc
  --batch_adj INTEGER             batch size for msgc  [default: 1]
  --alpha FLOAT                   for appnp  [default: 0.1]
  --mx_size INTEGER               for gcsntk methods, avoid SVD error
                                  [default: 100]
  --save_path, --sp TEXT          save path for synthetic graph  [default:
                                  ../checkpoints]
  -W, --eval_whole                if run on whole graph
  --help                          Show this message and exit.
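
When many configurations are run in a row, it can help to assemble the command line programmatically. The sketch below is only an illustration: `build_command` is a hypothetical helper, and it assumes that `train_all.py` accepts the options listed above. Boolean values are treated as on/off switches (such as `--verbose` or `--with_bn`); every other value is passed as `flag value`.

```python
import shlex
from typing import Dict, List, Union

OptionValue = Union[str, int, float, bool]


def build_command(options: Dict[str, OptionValue]) -> List[str]:
    """Turn a mapping of option flags to values into an argument list."""
    cmd = ["python", "train_all.py"]
    for flag, value in options.items():
        # bool must be checked first because bool is a subclass of int.
        if isinstance(value, bool):
            if value:
                cmd.append(flag)  # switch-style option, no value
        else:
            cmd.extend([flag, str(value)])
    return cmd


cmd = build_command({
    "--dataset": "cora",
    "--method": "gcond",
    "--reduction_rate": 0.5,
    "--setting": "trans",
    "--verbose": True,
})
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from inside the graphslim directory.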

Use As Package

from graphslim.dataset import *
from graphslim.evaluation import *
from graphslim.condensation import GCond
from graphslim.config import cli

args = cli(standalone_mode=False)
# customize args here
args.reduction_rate = 0.5
args.device = 'cuda:0'
# add more args.<main_args/dataset_args> here
graph = get_dataset('cora', args=args)
# To reproduce the benchmark, use our args and graph class.
# To use your own args and graph format, make sure the args and graph class have the required attributes.
# create an agent of one reduction algorithm
# add more args.<agent_args> here
agent = GCond(setting='trans', data=graph, args=args)
# reduce the graph 
reduced_graph = agent.reduce(graph, verbose=True)
# create an evaluator
# add more args.<evaluator_args> here
evaluator = Evaluator(args)
# evaluate the reduced graph on a GNN model
res_mean, res_std = evaluator.evaluate(reduced_graph, model_type='GCN')
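
The help text defines the reduction rate as (number of nodes in small graph)/(number of nodes in original graph), measured on the training set. The following sketch shows what a given rate means in node counts. `reduced_node_count` is a hypothetical helper, and truncating to an integer is an assumption; the library may round differently. The special value -1 (representative reduction rate) is not handled here.

```python
def reduced_node_count(n_train_nodes: int, reduction_rate: float) -> int:
    """Approximate size of the reduced graph for a given reduction rate.

    Assumption: the product is truncated to an integer and at least
    one node is kept.
    """
    if not 0.0 < reduction_rate <= 1.0:
        raise ValueError("reduction_rate must be in (0, 1]")
    return max(1, int(n_train_nodes * reduction_rate))


# The public (Planetoid) split of cora has 140 training nodes.
print(reduced_node_count(140, 0.5))  # 70
```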

All parameters can be divided into the following groups:

<main_args>: dataset, method, setting, reduction_rate, seed, aggpreprocess, eval_whole, run_reduction
<attack_args>: attack, ptb_r
<dataset_args>: pre_norm, save_path, split, threshold
<agent_args>: init, eval_interval, eval_epochs, eval_model, condense_model, epochs, lr, weight_decay, outer_loop, inner_loop, nlayers, method, activation, dropout, ntrans, with_bn, no_buff, batch_adj, alpha, mx_size, dis_metric, lr_adj, lr_feat
<evaluator_args>: final_eval_model, eval_epochs, lr, weight_decay
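
Some parameters appear in more than one group (for example `lr` is used by both the agent and the evaluator). As an illustration only, the grouping above can be written as a plain dictionary with a lookup function; `PARAM_GROUPS` and `groups_of` are hypothetical names and not part of the GraphSlim API.

```python
from typing import Dict, List

# Parameter groups copied from the list above.
PARAM_GROUPS: Dict[str, List[str]] = {
    "main_args": ["dataset", "method", "setting", "reduction_rate", "seed",
                  "aggpreprocess", "eval_whole", "run_reduction"],
    "attack_args": ["attack", "ptb_r"],
    "dataset_args": ["pre_norm", "save_path", "split", "threshold"],
    "agent_args": ["init", "eval_interval", "eval_epochs", "eval_model",
                   "condense_model", "epochs", "lr", "weight_decay",
                   "outer_loop", "inner_loop", "nlayers", "method",
                   "activation", "dropout", "ntrans", "with_bn", "no_buff",
                   "batch_adj", "alpha", "mx_size", "dis_metric", "lr_adj",
                   "lr_feat"],
    "evaluator_args": ["final_eval_model", "eval_epochs", "lr", "weight_decay"],
}


def groups_of(name: str) -> List[str]:
    """Return every group that lists the given parameter name."""
    return [group for group, names in PARAM_GROUPS.items() if name in names]


print(groups_of("lr"))      # ['agent_args', 'evaluator_args']
print(groups_of("ptb_r"))   # ['attack_args']
```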

See the Documentation for more details.

Customization

Web Interface

Our web application is deployed online with Streamlit. It can also be launched locally; first install the dependencies listed in interface/requirements.txt, then run:

cd interface
python -m streamlit run vis_graphslim.py

TODO

Limitations

Acknowledgement

Some of the algorithms are adapted from the original paper authors' implementations and from other packages:

SCAL

Sparsification

GCOND

GCSNTK

SFGC

GEOM

DeepRobust