Robustness of Graph Neural Networks at Scale (NeurIPS 2021)
Update: The attacks GRBCD and PRBCD are now part of PyTorch Geometric
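If you mainly want to run the attacks, the PyTorch Geometric implementations are the easiest entry point. The following is a rough sketch of a global PR-BCD attack via torch_geometric.contrib; this is not code from this repository, and the exact signatures may differ between PyG versions, so please consult the PyG documentation for your installed release:

```python
# Rough sketch: running the PR-BCD attack that ships with recent PyTorch Geometric
# versions (torch_geometric.contrib). GRBCDAttack is used analogously.
import torch
from torch_geometric.datasets import Planetoid
from torch_geometric.nn import GCN
from torch_geometric.contrib.nn import PRBCDAttack

dataset = Planetoid(root='./data', name='Cora')
data = dataset[0]

model = GCN(dataset.num_features, 16, num_layers=2, out_channels=dataset.num_classes)
# ... train `model` on the clean graph as usual ...

attack = PRBCDAttack(model, block_size=250_000, lr=2_000)
budget = int(0.05 * data.num_edges / 2)  # perturb roughly 5% of the undirected edges
pert_edge_index, perts = attack.attack(
    data.x, data.edge_index, data.y, budget=budget, idx_attack=data.test_mask)
```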
Here we provide the code and configuration for our NeurIPS 2021 paper "Robustness of Graph Neural Networks at Scale".
Other resources: Project page - Paper - Video (Slideslive)
Please cite our paper if you use the method in your own work:
@inproceedings{geisler2021_robustness_of_gnns_at_scale,
title = {Robustness of Graph Neural Networks at Scale},
author = {Geisler, Simon and Schmidt, Tobias and \c{S}irin, Hakan and Z\"ugner, Daniel and Bojchevski, Aleksandar and G\"unnemann, Stephan},
booktitle={Neural Information Processing Systems, {NeurIPS}},
year = {2021},
}
Structure
Besides the standard python artifacts we provide:
- cache: for the pretrained models / attacked adjacency matrices
- config: the configuration files grouped by experiments
- data: for storing the datasets
- experiments: source code defining the types of experiments
- kernels: the custom kernel package
- notebooks: for (jupyter) notebooks
- output: for dumping the results of manual experiments (see instructions below)
- rgnn_at_scale: the source code
- tests: unit tests for some important parts of the code
- script_execute_experiment.py: the main script to execute an experiment
Installation
Note: The setup has only been tested on Ubuntu 18.04 and will likely not work on other platforms.
For simplicity, we recommend installing PyTorch with CUDA support a priori via anaconda:
conda install pytorch==1.8.1 torchvision torchaudio cudatoolkit=10.2 -c pytorch
We used Python 3.7.6 and CUDA 10.2. We provide custom CUDA kernels that are fairly simple implementations of a row-wise topk and a row-wise weighted median on a sparse matrix.
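For intuition, a row-wise weighted median picks, for each row, the entry at which the sorted cumulative weight first reaches half of the row's total weight. The dense NumPy sketch below only illustrates the idea; the actual kernels operate on sparse matrices on the GPU and may use different tie-breaking conventions:

```python
# Dense NumPy illustration of a row-wise weighted median (for intuition only;
# this is not the CUDA kernel provided by the repository).
import numpy as np

def row_wise_weighted_median(values, weights):
    out = np.empty(values.shape[0])
    for i, (v, w) in enumerate(zip(values, weights)):
        order = np.argsort(v)                      # sort the row's values
        cum_w = np.cumsum(w[order])                # cumulative weight in sorted order
        idx = np.searchsorted(cum_w, 0.5 * w.sum())
        out[i] = v[order[min(idx, len(v) - 1)]]    # first value reaching half the total weight
    return out

vals = np.array([[1.0, 5.0, 100.0], [2.0, 3.0, 4.0]])
wgts = np.array([[0.4, 0.4, 0.2], [1.0, 1.0, 1.0]])
print(row_wise_weighted_median(vals, wgts))  # -> [5. 3.]; the outlier 100.0 is ignored
```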
Due to the custom CUDA kernels, you must be able to compile via nvcc. Conda handles the C++ compiler etc. You must also have the CUDA toolkit installed and should select the CUDA version matching your environment. Note that PyTorch Geometric and PyTorch have some version-dependent restrictions regarding the supported CUDA versions. See also Build PyTorch from source, which captures the requirements for building custom extensions.
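As a quick sanity check that your environment can build the extensions, you can verify that PyTorch finds a CUDA toolkit (this only checks the toolchain, not the kernels themselves):

```python
# Check that PyTorch sees a GPU and a CUDA toolkit for compiling C++/CUDA extensions.
import torch
from torch.utils import cpp_extension

print('PyTorch version:          ', torch.__version__)
print('CUDA available:           ', torch.cuda.is_available())
print('Built against CUDA:       ', torch.version.cuda)
print('CUDA_HOME for extensions: ', cpp_extension.CUDA_HOME)
```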
If you don't have access to a machine with a CUDA-compatible GPU, you can also use a CPU-only setup. However, note that the soft-median defense is only implemented with custom CUDA kernels and is hence not supported in a CPU-only setup.
Install PyTorch for a CPU-only setup via anaconda:
conda install pytorch==1.8.1 torchvision torchaudio cpuonly -c pytorch
Main Package
Thereafter, we can install the actual module via (alternatively use python setup.py install):
pip install -r requirements.txt
pip install .
By default, the requirements are installed with very restrictive versioning since we did not test any other configuration. If you run into version conflicts, you can also build without version restrictions by omitting the pip install -r requirements.txt command (not tested).
Prebuilt Kernels [skip this for a CPU-only setup]
You also need to fulfill the requirements for compiling a custom C++/CUDA extension for PyTorch - usually satisfied by default via the conda command above.
You can either build the kernels a priori with
pip install ./kernels
or PyTorch will try to compile the kernels at runtime.
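To verify that the prebuilt package is importable afterwards, a check along the following lines should suffice (the package name kernels is an assumption based on the directory name above):

```python
# Hypothetical sanity check; assumes the custom kernel package installs under
# the name `kernels`, matching the directory above.
import importlib.util

print('custom kernels importable:', importlib.util.find_spec('kernels') is not None)
```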
Unit Tests
To (unit) test the robust mean functions, you can run (make sure pytest is on your path):
pytest tests
We also provide the requirements we used during development via:
pip install -r requirements-dev.txt
Minimum Working Example
As a minimum working example, we provide a Quick Start Jupyter notebook, which can be run in Colab. Here, we train a Vanilla GCN on the Cora dataset and attack it with local and global PR-BCD.
Further, the Figure - Which nodes get attacked.ipynb notebook shows the code used to analyze the learning curves and the distribution of attacked nodes (e.g. Fig. 2).
Training
Note: after open-sourcing, we will provide the full collection of pretrained models, and in the case of transfer attacks we will also provide all perturbed adjacency matrices. For now, we only include the pretrained models for Cora ML.
For the training and evaluation code we decided to provide Sacred experiments which make it very easy to run the same code from the command line or on your cluster.
To train or attack the models, you can use the script_execute_experiment script and simply specify the respective configuration (if the configuration specifies partition: gpu_large, you need at least 32 GB of GPU memory):
python script_execute_experiment.py --config-file 'config/train/cora_and_citeseer.yaml'
Alternatively, you can also execute the experiment directly passing the desired configuration:
python experiments/experiment_train.py with "dataset=cora_ml" "seed=0" "model_params={\"label\": \"Soft Median GDC (T=1.0)\", \"model\": \"RGNN\", \"do_cache_adj_prep\": True, \"n_filters\": 64, \"dropout\": 0.5, \"mean\": \"soft_median\", \"mean_kwargs\": {\"temperature\": 1.0}, \"svd_params\": None, \"jaccard_params\": None, \"gdc_params\": {\"alpha\": 0.15, \"k\": 64}}" "artifact_dir=cache" "binary_attr=False" "make_undirected=True"
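If you prefer launching the same Sacred experiment from Python instead of the shell, something along these lines should work; it assumes experiment_train.py exposes its Sacred Experiment object as ex (the usual Sacred pattern), with run(config_updates=...) mirroring the with-arguments above:

```python
# Sketch: launching the training experiment via Sacred's Python API instead of the CLI.
# Assumes `experiments/experiment_train.py` defines its Sacred Experiment as `ex`.
from experiments.experiment_train import ex

ex.run(config_updates={
    'dataset': 'cora_ml',
    'seed': 0,
    'artifact_dir': 'cache',
    'binary_attr': False,
    'make_undirected': True,
    'model_params': {
        'label': 'Soft Median GDC (T=1.0)', 'model': 'RGNN', 'do_cache_adj_prep': True,
        'n_filters': 64, 'dropout': 0.5, 'mean': 'soft_median',
        'mean_kwargs': {'temperature': 1.0}, 'svd_params': None,
        'jaccard_params': None, 'gdc_params': {'alpha': 0.15, 'k': 64},
    },
})
```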
By default, all the results of the experiments will be logged into ./output.
Evaluation
For evaluation, we use the locally stored models in the cache folder (unless specified otherwise).
Similarly to training, we provide a script that runs the attacks for different seeds for all pretrained models. For all experiments, please check out the config folder. Note: as this runs multiple seeds and budgets, it will take several minutes to complete.
Additionally, we provide an example of a local attack on Cora ML using PR-BCD (single seed and one budget):
python script_execute_experiment.py --config-files 'config/attack_evasion_local_direct/EXAMPLE_cora_and_citeseer_localprbcd.yaml'
Perturbed Adjacency Matrices
We provide the perturbed adjacency matrices for a GCN as torch_sparse.SparseTensor for cora_ml, citeseer, and pubmed.
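Loading one of these matrices is straightforward; the sketch below uses a placeholder file name, so substitute the file you actually downloaded:

```python
# Sketch of loading a provided perturbed adjacency matrix (stored as a
# torch_sparse.SparseTensor); the file name below is a placeholder.
import torch

adj = torch.load('cache/cora_ml_prbcd_perturbed_adjacency.pt')
row, col, _ = adj.coo()                      # SparseTensor -> COO components
edge_index = torch.stack([row, col], dim=0)  # standard PyG edge_index of shape [2, E]
```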
Due to the storage requirements, we provide a list of added and removed edges for arxiv and products (see the table below). To restore the edge index, see the following example, where pert_edge_index is the edge index with the perturbations applied:
import numpy as np
from ogb.nodeproppred import PygNodePropPredDataset
import torch

# Load the clean ogbn-arxiv graph
data = PygNodePropPredDataset(root='./datasets', name='ogbn-arxiv')[0]

# Canonicalize every undirected edge as a (min, max) node-id tuple
edge_set = {(u.item(), v.item()) if u < v else (v.item(), u.item())
            for u, v in data.edge_index.T}

# Load the provided perturbation file (lists of added and removed edges)
pert = np.load('./ogbn_arxiv_prbcd_budget_0p1_seed_1.npz')
pert_removed_set = {(u, v) for u, v in pert['pert_removed'].T}
pert_added_set = {(u, v) for u, v in pert['pert_added'].T}

# Apply the perturbation: drop the removed edges, then add the new ones
pert_edge_set = (edge_set - pert_removed_set) | pert_added_set
pert_edge_index = torch.tensor(list(pert_edge_set)).T
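As a follow-up, you can feed pert_edge_index into any trained PyG model to measure the accuracy on the perturbed graph; the sketch below assumes an already-trained model with the placeholder name model that accepts (x, edge_index):

```python
# Sketch: evaluating an already-trained model (placeholder name `model`) on the
# perturbed ogbn-arxiv graph constructed above.
dataset = PygNodePropPredDataset(root='./datasets', name='ogbn-arxiv')
test_idx = dataset.get_idx_split()['test']

model.eval()
with torch.no_grad():
    pred = model(data.x, pert_edge_index).argmax(dim=-1)
test_acc = (pred[test_idx] == data.y.squeeze(-1)[test_idx]).float().mean().item()
print(f'Test accuracy on the perturbed graph: {test_acc:.4f}')
```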
| ↓ Attack / Budget → | 0.01 | 0.05 | 0.1 |
| --- | --- | --- | --- |
| GR-BCD | seed=0 seed=1 seed=5 | seed=0 seed=1 seed=5 | seed=0 seed=1 seed=5 |
| PR-BCD | seed=0 seed=1 seed=5 | seed=0 seed=1 seed=5 | seed=0 seed=1 seed=5 |