
Reliable Graph Neural Networks via Robust Aggregation


This repository contains the official implementation of:

S. Geisler, D. Zügner, and S. Günnemann. Reliable Graph Neural Networks via Robust Aggregation. Neural Information Processing Systems, NeurIPS, 2020

See also: Project Page - arXiv - Google Colab Notebook

Paper Summary

The main idea is to substitute the message passing aggregation in a Graph Neural Network (GNN) with robust location estimators for improved robustness w.r.t. adversarial modifications of the graph structure.
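To make this concrete, below is a minimal dense sketch of one such robust location estimator, a soft medoid: each neighbor is scored by its weighted distance to all others, and a softmax over the negative scores concentrates the aggregation on the most central points. The function name, the dense pairwise-distance computation, and the toy data are illustrative only, not the repository's sparse, CUDA-accelerated implementation.

```python
import numpy as np

def soft_medoid(X, a, T=1.0):
    """Soft medoid of the rows of X with non-negative weights a.

    A differentiable relaxation of the medoid: rows with a small
    weighted total distance to all other rows receive large softmax
    weight, so outliers are effectively down-weighted.
    Dense sketch for illustration -- not the repo's implementation.
    """
    # pairwise Euclidean distances between rows
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    score = -(dist @ a) / T          # negative weighted total distance per row
    s = np.exp(score - score.max())
    s /= s.sum()                     # softmax over rows
    w = s * a
    return (w / w.sum()) @ X         # convex combination of the rows

# With a strong outlier and a small temperature, the aggregate stays
# near the inliers instead of being dragged toward (10, 10).
X = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [10.0, 10.0]])
a = np.ones(4)
print(soft_medoid(X, a, T=0.1))
```

For large temperatures T the softmax becomes uniform and the estimator falls back to the (weighted) mean, which recovers standard message passing.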

In Figure 1 of our paper, we give an exemplary plot for Nettack that clearly shows that strong adversarially added edges result in a concentrated region of outliers. This is exactly the case where robust aggregations are particularly strong.

We show that in combination with personalized PageRank (aka GDC - Graph Diffusion Convolution), our method (aka Soft Medoid GDC) outperforms all baselines and tested state-of-the-art adversarial defenses (see Figure 5 in our paper).
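For intuition, the diffusion step can be sketched as a dense personalized PageRank matrix with row-wise top-k sparsification. The parameter names alpha and k mirror the gdc_params used in the training commands, but this toy function is illustrative only and not the repository's implementation.

```python
import numpy as np

def gdc_diffusion(A, alpha=0.15, k=64):
    """Dense sketch of Graph Diffusion Convolution via personalized
    PageRank: S = alpha * (I - (1 - alpha) * T)^{-1}, followed by
    row-wise top-k sparsification and row renormalization.
    Illustrative only -- not the repository's implementation.
    """
    n = A.shape[0]
    A_loop = A + np.eye(n)                              # add self-loops
    T = A_loop / A_loop.sum(axis=1, keepdims=True)      # row-stochastic transitions
    S = alpha * np.linalg.inv(np.eye(n) - (1 - alpha) * T)
    if k < n:
        # zero out everything below the k-th largest entry per row
        thresh = np.sort(S, axis=1)[:, -k][:, None]
        S = np.where(S >= thresh, S, 0.0)
    return S / S.sum(axis=1, keepdims=True)             # renormalize rows

# Example: ring graph on 5 nodes, keeping the 3 strongest entries per row.
A = np.roll(np.eye(5), 1, axis=1) + np.roll(np.eye(5), -1, axis=1)
S = gdc_diffusion(A, alpha=0.15, k=3)
```

The resulting dense-but-sparsified matrix S replaces the adjacency matrix in the GNN, which gives the robust aggregation a larger, smoothly weighted neighborhood to work with.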

Consider citing our work via:

@inproceedings{geisler2020robustaggregation,
  title =      {Reliable Graph Neural Networks via Robust Aggregation},
  author =     {Geisler, Simon and Z{\"{u}}gner, Daniel and G{\"{u}}nnemann, Stephan},
  booktitle =  {Neural Information Processing Systems, {NeurIPS}},
  year =       {2020},
}

TL;DR

Execute

conda install pytorch==1.6.0 torchvision torchaudio cudatoolkit=10.1 -c pytorch
pip install -r requirements.txt
pip install .

pip install ./kernels
conda install gmpy2 statsmodels
pip install ./sparse_smoothing

to set up the project. Then, for the results on empirical robustness, run (takes about 4 minutes with a GPU):

python script_evaluate_empirical_robustness.py

For the certified robustness via randomized smoothing use:

python script_evaluate_certified_robustness.py

Requirements

For simplicity, we recommend installing PyTorch beforehand via Anaconda:

conda install pytorch==1.6.0 torchvision torchaudio cudatoolkit=10.1 -c pytorch

We used Python 3.7.6 and CUDA 10.1. We provide custom CUDA kernels that are fairly simple implementations of a row-wise topk and a row-wise weighted median on a sparse matrix.
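A dense reference of what these kernels compute might look like the following; the function names are ours, and the real kernels operate row-wise on sparse CUDA tensors rather than looping in Python.

```python
import numpy as np

def weighted_median(values, weights):
    """Weighted median: the smallest value v such that the cumulative
    weight of all elements <= v reaches half the total weight.
    Dense reference sketch of what the custom CUDA kernel computes.
    """
    order = np.argsort(values)
    v, w = values[order], weights[order]
    cum = np.cumsum(w)
    return v[np.searchsorted(cum, 0.5 * cum[-1])]

def rowwise_weighted_median(X, W):
    # Apply per row -- the CUDA kernel fuses this loop on the GPU.
    return np.array([weighted_median(X[i], W[i]) for i in range(X.shape[0])])
```

Note how a heavily weighted outlier can still pull the weighted median, while an outlier with small weight cannot; this weight-awareness is what makes the estimator compatible with weighted message passing.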

Due to the custom CUDA kernels, you must be able to compile via nvcc. Conda handles the C++ compiler etc. You must also have installed the CUDA toolkit and should select the matching CUDA version for your environment. Note that PyTorch Geometric and PyTorch have some version-dependent restrictions regarding the supported CUDA versions. See also Build PyTorch from source, which captures the requirements for building custom extensions.

If you simply want to use the CPU, you do not need to bother about CUDA and can go on with the installation. Later on, you must add --kwargs '{"device": "cpu"}' while executing the script_*.py files.

Thereafter, we can install the actual module via (alternatively use python setup.py install):

pip install -r requirements.txt
pip install .

By default, the requirements are installed with very restrictive versioning since we did not test any other configuration. If you run into version conflicts, you can also install without version restrictions by omitting the pip install -r requirements.txt command (not tested).

Prebuilt Kernels

In case you want to use the GPU, you also need to fulfill the requirements for compiling a custom C++/CUDA extension for PyTorch - usually satisfied by default via the conda command above.

You can either build the kernels a priori with

pip install ./kernels

or PyTorch will try to compile the kernels at runtime.

Sparse Smoothing

If you want to run the randomized smoothing experiments you need to install the respective module:

conda install gmpy2 statsmodels
pip install ./sparse_smoothing

In case the installation of gmpy2 fails, please check out their installation guide.
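For intuition on what this module does: randomized smoothing classifies many randomly perturbed copies of the graph and certifies the majority vote. For binary graph data, the sparse smoothing distribution deletes existing edges and adds absent ones with separate probabilities. The sketch below uses placeholder probabilities and a dense matrix, unlike the actual sparse_smoothing module, and ignores symmetry and self-loops for brevity.

```python
import numpy as np

def sample_sparse_perturbation(A, p_add=0.001, p_del=0.4, rng=None):
    """Randomly flip a binary adjacency matrix: delete each existing
    edge with probability p_del and add each absent edge with
    probability p_add. Probabilities are placeholders, not the repo's
    settings; symmetry/self-loops are ignored for brevity.
    """
    rng = np.random.default_rng() if rng is None else rng
    flip = np.where(A == 1,
                    rng.random(A.shape) < p_del,   # existing entries
                    rng.random(A.shape) < p_add)   # absent entries
    return np.abs(A - flip.astype(A.dtype))        # apply the flips
```

The certificate then bounds how much an adversary who controls a limited number of edge insertions/deletions can shift the vote, which is why the tables below report separate radii for added and deleted edges.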

Unit Tests

To (unit) test the robust mean functions, you can run (make sure pytest is on your path):

    pytest tests

We also provide the requirements we used during development via:

pip install -r requirements-dev.txt

Training

Note: you can skip this section as we provide pretrained models.

For the training and evaluation code we decided to provide SEML/Sacred experiments which make it very easy to run the same code from the command line or on your cluster.

The training for all the pretrained models is bundled in:

python script_train.py --kwargs '{"artifact_dir": "cache"}'

To train a model on cora_ml for evaluating the empirical robustness (from then on, it will also be used for evaluation), e.g. run:

python experiment_train.py with "dataset=cora_ml" "seed=0" "model_params={\"label\": \"Soft Medoid GDC (T=1.0)\", \"model\": \"RGNN\", \"do_cache_adj_prep\": True, \"n_filters\": 64, \"dropout\": 0.5, \"mean\": \"soft_k_medoid\", \"mean_kwargs\": {\"k\": 64, \"temperature\": 1.0}, \"svd_params\": None, \"jaccard_params\": None, \"gdc_params\": {\"alpha\": 0.15, \"k\": 64}}" "artifact_dir=cache" "binary_attr=False"

With binary attributes (for randomized smoothing) use:

python experiment_train.py with "dataset=cora_ml" "seed=0" "model_params={\"label\": \"Soft Medoid GDC (T=1.0)\", \"model\": \"RGNN\", \"do_cache_adj_prep\": True, \"n_filters\": 64, \"dropout\": 0.5, \"mean\": \"soft_k_medoid\", \"mean_kwargs\": {\"k\": 64, \"temperature\": 1.0}, \"svd_params\": None, \"jaccard_params\": None, \"gdc_params\": {\"alpha\": 0.15, \"k\": 64}}" "artifact_dir=cache" "binary_attr=True"

Evaluation

For evaluation, we execute all locally stored (pretrained) models.

Empirical Robustness

Similarly to training, we provide a script that runs the attacks for different seeds for all pretrained models:

python script_evaluate_empirical_robustness.py

This will print the following table:

| dataset  | label                    | fgsm - 0.0    | fgsm - 0.1    | fgsm - 0.25   | pgd - 0.0     | pgd - 0.1     | pgd - 0.25    |
|----------|--------------------------|---------------|---------------|---------------|---------------|---------------|---------------|
| citeseer | Jaccard GCN              | 0.714 ± 0.013 | 0.659 ± 0.010 | 0.600 ± 0.012 | 0.714 ± 0.013 | 0.658 ± 0.012 | 0.588 ± 0.014 |
| citeseer | RGCN                     | 0.653 ± 0.030 | 0.595 ± 0.022 | 0.530 ± 0.020 | 0.653 ± 0.030 | 0.597 ± 0.029 | 0.527 ± 0.027 |
| citeseer | SVD GCN                  | 0.650 ± 0.013 | 0.624 ± 0.014 | 0.563 ± 0.014 | 0.650 ± 0.013 | 0.618 ± 0.013 | 0.547 ± 0.014 |
| citeseer | Soft Medoid GDC (T=0.2)  | 0.705 ± 0.015 | 0.676 ± 0.017 | 0.650 ± 0.020 | 0.705 ± 0.015 | 0.677 ± 0.015 | 0.654 ± 0.020 |
| citeseer | Soft Medoid GDC (T=0.5)  | 0.711 ± 0.009 | 0.674 ± 0.012 | 0.629 ± 0.014 | 0.711 ± 0.009 | 0.673 ± 0.014 | 0.634 ± 0.016 |
| citeseer | Soft Medoid GDC (T=1.0)  | 0.716 ± 0.007 | 0.661 ± 0.010 | 0.606 ± 0.011 | 0.716 ± 0.007 | 0.658 ± 0.010 | 0.601 ± 0.013 |
| citeseer | Vanilla GCN              | 0.712 ± 0.011 | 0.647 ± 0.008 | 0.567 ± 0.012 | 0.712 ± 0.011 | 0.639 ± 0.008 | 0.560 ± 0.011 |
| citeseer | Vanilla GDC              | 0.709 ± 0.010 | 0.634 ± 0.007 | 0.556 ± 0.010 | 0.709 ± 0.010 | 0.625 ± 0.007 | 0.549 ± 0.010 |
| cora_ml  | Jaccard GCN              | 0.819 ± 0.007 | 0.735 ± 0.004 | 0.659 ± 0.002 | 0.819 ± 0.007 | 0.722 ± 0.006 | 0.623 ± 0.002 |
| cora_ml  | RGCN                     | 0.810 ± 0.004 | 0.720 ± 0.004 | 0.645 ± 0.004 | 0.810 ± 0.004 | 0.708 ± 0.003 | 0.612 ± 0.003 |
| cora_ml  | SVD GCN                  | 0.762 ± 0.015 | 0.729 ± 0.013 | 0.661 ± 0.009 | 0.762 ± 0.015 | 0.715 ± 0.015 | 0.630 ± 0.016 |
| cora_ml  | Soft Medoid GDC (T=0.2)  | 0.801 ± 0.002 | 0.746 ± 0.002 | 0.697 ± 0.001 | 0.801 ± 0.002 | 0.753 ± 0.001 | 0.717 ± 0.001 |
| cora_ml  | Soft Medoid GDC (T=0.5)  | 0.821 ± 0.002 | 0.751 ± 0.001 | 0.689 ± 0.003 | 0.821 ± 0.002 | 0.748 ± 0.001 | 0.687 ± 0.002 |
| cora_ml  | Soft Medoid GDC (T=1.0)  | 0.829 ± 0.002 | 0.744 ± 0.002 | 0.681 ± 0.002 | 0.829 ± 0.002 | 0.738 ± 0.002 | 0.662 ± 0.001 |
| cora_ml  | Vanilla GCN              | 0.825 ± 0.012 | 0.730 ± 0.009 | 0.653 ± 0.004 | 0.825 ± 0.012 | 0.718 ± 0.008 | 0.617 ± 0.004 |
| cora_ml  | Vanilla GDC              | 0.833 ± 0.001 | 0.728 ± 0.005 | 0.653 ± 0.004 | 0.833 ± 0.001 | 0.715 ± 0.004 | 0.622 ± 0.005 |

Certified Robustness

For Cora ML and Citeseer run

python script_evaluate_certified_robustness.py

This command results in:

| dataset  | label                    | Add & del. edges | Add edges     | Del. edges    |
|----------|--------------------------|------------------|---------------|---------------|
| citeseer | Jaccard GCN              | 1.477 ± 0.094    | 0.134 ± 0.012 | 3.936 ± 0.160 |
| citeseer | RGCN                     | 0.775 ± 0.096    | 0.037 ± 0.009 | 2.936 ± 0.231 |
| citeseer | SVD GCN                  | 0.556 ± 0.110    | 0.001 ± 0.001 | 2.546 ± 0.082 |
| citeseer | Soft Medoid GDC (T=0.2)  | 4.809 ± 0.184    | 0.580 ± 0.025 | 4.429 ± 0.112 |
| citeseer | Soft Medoid GDC (T=0.5)  | 3.750 ± 0.211    | 0.437 ± 0.036 | 4.276 ± 0.118 |
| citeseer | Soft Medoid GDC (T=1.0)  | 2.694 ± 0.192    | 0.275 ± 0.026 | 4.170 ± 0.099 |
| citeseer | Vanilla GCN              | 1.281 ± 0.097    | 0.113 ± 0.009 | 3.927 ± 0.167 |
| citeseer | Vanilla GDC              | 1.152 ± 0.092    | 0.076 ± 0.006 | 3.901 ± 0.114 |
| cora_ml  | Jaccard GCN              | 1.912 ± 0.027    | 0.197 ± 0.007 | 4.462 ± 0.021 |
| cora_ml  | RGCN                     | 1.269 ± 0.089    | 0.099 ± 0.012 | 3.586 ± 0.161 |
| cora_ml  | SVD GCN                  | 0.918 ± 0.085    | 0.031 ± 0.030 | 2.795 ± 0.062 |
| cora_ml  | Soft Medoid GDC (T=0.2)  | 5.977 ± 0.102    | 0.677 ± 0.011 | 4.795 ± 0.074 |
| cora_ml  | Soft Medoid GDC (T=0.5)  | 5.688 ± 0.070    | 0.650 ± 0.007 | 4.830 ± 0.046 |
| cora_ml  | Soft Medoid GDC (T=1.0)  | 4.947 ± 0.067    | 0.393 ± 0.180 | 4.857 ± 0.023 |
| cora_ml  | Vanilla GCN              | 1.848 ± 0.037    | 0.196 ± 0.007 | 4.425 ± 0.020 |
| cora_ml  | Vanilla GDC              | 2.003 ± 0.017    | 0.164 ± 0.005 | 4.457 ± 0.032 |

Note that we use a different setup than in the paper: for example, we updated to PyTorch 1.6.0 and use the most recent sparse smoothing code. This is the main reason why the numbers differ slightly from those in our paper.

Contributing

This code is licensed under MIT. If you want to contribute, feel free to open a pull request.