Node Feature Kernels Increase Graph Convolutional Network Robustness

Made with Python. License: MIT.

This repository is the official implementation of Node Feature Kernels Increase Graph Convolutional Network Robustness.

<div align=center> <img src=https://github.com/ChangminWu/RobustGCN/blob/main/img/align.jpg width="50%"> </div>

It is developed mainly with the help of the PyTorch Geometric library. We also thank the Open Graph Benchmark implementation for providing an example of logger.py.

Requirements

A virtual environment can be created with conda from the provided environments file:

conda env create -f environments.yml

Note that PyTorch Geometric needs to be installed separately via pip:

conda activate RobustGCN
pip install -r requirements.txt

Usage

Run

This implementation can reproduce the experimental results shown in our paper, which studies the robustness of Graph Convolutional Networks (GCNs) under structural perturbation. The commands below cover the individual experimental settings.

To run experiments with Vanilla GCN, e.g. on Cora, do

python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type none --no-merged --epsilon 1.0

For random feature GCN, do

python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type random-feature --noise_type none --epsilon 1.0 --hiddim 3000

For experiments in the theoretical case (separate noise), do

python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type random --no-merged --noise_ratio 1.0 --epsilon 0.5 --identity

For experiments in the realistic case (merged noise), do

python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type deletion --merged --noise_ratio 0.5 --epsilon 0.5 --add_kernel --add_identity --normalize --nystrom 

in the case of edge deletion, and

python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type insertion --merged --noise_ratio 0.5 --epsilon 0.5 --add_kernel --add_identity --normalize --nystrom

in the case of edge insertion.

For deeper architectures, simply increase the value of the --num_layer parameter.
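
A sweep over several depths can be scripted. The sketch below only builds and prints the command lines for the realistic edge-insertion setting, reusing the flags shown above. The helper name `build_command`, the output directory `results`, and the depths 1 to 3 are illustrative assumptions, not part of the repository.

```python
import shlex

def build_command(num_layer, out_dir="results"):
    """Return the argument list for one realistic-case (edge insertion) run.

    `out_dir` is a hypothetical output location; replace it with your own.
    """
    return [
        "python", "main.py",
        "--dataset", "Cora",
        "--out_dir", f"{out_dir}/layers_{num_layer}",
        "--num_layer", str(num_layer),
        "--readout", "mlp",
        "--exp_type", "robust",
        "--noise_type", "insertion",
        "--merged",
        "--noise_ratio", "0.5",
        "--epsilon", "0.5",
        "--add_kernel", "--add_identity", "--normalize", "--nystrom",
    ]

# Depths to sweep; these values are illustrative, not taken from the paper.
commands = [build_command(n) for n in (1, 2, 3)]

for cmd in commands:
    # Print a copy-pasteable shell line; nothing is executed here.
    print(shlex.join(cmd))
```

To actually launch the runs, each `cmd` list could be passed to `subprocess.run(cmd, check=True)` from the repository root.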

For experiments with node feature noise, e.g. in the realistic case, do

python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type insertion --merged --noise_ratio 0.5 --epsilon 0.5 --add_kernel --add_identity --normalize --nystrom --add_feat_noise --feat_noise_ratio 1.0
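
To make the role of --feat_noise_ratio concrete, here is a minimal pure-Python sketch of additive Gaussian feature noise. It assumes that the noise standard deviation equals the ratio times the standard deviation taken over all entries of the feature matrix; the repository may compute this differently (for example per feature), and the function name `add_feature_noise` is hypothetical.

```python
import random
import statistics

def add_feature_noise(features, feat_noise_ratio, seed=0):
    """Add zero-mean Gaussian noise to a list-of-lists feature matrix.

    Assumption: sigma = feat_noise_ratio * std(all feature entries).
    """
    rng = random.Random(seed)
    flat = [value for row in features for value in row]
    sigma = feat_noise_ratio * statistics.pstdev(flat)
    return [[value + rng.gauss(0.0, sigma) for value in row] for row in features]

# Toy feature matrix with 2 nodes and 3 features (made-up values).
features = [[1.0, 0.0, 0.0], [0.0, 1.0, 1.0]]
noisy = add_feature_noise(features, feat_noise_ratio=1.0)
unchanged = add_feature_noise(features, feat_noise_ratio=0.0)
```

With a ratio of 0.0 the features are returned unchanged, which is a quick sanity check of the scaling.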

For experiments with multiple splits (see supplementary material Section E), e.g. in the realistic case, do

python main.py --dataset Cora --out_dir <out dir> --num_layer 1 --readout mlp --exp_type robust --noise_type insertion --merged --noise_ratio 0.5 --epsilon 0.5 --add_kernel --add_identity --normalize --nystrom --splits 10 --split_type random

Important Parameters

Description of important Model Options:
    --hiddim <int>
        dimension of the hidden representation of node embedding, default is 128
    --num_layer <int>
        number of stacked GCN layers, default is 1
    --readout <str>
        choice of readout functions that output prediction score, default is 'mlp'
    --exp_type <str>
        choice of different experiment settings, default is 'random-feature' 
    --noise_type <str>
        choice of different (noise) scenario, default is 'none' 
    --merged <bool>
        whether to merge noise into the original adjacency matrix, default is False
    --add_feat_noise <bool>
        whether to add Gaussian noise to the node features, default is False
    --add_kernel <bool>
        whether to enhance GCN message-passing with kernel, default is False
    --random_noise_type <str>
        choice of the random graph generative model used for the noise, default is an Erdős-Rényi graph
    --kernel_type <str>
        choice of kernel function, default is 'linear'
    --noise_ratio <float>
        ratio between the random noise graph's density and the original graph's density, default is 1.0 
    --feat_noise_ratio <float>
        ratio between the standard deviation of the added Gaussian noise and that of the original node features, default is 1.0
    --standarize <bool>
        whether to standardize node features, default is False
    --centerize <bool>
        whether to center kernel values, default is False
    --add_identity <bool>
        whether to add self-loops to noise/kernel, default is False
    --normalize <bool>
        whether to degree normalize noise/kernel, default is False
    --rf_norm <bool>
        whether to normalize random weights in random feature GCN, default is False
    --split_type <str>
        choice of train/valid/test split of datasets, default is 'public'
    --nystrom <bool>
        whether to use the Nyström approximation for computing the kernel, default is False
    --epsilon <float>
        coefficient of the propagation following original graph in the GCN message-passing step, default is 1.0
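
The interplay of --add_kernel, --add_identity, --normalize and --epsilon can be illustrated with a small pure-Python sketch. It rests on several assumptions that are not confirmed by this README: the propagation matrix is taken to be a convex combination `epsilon * norm(A + I) + (1 - epsilon) * norm(K + I)`, the normalization is the symmetric degree normalization, and the kernel is the full linear kernel rather than its Nyström approximation. All function names are hypothetical; see the paper and the source code for the actual formulation.

```python
import math

def linear_kernel(features):
    """Gram matrix K[i][j] = <x_i, x_j> of the node features."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in features]
            for xi in features]

def add_identity(matrix):
    """Add self-loops, i.e. M + I."""
    return [[v + (1.0 if i == j else 0.0) for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

def degree_normalize(matrix):
    """Symmetric normalization D^{-1/2} M D^{-1/2}, with D the row sums of M."""
    deg = [sum(row) for row in matrix]
    return [[v / math.sqrt(deg[i] * deg[j]) for j, v in enumerate(row)]
            for i, row in enumerate(matrix)]

def propagation_matrix(adjacency, features, epsilon):
    """Assumed form: epsilon * norm(A + I) + (1 - epsilon) * norm(K + I)."""
    graph_part = degree_normalize(add_identity(adjacency))
    kernel_part = degree_normalize(add_identity(linear_kernel(features)))
    return [[epsilon * g + (1.0 - epsilon) * k for g, k in zip(g_row, k_row)]
            for g_row, k_row in zip(graph_part, kernel_part)]

# Toy example: a 3-node path graph with non-negative features (made-up values).
adjacency = [[0.0, 1.0, 0.0],
             [1.0, 0.0, 1.0],
             [0.0, 1.0, 0.0]]
features = [[1.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
mixed = propagation_matrix(adjacency, features, epsilon=0.5)
plain = propagation_matrix(adjacency, features, epsilon=1.0)
```

Under this assumed form, `epsilon=1.0` leaves only the graph term, which would be consistent with the vanilla GCN command above using --epsilon 1.0.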
    

Results

Table 1: Performance of GCN/GIN/GraphSAGE/GAT with node-feature kernel under perturbation on Cora

| deletion | insertion | GCN | GCN-k | GIN | GIN-k | SAGE | SAGE-k | GAT | GAT-k |
|---|---|---|---|---|---|---|---|---|---|
| 0.0 | 0.0 | 76.42 ± 1.55 | 75.42 ± 1.65 | 76.94 ± 1.41 | 77.62 ± 1.74 | 74.77 ± 1.98 | 76.00 ± 2.05 | 76.55 ± 2.23 | 77.45 ± 2.00 |
| 0.5 | 0.0 | 71.46 ± 1.66 | 69.00 ± 2.99 | 70.42 ± 2.03 | 70.23 ± 1.73 | 67.37 ± 1.73 | 70.46 ± 1.86 | 70.86 ± 1.45 | 71.35 ± 1.90 |
| 0.0 | 1.0 | 60.73 ± 2.20 | 70.55 ± 1.52 | 63.87 ± 2.85 | 67.80 ± 2.27 | 66.53 ± 1.80 | 68.52 ± 1.97 | 59.25 ± 1.99 | 64.92 ± 1.55 |
| 0.5 | 0.5 | 53.90 ± 1.88 | 63.79 ± 2.26 | 56.36 ± 2.23 | 62.79 ± 1.56 | 62.06 ± 1.73 | 63.80 ± 2.54 | 52.78 ± 2.37 | 58.01 ± 1.96 |
| 0.5 | 1.0 | 45.04 ± 2.46 | 62.08 ± 2.30 | 49.56 ± 3.40 | 55.24 ± 2.13 | 59.54 ± 1.75 | 62.15 ± 2.32 | 43.97 ± 2.29 | 52.47 ± 1.52 |

In the table above, each row corresponds to one perturbation scenario in which edges are randomly removed and/or added. A scenario is controlled by two parameters, “deletion” and “insertion”, which give the ratio of edges deleted from or inserted into the graph, relative to the number of edges in the original graph. For example, the scenario (0.0, 0.0) is the unperturbed case, and (0.5, 0.5) is the case where 50% of the original edges are removed and the same number of edges not present in the original graph are added.
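
The two perturbation ratios can be illustrated with a minimal pure-Python sketch on a toy undirected graph. It follows the description above: edges are deleted from the original edge set and inserted from node pairs that are not edges of the original graph, with both counts relative to the original number of edges. The function name `perturb_edges`, the uniform sampling, and the toy graph are assumptions for illustration and do not reproduce the repository's implementation.

```python
import itertools
import random

def perturb_edges(num_nodes, edges, deletion, insertion, seed=0):
    """Randomly delete and insert undirected edges.

    Both ratios are relative to the number of edges in the original graph.
    """
    rng = random.Random(seed)
    original = sorted({tuple(sorted(edge)) for edge in edges})
    original_set = set(original)
    num_delete = int(deletion * len(original))
    num_insert = int(insertion * len(original))

    # Remove a random subset of the original edges.
    removed = set(rng.sample(original, num_delete))
    kept = [edge for edge in original if edge not in removed]

    # Insert edges drawn from node pairs that are not edges of the original graph.
    non_edges = [pair for pair in itertools.combinations(range(num_nodes), 2)
                 if pair not in original_set]
    added = rng.sample(non_edges, num_insert)
    return sorted(kept + added)

# Toy graph: a cycle on 6 nodes (6 edges, 9 non-edges).
cycle = [(i, (i + 1) % 6) for i in range(6)]
perturbed = perturb_edges(6, cycle, deletion=0.5, insertion=0.5)
```

In the (0.5, 0.5) scenario on this toy graph, three of the six original edges are kept and three new edges are added, so the edge count is unchanged while half of the structure is replaced.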

Each column corresponds to a GNN model we considered. The suffix "-k" in the model name indicates that the model contains our proposed node-feature kernel. Each model is composed of a single message-passing layer and an MLP readout layer. For all "-k" models, the coefficient of the perturbed graph propagation, i.e., $\beta$ in the paper, equals 0.5.

Contribution

Authors:

If you find our repo useful, please cite:

@misc{seddik2021node,
      title={Node Feature Kernels Increase Graph Convolutional Network Robustness}, 
      author={Mohamed El Amine Seddik and Changmin Wu and Johannes F. Lutzeyer and Michalis Vazirgiannis},
      year={2021},
      eprint={2109.01785},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}