# Blindfolded Attackers Still Threatening: Strict Black-Box Adversarial Attacks on Graphs

## About
This project is the implementation of the paper "Blindfolded Attackers Still Threatening: Strict Black-Box Adversarial Attacks on Graphs". The paper proposes a strict black-box adversarial attack on graphs, where the attacker has no knowledge of the target model and no query access to it. Observing only the graph topology, the proposed attack strategy aims to flip a limited number of links to mislead the graph model.
This repo contains the code, data, and results reported in the paper.
## Dependencies
The scripts have been tested under Python 3.7.7, with the following packages installed (along with their dependencies):
```
numpy==1.18.1
scipy==1.4.1
scikit-learn==0.23.1
gensim==3.8.0
networkx==2.3
tqdm==4.46.1
torch==1.4.1
torch_geometric==1.5.0
  - torch-spline-conv==1.2.0
  - torch-scatter==2.0.4
  - torch-sparse==0.6.0
```
Some Python module dependencies are listed in `requirements.txt` and can be easily installed with pip:

```
pip install -r requirements.txt
```
In addition, CUDA 10.0 has been used in our project. Although not all dependencies are mentioned in the installation instructions above, you can find most of the libraries in the package repository of a regular Linux distribution.
## Usage: Node-level Attack
Given the adjacency matrix of an input graph, our attacker aims to flip a limited number of links.
### Input Format
Following our setting, only the structural information of the input graph is needed to perform the attack.
An example is given in the `data` directory, where each dataset is stored in `npz` format.
When using your own dataset, you must provide:
- an N by N adjacency matrix (N is the number of nodes).
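The `npz` layout used by the bundled datasets is not spelled out above, so the sketch below is only an assumption: it builds a random undirected graph and stores it with `scipy.sparse.save_npz`. Inspect the files shipped in `data` (e.g. via `np.load(path, allow_pickle=True).files`) to match the real key layout before using your own data.

```python
# Minimal sketch of preparing a custom dataset (layout is an assumption).
import numpy as np
import scipy.sparse as sp

n = 100
adj = sp.random(n, n, density=0.05, format="csr")      # random sparse links
adj = ((adj + adj.T) > 0).astype(np.float64).tocsr()   # symmetrize: undirected

# save_npz stores the CSR components (data/indices/indptr/shape) in one .npz
sp.save_npz("data/my_graph.npz", adj)
```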
### Output Format
The program writes its output to a file in `npz` format, which contains the adversarial edges.
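The array name inside that file is not documented here, so the snippet below (with a hypothetical file name) first lists the archive's contents before applying the flips:

```python
# Sketch of consuming the attack output; the file name and the key of the
# edge array are assumptions -- check `out.files` for the real names.
import numpy as np
import scipy.sparse as sp

out = np.load("output/adversarial_edges.npz")
print(out.files)                      # names of the stored arrays
flips = out[out.files[0]]             # assumed: an (m, 2) array of node pairs

adj = sp.load_npz("data/my_graph.npz").tolil()
for u, v in flips:                    # flip each adversarial link
    adj[u, v] = 1 - adj[u, v]
    adj[v, u] = 1 - adj[v, u]
adj = adj.tocsr()
```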
### Main Script
The help information of the main script `node_level_attack.py` is as follows:
```
python node_level_attack.py -h
usage: node_level_attack.py [-h] [--dataset] [--pert-rate] [--threshold] [--save-dir]

optional arguments:
  -h, --help    Show this help message and exit
  --dataset     str, The dataset to be perturbed on [cora, citeseer, polblogs].
  --pert-rate   float, Perturbation rate of edges to be flipped.
  --threshold   float, Restart threshold of eigen-solutions.
  --save-dir    str, File directory to save outputs.
```
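For example, a hypothetical invocation using the flags above (the argument values are illustrative, not recommended settings):

```
python node_level_attack.py --dataset cora --pert-rate 0.05 --threshold 0.03 --save-dir output/
```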
### Demo
We include all three benchmark datasets Cora-ML, Citeseer, and Polblogs in the `data` directory.
A demo script is then available by calling `attack.py`, as follows:

```
python attack.py --data-name cora --pert-rate 0.1 --threshold 0.03
```
### Evaluations
Our evaluations depend on the adversarial edges output by the attack model above. We provide the evaluation code of our attack strategy on the node classification task here. Our setting is the poisoning attack, where the target models are retrained after the perturbation. We use GCN, Node2vec, and Label Propagation as the target models to attack.
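To make the poisoning setting concrete, the toy script below retrains a miniature Label Propagation target from scratch on a clean and on a randomly perturbed two-block graph, then compares Macro-F1. Everything in it (graph, labels, split, flipped edges) is a synthetic stand-in for the repo's actual pipeline:

```python
# Toy illustration of the poisoning protocol: the target model is trained
# AFTER the edges are flipped, and the attacker never queries it.
import numpy as np
from sklearn.metrics import f1_score

def label_propagation(adj, labels, train_mask, n_iter=50):
    """Propagate one-hot training labels over the row-normalized graph."""
    n, c = adj.shape[0], labels.max() + 1
    y = np.zeros((n, c))
    y[train_mask, labels[train_mask]] = 1.0
    p = adj / np.maximum(adj.sum(1, keepdims=True), 1.0)
    f = y.copy()
    for _ in range(n_iter):
        f = p @ f
        f[train_mask] = y[train_mask]          # clamp the known labels
    return f.argmax(1)

rng = np.random.default_rng(0)
n = 60
block = (np.arange(n) >= n // 2).astype(int)    # two ground-truth communities
prob = np.where(block[:, None] == block[None, :], 0.15, 0.02)
adj = (rng.random((n, n)) < prob).astype(float)
adj = np.maximum(adj, adj.T)
np.fill_diagonal(adj, 0)
train_mask = rng.random(n) < 0.3

adj_pois = adj.copy()                           # poison: flip a few links
for u, v in rng.integers(0, n, size=(15, 2)):
    if u != v:
        adj_pois[u, v] = adj_pois[v, u] = 1 - adj_pois[u, v]

for name, a in [("clean", adj), ("poisoned", adj_pois)]:
    pred = label_propagation(a, block, train_mask)
    f1 = f1_score(block[~train_mask], pred[~train_mask], average="macro")
    print(f"{name:8s} Macro-F1: {f1:.3f}")
```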
#### Datasets
We evaluate on three real-world datasets: Cora-ML, Citeseer, and Polblogs.
The preprocessed versions are given in the `data` directory, where each dataset is stored in `npz` format.
#### Evaluation Script
If you want to attack GCN, you can run `evaluation/eval_gcn.py`. The help information of the evaluation script is as follows:
```
python evaluation/eval_gcn.py -h
usage: eval_gcn.py [-h] [--dataset] [--pert-rate] [--dimensions] [--load-dir]

optional arguments:
  -h, --help    Show this help message and exit
  --dataset     str, The dataset to be evaluated on [cora, citeseer, polblogs].
  --pert-rate   float, Perturbation rate of edges to be flipped.
  --dimensions  int, Dimensions of the GCN hidden layer. Default is 16.
  --load-dir    str, File directory to load adversarial edges.
```
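A hypothetical invocation (flag values are illustrative; `output/` stands for whatever `--save-dir` was passed to the attack script):

```
python evaluation/eval_gcn.py --dataset cora --pert-rate 0.1 --dimensions 16 --load-dir output/
```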
If you want to attack Node2vec, you can run `evaluation/eval_emb.py`. The help information of the evaluation script is as follows:
```
python evaluation/eval_emb.py -h
usage: eval_emb.py [-h] [--dataset] [--pert-rate] [--dimensions] [--window-size]
                   [--walk-length] [--walk-num] [--p] [--q] [--worker] [--load-dir]

optional arguments:
  -h, --help     Show this help message and exit
  --dataset      str, The dataset to be evaluated on [cora, citeseer, polblogs].
  --pert-rate    float, Perturbation rate of edges to be flipped.
  --dimensions   int, Output embedding dimensions of Node2vec. Default is 32.
  --window-size  int, Context size for optimization in Node2vec. Default is 5.
  --walk-length  int, Length of walk per source in Node2vec. Default is 80.
  --walk-num     int, Number of walks per source in Node2vec. Default is 10.
  --p            float, Return parameter p in Node2vec. Default is 4.0.
  --q            float, In-out parameter q in Node2vec. Default is 1.0.
  --worker       int, Number of parallel workers. Default is 10.
  --load-dir     str, File directory to load adversarial edges.
```
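A hypothetical invocation with the Node2vec hyperparameters spelled out (values mirror the stated defaults and are illustrative):

```
python evaluation/eval_emb.py --dataset cora --pert-rate 0.1 --dimensions 32 --window-size 5 \
    --walk-length 80 --walk-num 10 --p 4.0 --q 1.0 --worker 10 --load-dir output/
```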
If you want to attack Label Propagation, you can run `evaluation/eval_lp.py`. The help information of the evaluation script is as follows:
```
python evaluation/eval_lp.py -h
usage: eval_lp.py [-h] [--dataset] [--pert-rate] [--load-dir]

optional arguments:
  -h, --help    Show this help message and exit
  --dataset     str, The dataset to be evaluated on [cora, citeseer, polblogs].
  --pert-rate   float, Perturbation rate of edges to be flipped.
  --load-dir    str, File directory to load adversarial edges.
```
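A hypothetical invocation, mirroring the flags above (values are illustrative):

```
python evaluation/eval_lp.py --dataset polblogs --pert-rate 0.1 --load-dir output/
```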
<!--
#### Performance
| Decrease in Macro-F1 score (%) | GCN | Node2vec | Label Prop. |
| :----------------------------: | :---------: | :---------: | :---------: |
| Cora-ML | 5.27 | 8.92 | 7.13 |
| Citeseer | 3.98 | 9.32 | 8.16 |
| Polblogs | 5.32 | 3.79 | 6.14 |
-->
## Usage: Graph-level Attack
Given a set of input graphs, our attacker aims to flip a limited number of links for each graph.
### Input Format
When using your own dataset, you must provide:
- the adjacency matrices of a set of graphs.
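A minimal sketch of packing such a set of graphs, assuming one adjacency array per graph inside a single `npz` archive (the layout actually expected by the scripts may differ):

```python
# Sketch of packing several graphs into one npz archive; the one-array-
# per-graph layout is an assumption, not necessarily the repo's format.
import numpy as np

rng = np.random.default_rng(0)
graphs = []
for n in (12, 20, 16):                        # graphs may differ in size
    a = (rng.random((n, n)) < 0.2).astype(float)
    a = np.maximum(a, a.T)                    # undirected
    np.fill_diagonal(a, 0)                    # no self-loops
    graphs.append(a)

np.savez("data/my_graphs.npz", **{f"graph_{i}": a for i, a in enumerate(graphs)})
```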
### Main Script
The help information of the main script `graph_level_attack.py` is as follows:
```
python graph_level_attack.py -h
usage: graph_level_attack.py [-h] [--dataset] [--pert-rate] [--threshold] [--target-model] [--epochs]

optional arguments:
  -h, --help      Show this help message and exit
  --dataset       str, The dataset to be perturbed on [ENZYMES, PROTEINS].
  --pert-rate     float, Perturbation rate of edges to be flipped.
  --threshold     float, Restart threshold of eigen-solutions.
  --target-model  str, The target model to be attacked [gin, diffpool].
  --epochs        int, The number of epochs.
```
### Demo
A demo script is available by calling `graph_level_attack.py`, as follows:

```
python graph_level_attack.py --data-name ENZYMES --pert-rate 0.2 --threshold 1e-5 --target-model diffpool --epochs 21
```
### Evaluations
For the graph-level attack, we apply our attack strategy to the graph classification task.
We use GIN and Diffpool as the target models to attack.
By running the script `graph_level_attack.py`, you can directly obtain the evaluation results.
#### Datasets
We evaluate on two protein datasets: Enzymes and Proteins.
We call the `torch_geometric` package to download and load these two datasets.
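For reference, loading one of these datasets through `torch_geometric` looks roughly like this (the `root` download directory is arbitrary):

```python
# Load ENZYMES with torch_geometric's TUDataset wrapper (downloaded on
# first use) and recover one graph's dense adjacency matrix.
from torch_geometric.datasets import TUDataset
from torch_geometric.utils import to_dense_adj

dataset = TUDataset(root="data/TU", name="ENZYMES")
print(len(dataset), "graphs,", dataset.num_classes, "classes")

g = dataset[0]                        # a torch_geometric.data.Data object
adj = to_dense_adj(g.edge_index)[0]   # (num_nodes, num_nodes) adjacency
print(adj.shape)
```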