<p align="center"> <br> <a href="https://image.flaticon.com/icons/svg/1671/1671517.svg"> <img src="https://github.com/safe-graph/DGFraud-TF2/blob/main/logo.png" width="550"/> </a> <br> <p> <p align="center"> <a href="https://travis-ci.com/github/safe-graph/DGFraud-TF2"> <img alt="travis-ci" src="https://travis-ci.com/safe-graph/DGFraud-TF2.svg?token=wicswr4X2g4v8jddTpUv&branch=main"> </a> <a href="https://www.tensorflow.org/install"> <img alt="Tensorflow" src="https://img.shields.io/badge/tensorflow-2.X-orange"> </a> <a href="https://www.python.org/"> <img alt="Python" src="https://img.shields.io/badge/python-3.6%20%7C%203.7%20%7C%203.8%20%7C%203.9-blue"> </a> <a href="https://github.com/safe-graph/DGFraud-TF2/archive/main.zip"> <img alt="PRs" src="https://img.shields.io/badge/PRs-welcome-brightgreen.svg"> </a> <a href="https://github.com/safe-graph/DGFraud-TF2/pulls"> <img alt="GitHub release" src="https://img.shields.io/github/v/release/safe-graph/DGFraud-TF2?include_prereleases"> </a> </p> <h3 align="center"> <p>A Deep Graph-based Toolbox for Fraud Detection in TensorFlow 2.X </h3>Introduction | Useful Resources | Installation | Datasets | User Guide | Implemented Models | How to Contribute
Introduction
DGFraud-TF2 is a Graph Neural Network (GNN) based toolbox for fraud detection. It is the TensorFlow 2.X version of DGFraud, which is implemented using TF 1.X. It integrates implementations and comparisons of state-of-the-art GNN-based fraud detection models. An introduction to the implemented models can be found here.
We welcome contributions to this repo, such as adding new fraud detectors and extending the features of the toolbox.
If you use the toolbox in your project, please cite the paper below and the algorithms you used:
CIKM'20 (PDF)
@inproceedings{dou2020enhancing,
title={Enhancing Graph Neural Network-based Fraud Detectors against Camouflaged Fraudsters},
author={Dou, Yingtong and Liu, Zhiwei and Sun, Li and Deng, Yutong and Peng, Hao and Yu, Philip S},
booktitle={Proceedings of the 29th ACM International Conference on Information and Knowledge Management (CIKM'20)},
year={2020}
}
Useful Resources
- PyGOD: A Python Library for Graph Outlier Detection (Anomaly Detection)
- UGFraud: An Unsupervised Graph-based Toolbox for Fraud Detection
- Graph-based Fraud Detection Paper List
- Awesome Fraud Detection Papers
- PyOD: A Python Toolbox for Scalable Outlier Detection (Anomaly Detection)
- PyODD: An End-to-end Outlier Detection System
- DGL: Deep Graph Library
- Realtime Fraud Detection with GNN on DGL
- Outlier Detection DataSets (ODDS)
Installation
git clone https://github.com/safe-graph/DGFraud-TF2.git
cd DGFraud-TF2
python setup.py install
Requirements
* python>=3.6
* tensorflow>=2.0
* numpy>=1.16.4
* scipy>=1.2.0
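To check an existing environment against the minimum versions listed above, a small script along these lines may help. It is only a sketch: the helper names `parse_version`, `meets_minimum` and `report` are illustrative and not part of DGFraud-TF2, the parser handles plain dotted version strings only, and the lookup at the bottom assumes Python 3.8+ for `importlib.metadata`.

```python
import sys

# Minimum versions copied from the requirements list above.
MINIMUMS = {
    "tensorflow": "2.0",
    "numpy": "1.16.4",
    "scipy": "1.2.0",
}


def parse_version(text):
    """Turn '1.16.4' into (1, 16, 4); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if digits != piece:  # e.g. '0rc1': keep the 0, ignore the suffix
            break
    return tuple(parts)


def meets_minimum(installed, required):
    """Compare two dotted versions after padding them to the same length."""
    a, b = parse_version(installed), parse_version(required)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


def report(installed):
    """Map each required package to True/False given {name: version or None}."""
    return {
        name: installed.get(name) is not None
        and meets_minimum(installed[name], minimum)
        for name, minimum in MINIMUMS.items()
    }


if __name__ == "__main__":
    from importlib import metadata  # assumption: Python 3.8 or newer

    found = {}
    for name in MINIMUMS:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    print("python ok:", sys.version_info >= (3, 6))
    print(report(found))
```

A package that is missing or older than its minimum maps to `False` in the printed dictionary.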
Datasets
DBLP
We use the pre-processed DBLP dataset from Jhy1993/HAN. You can run FdGars, Player2Vec, GeniePath and GEM on the DBLP dataset. Unzip the archive before using the dataset:
cd dataset
unzip DBLP4057_GAT_with_idx_tra200_val_800.zip
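On systems without an `unzip` command, the same step can be sketched with Python's standard `zipfile` module. The function name `extract_archive` is illustrative and not part of the toolbox.

```python
import zipfile
from pathlib import Path


def extract_archive(archive_path, target_dir=None):
    """Extract a zip archive and return the sorted list of member names.

    If target_dir is omitted, files are extracted next to the archive,
    which mirrors running `unzip` inside the dataset directory.
    """
    archive_path = Path(archive_path)
    if target_dir is None:
        target_dir = archive_path.parent
    with zipfile.ZipFile(archive_path) as archive:
        archive.extractall(Path(target_dir))
        return sorted(archive.namelist())


# Example call, run from the repository root:
# extract_archive("dataset/DBLP4057_GAT_with_idx_tra200_val_800.zip")
```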
Example dataset
We implement example graphs for SemiGNN, GAS and GEM in data_loader.py, because those models require unique graph structures or node types that cannot be found in open-source datasets.
Yelp dataset
For GraphConsis and GraphSAGE, we preprocessed the Yelp Spam Review Dataset with reviews as nodes and three relations as edges.
The dataset in .mat format is located at /dataset/YelpChi.zip. The .mat file includes:
- net_rur, net_rtr, net_rsr: three sparse matrices representing three homo-graphs defined in the GraphConsis paper;
- features: a sparse matrix of 32-dimension handcrafted features;
- label: a numpy array with the ground truth of nodes. 1 represents spam and 0 represents benign.
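The fields described above can be read with `scipy.io.loadmat`. The following is a minimal sketch rather than the toolbox's own loader: the function name `load_yelpchi` is illustrative, and the file name `YelpChi.mat` in the usage comment is an assumption about what the zip archive contains.

```python
import numpy as np
import scipy.io as sio

RELATION_KEYS = ("net_rur", "net_rtr", "net_rsr")


def load_yelpchi(mat_path):
    """Load the three relation graphs, the features and the labels.

    Returns (relations, features, labels), where relations is a dict of
    sparse adjacency matrices keyed by relation name.
    """
    data = sio.loadmat(mat_path)
    relations = {name: data[name] for name in RELATION_KEYS}
    features = data["features"]
    # loadmat returns arrays as 2-D, so flatten the label vector.
    labels = np.asarray(data["label"]).ravel()
    return relations, features, labels


# Example call (the file name is an assumption):
# relations, features, labels = load_yelpchi("dataset/YelpChi.mat")
# print(features.shape, labels.mean())  # labels.mean() is the spam ratio
```

Since 1 marks spam and 0 marks benign, the mean of `labels` gives the fraction of spam reviews.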
The YelpChi data preprocessing details can be found in our CIKM'20 paper. To get the complete metadata of the Yelp dataset, please email ytongdou@gmail.com.
User Guide
Running the example code
You can find the implemented models in the algorithms directory. For example, you can run Player2Vec using:
python Player2Vec_main.py
You can specify parameters for models when running the code.
Running on your datasets
Have a look at the load_data_dblp() function in utils/utils.py for an example.
In order to use your own data, you have to provide:
- adjacency matrices or adjlists (for GAS);
- a feature matrix;
- a label matrix.
Then split the feature matrix and label matrix into training data and testing data.
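As a rough illustration of these inputs, the sketch below builds a toy adjacency matrix, feature matrix and label matrix with NumPy and splits the nodes into training and testing sets. The graph, the one-hot label encoding, the 80/20 ratio and the helper name `split_indices` are all assumptions for illustration. The exact shapes each model expects are defined by the loaders in the repository, such as load_data_dblp().

```python
import numpy as np


def split_indices(num_nodes, train_ratio=0.8, seed=0):
    """Shuffle node ids and split them into train and test index arrays."""
    rng = np.random.RandomState(seed)
    order = rng.permutation(num_nodes)
    cut = int(num_nodes * train_ratio)
    return np.sort(order[:cut]), np.sort(order[cut:])


# Toy undirected graph with 5 nodes (made-up data).
num_nodes = 5
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
adjacency = np.zeros((num_nodes, num_nodes), dtype=np.float32)
for src, dst in edges:
    adjacency[src, dst] = 1.0
    adjacency[dst, src] = 1.0

# One row of 3 features per node, and one-hot labels (0 = benign, 1 = fraud).
features = np.arange(num_nodes * 3, dtype=np.float32).reshape(num_nodes, 3)
labels = np.array([0, 1, 0, 0, 1])
label_matrix = np.eye(2, dtype=np.float32)[labels]

# Split features and labels; the adjacency matrix stays whole.
train_idx, test_idx = split_indices(num_nodes)
x_train, y_train = features[train_idx], label_matrix[train_idx]
x_test, y_test = features[test_idx], label_matrix[test_idx]
```

Only the node-level arrays are split here. The full adjacency matrix is kept because GNN layers aggregate over neighbors from both splits.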
You can specify a dataset as follows:
python xx_main.py --dataset your_dataset
or by editing xx_main.py
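A command-line flag such as `--dataset` is typically parsed with `argparse`. The sketch below shows one way a main script could expose it. Only `--dataset` comes from the command above. The `--epochs` and `--lr` options, all default values and the function name `build_parser` are hypothetical and may not match the actual xx_main.py scripts.

```python
import argparse


def build_parser():
    """Build a parser for a hypothetical xx_main.py script."""
    parser = argparse.ArgumentParser(description="Hypothetical model runner")
    parser.add_argument("--dataset", default="dblp",
                        help="name of the dataset to load (default is assumed)")
    # The options below are illustrative placeholders, not confirmed flags.
    parser.add_argument("--epochs", type=int, default=10,
                        help="number of training epochs")
    parser.add_argument("--lr", type=float, default=0.001,
                        help="learning rate")
    return parser


# Parse an explicit argument list, as if the script had been invoked with
# `python xx_main.py --dataset your_dataset`.
args = build_parser().parse_args(["--dataset", "your_dataset"])
print(args.dataset, args.epochs, args.lr)
```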
The structure of code
The repository is organized as follows:
- algorithms/ contains the implemented models and the corresponding example code;
- layers/ contains all GNN layers used by implemented models;
- dataset/ contains the necessary dataset files;
- utils/ contains:
  - code for loading and splitting the data (data_loader.py);
  - various utilities (utils.py).
Implemented Models
Model Source
Model Comparison
Model | Application | Graph Type | Base Model |
---|---|---|---|
SemiGNN | Financial Fraud | Heterogeneous | GAT, LINE, DeepWalk |
Player2Vec | Cyber Criminal | Heterogeneous | GAT, GCN |
GAS | Opinion Fraud | Heterogeneous | GCN, GAT |
FdGars | Opinion Fraud | Homogeneous | GCN |
GeniePath | Financial Fraud | Homogeneous | GAT |
GEM | Financial Fraud | Heterogeneous | GCN |
GraphSAGE | Opinion Fraud | Homogeneous | GraphSAGE |
GraphConsis | Opinion Fraud | Heterogeneous | GraphSAGE |
HACUD | Financial Fraud | Heterogeneous | GAT |
How to Contribute
You are welcome to contribute to this open-source toolbox. Currently, you can create a PR or email bdscsafegraph@gmail.com with inquiries.