AntiFraud

A Financial Fraud Detection Framework.

Source code implementations of the papers listed under Citing.

Usage

Data processing

  1. Run unzip /data/Amazon.zip and unzip /data/YelpChi.zip to unzip the datasets.
  2. Run python feature_engineering/data_process.py to pre-process all datasets needed in this repo.
  3. Run python feature_engineering/get_matrix.py to generate the adjacency matrix of the high-order transaction graph. Note that this requires approximately 280GB of storage space. If you intend to run HOGRL, you must execute the get_matrix.py script first.
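
Because step 3 needs roughly 280GB of storage, it can be worth checking free disk space before launching it. The following is a minimal sketch and not part of the repository: the helper name `has_enough_space` is hypothetical, the check targets the current directory by default, and 1 GB is treated as 1024**3 bytes, which is slightly conservative.

```python
import shutil

REQUIRED_GB = 280  # approximate figure quoted above for get_matrix.py


def has_enough_space(path=".", required_gb=REQUIRED_GB):
    """Return True if the filesystem holding `path` has at least `required_gb` GB free."""
    free_bytes = shutil.disk_usage(path).free
    return free_bytes >= required_gb * 1024 ** 3


if __name__ == "__main__":
    if has_enough_space():
        print("Enough free space to run feature_engineering/get_matrix.py")
    else:
        print(f"Less than {REQUIRED_GB} GB free; get_matrix.py may fail partway through")
```

Pass the directory where the matrices will actually be written as `path` if it differs from the current one.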

Training & Evaluation

<!-- To use fraud detection baselines including GBDT, LSTM, etc., simply run ``` python main.py --method LSTM python main.py --method GBDT ``` You may change relevant configurations in `config/base_cfg.yaml`. -->

To test implementations of MCNN, STAN and STAGN, run

python main.py --method mcnn
python main.py --method stan
python main.py --method stagn

Configuration files can be found in config/mcnn_cfg.yaml, config/stan_cfg.yaml and config/stagn_cfg.yaml, respectively.

The GTAN and RGTAN models can be run via:

python main.py --method gtan
python main.py --method rgtan

For specification of hyperparameters, please refer to config/gtan_cfg.yaml and config/rgtan_cfg.yaml.

The HOGRL model can be run via:

python main.py --method hogrl

For specification of hyperparameters, please refer to config/hogrl_cfg.yaml.
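
The commands above share one pattern: a method name passed to `--method`, and a matching configuration file named `config/<method>_cfg.yaml`. The sketch below captures that pattern for scripting several runs. The helper names `config_path` and `run_command` are hypothetical; only the `--method` flag, the method names and the config file names come from the text above.

```python
METHODS = ("mcnn", "stan", "stagn", "gtan", "rgtan", "hogrl")


def config_path(method):
    """Return the config file documented for `method`, e.g. config/gtan_cfg.yaml."""
    if method not in METHODS:
        raise ValueError(f"unknown method: {method!r}")
    return f"config/{method}_cfg.yaml"


def run_command(method):
    """Build the argument list for `python main.py --method <method>`."""
    config_path(method)  # raises ValueError for an undocumented method name
    return ["python", "main.py", "--method", method]


if __name__ == "__main__":
    for name in METHODS:
        print(" ".join(run_command(name)), "# hyperparameters:", config_path(name))
```

The returned list can be handed to `subprocess.run(run_command("gtan"), check=True)` from the repository root.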

Data Description

There are three datasets, YelpChi, Amazon and S-FFSD, utilized for model experiments in this repository.

<!-- YelpChi and Amazon can be downloaded from [here](https://github.com/YingtongDou/CARE-GNN/tree/master/data) or [dgl.data.FraudDataset](https://docs.dgl.ai/api/python/dgl.data.html#fraud-dataset). Put them in `/data` directory and run `unzip /data/Amazon.zip` and `unzip /data/YelpChi.zip` to unzip the datasets. -->

YelpChi and Amazon datasets are from CARE-GNN, whose original source data can be found in this repository.

S-FFSD is a small, simulated version of a financial fraud semi-supervised dataset. Its fields are described as follows:

| Name | Type | Range | Note |
|------|------|-------|------|
| Time | np.int32 | from $\mathbf{0}$ to $\mathbf{N}$ | $\mathbf{N}$ denotes the number of transactions. |
| Source | string | from $\mathbf{S_0}$ to $\mathbf{S}_{ns}$ | $ns$ denotes the number of transaction senders. |
| Target | string | from $\mathbf{T_0}$ to $\mathbf{T}_{nt}$ | $nt$ denotes the number of transaction receivers. |
| Amount | np.float32 | from 0.00 to np.inf | The amount of each transaction. |
| Location | string | from $\mathbf{L_0}$ to $\mathbf{L}_{nl}$ | $nl$ denotes the number of transaction locations. |
| Type | string | from $\mathbf{TP_0}$ to $\mathbf{TP}_{np}$ | $np$ denotes the number of different transaction types. |
| Labels | np.int32 | from 0 to 2 | 2 denotes unlabeled |
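
Since the label value 2 marks unlabeled transactions, supervised metrics are normally computed on the labeled rows only. The sketch below illustrates that split on rows shaped like the fields described above. The function name `split_by_label` and the sample values are made up for illustration and are not taken from the dataset or the repository.

```python
UNLABELED = 2  # per the description above, 2 denotes an unlabeled transaction


def split_by_label(rows):
    """Split transaction rows into (labeled, unlabeled) lists using the Labels field."""
    labeled, unlabeled = [], []
    for row in rows:
        label = int(row["Labels"])
        if label not in (0, 1, 2):
            raise ValueError(f"Labels must be 0, 1 or 2, got {label}")
        (unlabeled if label == UNLABELED else labeled).append(row)
    return labeled, unlabeled


# Hypothetical rows; the values are illustrative only.
sample = [
    {"Time": 0, "Source": "S0", "Target": "T0", "Amount": 12.5,
     "Location": "L0", "Type": "TP0", "Labels": 0},
    {"Time": 1, "Source": "S1", "Target": "T0", "Amount": 80.0,
     "Location": "L1", "Type": "TP1", "Labels": 2},
    {"Time": 2, "Source": "S0", "Target": "T1", "Amount": 3.2,
     "Location": "L0", "Type": "TP0", "Labels": 1},
]

labeled, unlabeled = split_by_label(sample)
print(len(labeled), "labeled,", len(unlabeled), "unlabeled")
```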

We are looking for interesting public datasets! If you have any suggestions, please let us know!

Test Result

The performance of the six models tested on the three datasets is listed as follows:

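| Model | YelpChi AUC | YelpChi F1 | YelpChi AP | Amazon AUC | Amazon F1 | Amazon AP | S-FFSD AUC | S-FFSD F1 | S-FFSD AP |
|-------|-------------|------------|------------|------------|-----------|-----------|------------|-----------|-----------|
| MCNN | - | - | - | - | - | - | 0.7129 | 0.6861 | 0.3309 |
| STAN | - | - | - | - | - | - | 0.7446 | 0.6791 | 0.3395 |
| STAGN | - | - | - | - | - | - | 0.7659 | 0.6852 | 0.3599 |
| GTAN | 0.9241 | 0.7988 | 0.7513 | 0.9630 | 0.9213 | 0.8838 | 0.8286 | 0.7336 | 0.6585 |
| RGTAN | 0.9498 | 0.8492 | 0.8241 | 0.9750 | 0.9200 | 0.8926 | 0.8461 | 0.7513 | 0.6939 |
| HOGRL | 0.9808 | 0.8595 | - | 0.9800 | 0.9198 | - | - | - | - |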

MCNN, STAN and STAGN are presently not applicable to the YelpChi and Amazon datasets.

HOGRL is presently not applicable to the S-FFSD dataset.
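
When scripting experiments, these applicability notes can be encoded as a small lookup so that unsupported combinations are skipped early. This is a sketch only: the names `SUPPORTED` and `is_supported` are hypothetical, and GTAN and RGTAN are listed for all three datasets on the assumption that having reported results for a dataset means the model is supported on it.

```python
ALL_DATASETS = frozenset({"YelpChi", "Amazon", "S-FFSD"})

# Method name -> datasets it can currently be run on, per the notes above.
SUPPORTED = {
    "mcnn": frozenset({"S-FFSD"}),
    "stan": frozenset({"S-FFSD"}),
    "stagn": frozenset({"S-FFSD"}),
    "gtan": ALL_DATASETS,
    "rgtan": ALL_DATASETS,
    "hogrl": frozenset({"YelpChi", "Amazon"}),
}


def is_supported(method, dataset):
    """Return True if `method` is listed as applicable to `dataset`."""
    return dataset in SUPPORTED.get(method.lower(), frozenset())


if __name__ == "__main__":
    for method in SUPPORTED:
        usable = sorted(d for d in ALL_DATASETS if is_supported(method, d))
        print(f"{method}: {', '.join(usable)}")
```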

Repo Structure

The repository is organized as follows:

Requirements

python           3.7
scikit-learn     1.0.2
pandas           1.3.5
numpy            1.21.6
networkx         2.6.3
scipy            1.7.3
torch            1.12.1+cu113
dgl-cu113        0.8.1

Contributors

<a href="https://github.com/AI4Risk/antifraud/graphs/contributors"> <img src="https://contrib.rocks/image?repo=AI4Risk/antifraud" /> </a>

Citing

If you find AntiFraud useful for your research, please consider citing the following papers:

@inproceedings{zou2024effective,
  title={Effective High-order Graph Representation Learning for Credit Card Fraud Detection},
  author={Zou, Yao and Cheng, Dawei},
  booktitle={International Joint Conference on Artificial Intelligence},
  year={2024}
}
@inproceedings{Xiang2023SemiSupervisedCC,
    title={Semi-supervised Credit Card Fraud Detection via Attribute-driven Graph Representation},
    author={Sheng Xiang and Mingzhi Zhu and Dawei Cheng and Enxia Li and Ruihui Zhao and Yi Ouyang and Ling Chen and Yefeng Zheng},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    year={2023}
}
@article{cheng2020graph,
    title={Graph Neural Network for Fraud Detection via Spatial-temporal Attention},
    author={Cheng, Dawei and Wang, Xiaoyang and Zhang, Ying and Zhang, Liqing},
    journal={IEEE Transactions on Knowledge and Data Engineering},
    year={2020},
    publisher={IEEE}
}
@inproceedings{cheng2020spatio,
    title={Spatio-temporal attention-based neural network for credit card fraud detection},
    author={Cheng, Dawei and Xiang, Sheng and Shang, Chencheng and Zhang, Yiyi and Yang, Fangzhou and Zhang, Liqing},
    booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
    volume={34},
    number={01},
    pages={362--369},
    year={2020}
}
@inproceedings{fu2016credit,
    title={Credit card fraud detection using convolutional neural networks},
    author={Fu, Kang and Cheng, Dawei and Tu, Yi and Zhang, Liqing},
    booktitle={International Conference on Neural Information Processing},
    pages={483--490},
    year={2016},
    organization={Springer}
}