Multi-GNN

This repository contains all models and adaptations needed to run Multi-GNN for Anti-Money Laundering. It provides four Graph Neural Network model classes (GIN, GAT, PNA, RGCN) and the model adaptations described below, which Egressy et al. use for financial crime detection. Note that this repository focuses solely on the Anti-Money Laundering use case. It was created for the experiments in "Provably Powerful Graph Neural Networks for Directed Multigraphs" [AAAI 2024] and "Realistic Synthetic Financial Transactions for Anti-Money Laundering Models" [NeurIPS 2023].

Setup

To use the repository, you first need to install the conda environment via

conda env create -f env.yml

Then, the data needed for the experiments can be found on Kaggle. To use this data with the provided training scripts, you first need to perform a pre-processing step for the downloaded transaction files (e.g. HI-Small_Trans.csv):

python format_kaggle_files.py /path/to/kaggle-files/HI-Small_Trans.csv

Make sure to change the filepaths in the data_config.json file. The aml_data path should be changed to wherever you stored the formatted_transactions.csv file generated by the pre-processing step.
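The path update can also be scripted. The following is a minimal sketch, not part of the repository: the helper name set_config_path is hypothetical, and it assumes that the path entries sit inside a top-level "paths" object in data_config.json. Check your copy of the file and pass section=None if the keys are at the top level instead.

```python
import json
from pathlib import Path


def set_config_path(config_file, key, new_path, section="paths"):
    """Point one entry of the JSON config at a new location and save the file.

    `section` is an ASSUMED top-level object holding the path entries;
    pass section=None if the keys sit at the top level of your file.
    """
    config_file = Path(config_file)
    config = json.loads(config_file.read_text())
    # Pick the dictionary that holds the path entries (created if missing).
    target = config if section is None else config.setdefault(section, {})
    target[key] = str(new_path)
    config_file.write_text(json.dumps(config, indent=2) + "\n")
    return config


# Example call (the target path is a placeholder):
# set_config_path("data_config.json", "aml_data", "/path/to/aml_data")
```

The same helper could be used for the model_to_save and model_to_load entries mentioned further down.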

Usage

To run the experiments, run the main.py script and specify any arguments you want to use. There are two required arguments, --data and --model. For the --data argument, make sure you store the different datasets in different folders, then specify the folder name, e.g. --data Small_HI. The --model argument should be set to one of the available model classes, i.e. --model [gin, gat, rgcn, pna]. Thus, to run a standard GNN, you would run, e.g.:

python main.py --data Small_HI --model gin

Then you can add different adaptations to the models by selecting the respective arguments from:

<div align="center">

| Argument | Description |
| --- | --- |
| --emlps | Edge updates via MLPs |
| --reverse_mp | Reverse Message Passing |
| --ego | Ego IDs for the center nodes |
| --ports | Port Numberings for edges |

</div>

Thus, to run Multi-GIN with edge updates, you would run the following command:

python main.py --data Small_HI --model gin --emlps --reverse_mp --ego --ports
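When several configurations need to be run, the command line can be assembled programmatically. This is a minimal sketch under stated assumptions: build_command and run_experiment are hypothetical helpers, not part of the repository, and they only know the model names and adaptation flags listed above.

```python
import subprocess

# Model classes and adaptation flags as listed in this README.
MODELS = {"gin", "gat", "rgcn", "pna"}
ADAPTATIONS = {"emlps", "reverse_mp", "ego", "ports"}


def build_command(data, model, adaptations=()):
    """Return the argument list for one main.py run (hypothetical helper)."""
    if model not in MODELS:
        raise ValueError(f"unknown model {model!r}; expected one of {sorted(MODELS)}")
    unknown = set(adaptations) - ADAPTATIONS
    if unknown:
        raise ValueError(f"unknown adaptations: {sorted(unknown)}")
    command = ["python", "main.py", "--data", data, "--model", model]
    # Each adaptation is a plain on/off switch, so it maps to a bare flag.
    command += [f"--{name}" for name in adaptations]
    return command


def run_experiment(data, model, adaptations=()):
    """Launch the run and raise CalledProcessError if main.py fails."""
    subprocess.run(build_command(data, model, adaptations), check=True)


# Example: the plain GIN baseline, then GIN with all four adaptations.
# run_experiment("Small_HI", "gin")
# run_experiment("Small_HI", "gin", ["emlps", "reverse_mp", "ego", "ports"])
```

Validating the names before launching catches a mistyped flag early instead of partway through a batch of runs.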

Additional functionalities

Several further arguments enable additional functionality:

<div align="center">

| Argument | Description |
| --- | --- |
| --tqdm | Displays a progress bar during training and inference. |
| --save_model | Saves the best model to the specified model_to_save path in the data_config.json file. Requires the argument --unique_name to be specified. |
| --finetune | Loads a previously trained model (with name given by --unique_name and stored in model_to_load path in the data_config.json) to be finetuned. |
| --inference | Loads a previously trained model (with name given by --unique_name and stored in model_to_load path in the data_config.json) to do inference only. |

</div>
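To illustrate how the documented options fit together, here is a minimal argparse sketch. It is not the repository's actual parser: parse_args is a hypothetical function, all switches are assumed to be plain on/off flags, and --unique_name is assumed to take a single string. Only the rule stated above, that --save_model requires --unique_name, is enforced.

```python
import argparse


def parse_args(argv=None):
    """Parse the options described in this README (illustrative sketch only)."""
    parser = argparse.ArgumentParser(description="Sketch of the documented main.py options")
    parser.add_argument("--data", required=True, help="dataset folder name, e.g. Small_HI")
    parser.add_argument("--model", required=True, choices=["gin", "gat", "rgcn", "pna"])
    # Model adaptations and additional functionality, assumed to be boolean switches.
    for flag in ("emlps", "reverse_mp", "ego", "ports",
                 "tqdm", "save_model", "finetune", "inference"):
        parser.add_argument(f"--{flag}", action="store_true")
    # Assumed to be a free-form string naming the saved or loaded model.
    parser.add_argument("--unique_name", default=None)
    args = parser.parse_args(argv)
    if args.save_model and not args.unique_name:
        # parser.error prints a usage message and exits with status 2.
        parser.error("--save_model requires --unique_name")
    return args


# Example:
# args = parse_args(["--data", "Small_HI", "--model", "gin",
#                    "--save_model", "--unique_name", "my_run"])
```

Since --finetune and --inference also look up the model by --unique_name, a similar check could be added for them if that matches how you use the scripts.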

Licence

Apache License Version 2.0, January 2004