GraphFC: Customs Fraud Detection with Label Scarcity
This repo contains the PyTorch implementation for "GraphFC: Customs Fraud Detection with Label Scarcity".
The paper, along with a performance analysis on three real customs datasets, can be found <a href="https://arxiv.org/abs/2305.11377v1">here</a>.
Model Architecture of GraphFC
<img width="1025" alt="model architecture" src="https://user-images.githubusercontent.com/62580782/153579232-2ea4cac8-f17c-42ec-82bd-c68f304c0765.PNG">Model architecture of GraphFC. Cross features extracted in the GBDT step act as node features in the transaction graph. In the pre-training stage, GraphFC learns the model weights and refines the transaction representations. The model is then fine-tuned on labeled data using a dual-task learning framework that predicts both the illicitness of a transaction and the additional revenue.
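The dual-task fine-tuning objective can be pictured as a weighted sum of a binary classification loss for illicitness and a regression loss for additional revenue. The following is a minimal PyTorch sketch under stated assumptions, not the code in models.py: the function name dual_task_loss, the use of binary cross-entropy and mean squared error, and the balancing weight alpha are all hypothetical choices made for illustration.

```python
import torch
import torch.nn.functional as F


def dual_task_loss(
    logits: torch.Tensor,
    revenue_pred: torch.Tensor,
    illicit: torch.Tensor,
    revenue: torch.Tensor,
    alpha: float = 1.0,
) -> torch.Tensor:
    """Hypothetical dual-task objective (not taken from the GraphFC code).

    logits:       raw illicitness scores, shape (N,)
    revenue_pred: predicted additional revenue, shape (N,)
    illicit:      0/1 fraud labels, shape (N,)
    revenue:      observed additional revenue, shape (N,)
    alpha:        assumed weight balancing the two tasks
    """
    # Task 1: binary classification of illicit vs. licit transactions.
    cls_loss = F.binary_cross_entropy_with_logits(logits, illicit.float())
    # Task 2: regression of the additional revenue.
    reg_loss = F.mse_loss(revenue_pred, revenue)
    return cls_loss + alpha * reg_loss


if __name__ == "__main__":
    logits = torch.zeros(4)
    revenue_pred = torch.tensor([0.0, 1.5, 0.0, 3.0])
    illicit = torch.tensor([0, 1, 0, 1])
    revenue = torch.tensor([0.0, 1.5, 0.0, 3.0])
    loss = dual_task_loss(logits, revenue_pred, illicit, revenue)
    print(round(loss.item(), 4))  # 0.6931
```

With all-zero logits the classification term equals ln 2 regardless of the labels, and the regression term vanishes because the revenue predictions are exact, so the example prints 0.6931. Raising alpha shifts the optimization toward revenue prediction.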
How to train the model
The model code for GraphFC lives in the graph_sage directory.
Run graph_sage/train.py and specify the dataset parameters to train the model and evaluate its performance.
To reproduce the results for an individual country, refer to the run_*data.sh scripts in that directory.
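As a rough illustration of launching training with explicit parameters, the helper below assembles a command line for train.py from the flags listed under Arguments and Hyperparameters. It is a hypothetical convenience wrapper, not part of the repository; it assumes the command is issued from the repository root (adjust the path if train.py expects to be run from inside graph_sage), and the values in the example are placeholders.

```python
import subprocess
import sys
from typing import List


def build_train_command(data: str, **params) -> List[str]:
    """Hypothetical helper: build a command line for graph_sage/train.py.

    Each keyword argument becomes a "--name value" pair, so only flags
    documented in this README should be passed.
    """
    cmd = [sys.executable, "graph_sage/train.py", "--data", data]
    for name, value in params.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_training(cmd: List[str]) -> None:
    """Launch training; raises CalledProcessError if train.py fails."""
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Placeholder values; not the settings used in the paper.
    cmd = build_train_command("synthetic", epoch=10, device="cpu")
    print(" ".join(cmd[1:]))
    # run_training(cmd)  # uncomment to actually start training
```

Building the argument list first makes it easy to loop over the datasets ('synthetic', 'real-n', 'real-m', 'real-t') or over seeds before handing each command to subprocess.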
graph_sage
|-- dataset.py -> Preprocessing for customs data
|-- models.py -> Main model modules
|-- parser.py -> Training arguments
|-- pygData_util.py -> Data structures for graph data
|-- run_Mdata.sh
|-- run_Ndata.sh
|-- run_Tdata.sh
|-- train.py -> Train model
|-- utils.py
Arguments and Hyperparameters
# Dataset parameters
--data: Name of the dataset (country) to build ['synthetic', 'real-n', 'real-m', 'real-t']
--initial_inspection_rate: Initial inspection rate of labeled data
--train_from: Starting date of training data
--test_from: Starting date of testing data
--test_length: Number of days for testing data
# GraphFC Hyperparameters
--seed: Random seed
--epoch: Number of epochs
--l2: L2 regularization strength
--dim: Dimension of the hidden layers
--lr: Learning rate
--device: Device used for training; to train on the CPU, use "cpu"
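For readers who want to see how these flags fit together, here is an argparse sketch that mirrors the documented arguments. It is not the contents of parser.py: the argument types are assumptions, and no defaults are set because the README does not state them (unset flags parse to None).

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Illustrative parser mirroring the README flags; types are assumed."""
    p = argparse.ArgumentParser(description="GraphFC training (sketch)")
    # Dataset parameters
    p.add_argument("--data", choices=["synthetic", "real-n", "real-m", "real-t"])
    p.add_argument("--initial_inspection_rate", type=float)
    p.add_argument("--train_from", type=str)   # date format not specified here
    p.add_argument("--test_from", type=str)
    p.add_argument("--test_length", type=int)  # number of days
    # GraphFC hyperparameters
    p.add_argument("--seed", type=int)
    p.add_argument("--epoch", type=int)
    p.add_argument("--l2", type=float)
    p.add_argument("--dim", type=int)
    p.add_argument("--lr", type=float)
    p.add_argument("--device", type=str)       # e.g. "cpu"
    return p


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["--data", "real-m", "--epoch", "5", "--lr", "0.01", "--device", "cpu"]
    )
    print(args.data, args.epoch, args.lr, args.device)  # real-m 5 0.01 cpu
```

Restricting --data with choices makes a mistyped dataset name fail at parse time instead of partway through preprocessing.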
Data
You can experiment with GraphFC by downloading synthetic customs data from this repo.
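The dataset parameters describe a date-based split (--train_from, --test_from, --test_length) and a label-scarcity setting (--initial_inspection_rate). The sketch below shows one plausible way these could interact on toy records; it is an assumption for illustration, not the preprocessing in dataset.py. The record fields "date" and "illicit", the helper names, and the treatment of the inspection rate as a fraction between 0 and 1 are all hypothetical.

```python
import random
from datetime import date, timedelta
from typing import Dict, List, Tuple

Record = Dict[str, object]  # hypothetical record: {"date": date, "illicit": int}


def temporal_split(records: List[Record], train_from: date, test_from: date,
                   test_length: int) -> Tuple[List[Record], List[Record]]:
    """Train on [train_from, test_from); test on the next test_length days."""
    test_end = test_from + timedelta(days=test_length)
    train = [r for r in records if train_from <= r["date"] < test_from]
    test = [r for r in records if test_from <= r["date"] < test_end]
    return train, test


def mask_labels(train: List[Record], inspection_rate: float,
                seed: int = 0) -> List[Record]:
    """Keep labels for a random fraction of training records (others -> None)."""
    rng = random.Random(seed)
    n_keep = round(len(train) * inspection_rate)
    keep = set(rng.sample(range(len(train)), n_keep))
    return [dict(r, illicit=r["illicit"] if i in keep else None)
            for i, r in enumerate(train)]


if __name__ == "__main__":
    start = date(2020, 1, 1)
    records = [{"date": start + timedelta(days=i), "illicit": i % 2} for i in range(30)]
    train, test = temporal_split(records, start, date(2020, 1, 21), test_length=5)
    masked = mask_labels(train, inspection_rate=0.25)
    labeled = sum(r["illicit"] is not None for r in masked)
    print(len(train), len(test), labeled)  # 20 5 5
```

Masking only the training period keeps the test window fully labeled for evaluation, while the model sees labels for just the simulated inspected share of past transactions.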