Machine Learning for Customs Fraud Detection
BACUDA (바꾸다, BAnd of CUstoms Data Analysts)
BACUDA (BAnd of CUstoms Data Analysts) is a collaborative research project between the World Customs Organization (WCO), its Members, and data scientists. It aims to develop data analytics algorithms in open-source languages so that all Members can deploy them on their own data.
DATE (Dual Attentive Tree-aware Embedding for Customs Fraud Detection)
The DATE model is the first outcome of the BACUDA project. It is designed to detect undervalued imports while maximizing Customs revenue. Our research paper will be published in the Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (2020). For more details, please refer to the following links:
- Promotional Video: https://www.youtube.com/watch?v=YhfxCHBNM2g&feature=youtu.be
- GitHub repository: https://github.com/Roytsai27/Dual-Attentive-Tree-aware-Embedding
- Our KDD Paper: https://github.com/Roytsai27/Dual-Attentive-Tree-aware-Embedding/blob/master/KDD2020/kdd2020-date-paper.pdf
- Authors: Sundong Kim, Yu-Che Tsai, Karandeep Singh, Yeonsoo Choi, Etim Ibok, Cheng-Te Li, Meeyoung Cha
This repository: stepping stones toward DATE
This repository provides stepping stones toward the DATE model for Customs administrations and officials who want to build their capacity to use machine learning in their daily work. It covers the prerequisite knowledge and practice for machine learning, so that the Customs community can better understand the cutting-edge algorithms used in the DATE model.
Stepping stones (Jupyter Notebooks)
Please use the links below to open the notebooks in your web browser. You can download the notebooks and the synthetic data by clicking the green button at the top right.
- 1_1_Synthetic import data
- 1_2_How to generate synthetic import data with CTGAN
- 2_Data preprocessing (Risk profiling)
- 3_Decision tree model
- 4_XGBoost model
- 5_Comparative analysis of multiple models
- 6_XGBoost + Logistic regression model
- 7_Deep learning (Neural Network) model with entity embedding
- 8_Dual task model
- 9_DATE model manual
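To give a flavor of what an early stepping stone such as risk profiling might involve, here is a minimal sketch in plain Python. It is not taken from the notebooks: the field names (`importer`, `hs_code`, `illicit`), the helper functions `risk_profile` and `add_risk_flag`, the `top_fraction` threshold, and the toy records are all hypothetical, and the notebooks may define risk profiles differently. The idea illustrated is to rank the values of a categorical field by their historical fraud rate in past declarations and turn membership in the top group into a binary feature for new declarations.

```python
from collections import defaultdict


def risk_profile(records, key, label="illicit", top_fraction=0.1):
    """Return the values of `key` whose historical fraud rate ranks in the top fraction.

    Assumption: each record is a dict with a 0/1 `label` field.
    """
    totals = defaultdict(int)
    frauds = defaultdict(int)
    for row in records:
        totals[row[key]] += 1
        frauds[row[key]] += row[label]

    # Fraud rate per value, e.g. per importer
    rates = {value: frauds[value] / totals[value] for value in totals}

    # Highest rate first; ties are broken by name so the result is deterministic
    ranked = sorted(rates, key=lambda value: (-rates[value], value))
    n_risky = max(1, int(len(ranked) * top_fraction))
    return set(ranked[:n_risky])


def add_risk_flag(records, key, risky_values):
    """Return copies of the records with a binary `risky_<key>` feature added."""
    return [
        {**row, f"risky_{key}": int(row[key] in risky_values)}
        for row in records
    ]


# Hypothetical past declarations with known inspection results
train = [
    {"importer": "A", "hs_code": "8703", "illicit": 1},
    {"importer": "A", "hs_code": "8703", "illicit": 1},
    {"importer": "B", "hs_code": "6109", "illicit": 0},
    {"importer": "B", "hs_code": "8703", "illicit": 0},
    {"importer": "C", "hs_code": "6109", "illicit": 1},
    {"importer": "C", "hs_code": "6109", "illicit": 0},
]

# Hypothetical new declarations without labels
new_declarations = [
    {"importer": "A", "hs_code": "6109"},
    {"importer": "D", "hs_code": "8703"},
]

risky_importers = risk_profile(train, "importer", top_fraction=0.34)
flagged = add_risk_flag(new_declarations, "importer", risky_importers)
print(risky_importers)
print(flagged)
```

The profile is computed only from past declarations and then applied to new ones, so an importer that never appeared before (such as "D" here) is simply not flagged. Binary features of this kind could then be fed to a tree-based model.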