Environment Setup
- Install anaconda
- Create and configure the conda environment:
export PROJECT_DIR=<ABSOLUTE path to the repository root>
conda create -n OpenFE python=3.8.12
conda activate OpenFE
conda env config vars set PYTHONPATH=${PYTHONPATH}:${PROJECT_DIR}
conda env config vars set PROJECT_DIR=${PROJECT_DIR}
conda env config vars set LD_LIBRARY_PATH=${CONDA_PREFIX}/lib:${LD_LIBRARY_PATH}
conda deactivate
conda activate OpenFE
python -m pip install -r requirements.txt --no-deps
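After reactivating the environment, the variables set above can be sanity-checked from Python. The following is a minimal sketch, not part of the repository: the function name check_environment and its messages are hypothetical, and it only verifies that PROJECT_DIR is an absolute path, that it appears on PYTHONPATH, and that the interpreter is Python 3.8.

```python
import os
import sys
from typing import List, Mapping, Tuple


def check_environment(env: Mapping[str, str], version: Tuple[int, int]) -> List[str]:
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    project_dir = env.get("PROJECT_DIR", "")
    if not project_dir:
        problems.append("PROJECT_DIR is not set")
    elif not os.path.isabs(project_dir):
        problems.append("PROJECT_DIR is not an absolute path")
    # PYTHONPATH is a list of entries separated by os.pathsep (":" on Linux/macOS).
    python_path = env.get("PYTHONPATH", "").split(os.pathsep)
    if project_dir and project_dir not in python_path:
        problems.append("PROJECT_DIR is not on PYTHONPATH")
    if version != (3, 8):
        problems.append("Python %d.%d found, expected 3.8" % version)
    return problems


if __name__ == "__main__":
    # Hypothetical helper: prints one warning per failed check, nothing if all pass.
    for problem in check_environment(os.environ, sys.version_info[:2]):
        print("WARNING:", problem)
```

Passing the environment and version in as arguments keeps the check easy to try against different settings without touching the real shell.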
Data Download
- Part 1: Kaggle data
- Prepare data of IEEE
- Download link: IEEE-CIS Fraud Detection | Kaggle (there is a "Download All" button)
- Unzip and make sure the following files exist:
./data/IEEE/train_identity.csv
./data/IEEE/train_transaction.csv
./data/IEEE/test_identity.csv
./data/IEEE/test_transaction.csv
./data/IEEE/sample_submission.csv
- Prepare data of BNP
- Download link: BNP Paribas Cardif Claims Management | Kaggle (there is a "Download All" button)
- Unzip and make sure the following files exist:
./data/BNP/train.csv.zip
./data/BNP/test.csv.zip
./data/BNP/sample_submission.csv.zip
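The Kaggle file layout listed above can be verified before starting the experiments. This is a minimal sketch under the assumption that it is run from the repository root; the names EXPECTED_FILES and missing_kaggle_files are hypothetical and not part of the repository.

```python
from pathlib import Path
from typing import List

# File names copied from the lists above, grouped by dataset folder.
EXPECTED_FILES = {
    "IEEE": [
        "train_identity.csv",
        "train_transaction.csv",
        "test_identity.csv",
        "test_transaction.csv",
        "sample_submission.csv",
    ],
    "BNP": ["train.csv.zip", "test.csv.zip", "sample_submission.csv.zip"],
}


def missing_kaggle_files(data_dir: str = "./data") -> List[str]:
    """Return the expected Kaggle files that are not present under data_dir."""
    root = Path(data_dir)
    return [
        str(Path(name) / filename)
        for name, filenames in EXPECTED_FILES.items()
        for filename in filenames
        if not (root / name / filename).is_file()
    ]


if __name__ == "__main__":
    for path in missing_kaggle_files():
        print("missing:", path)
```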
- Part 2: other data
- Download link: https://www.dropbox.com/s/8tj5ln7wz1r9arc/data.zip?dl=1
- Unzip and move the files so that the following files exist:
./data/{dataset}/*.npy
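To see which datasets were unpacked into the ./data/{dataset}/*.npy layout, the folders can be listed from Python. This is a minimal sketch that assumes it is run from the repository root; list_npy_datasets is a hypothetical helper name, and the sketch only lists file names without loading the arrays.

```python
from pathlib import Path
from typing import Dict, List


def list_npy_datasets(data_dir: str = "./data") -> Dict[str, List[str]]:
    """Map each dataset folder under data_dir to the sorted names of its .npy files."""
    datasets: Dict[str, List[str]] = {}
    # "*/*.npy" matches .npy files exactly one folder below data_dir.
    for npy_file in sorted(Path(data_dir).glob("*/*.npy")):
        datasets.setdefault(npy_file.parent.name, []).append(npy_file.name)
    return datasets


if __name__ == "__main__":
    for dataset, files in list_npy_datasets().items():
        print(dataset, files)
```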
Experiment
- Part 1: Kaggle experiment (Table 5 in our paper)
- IEEE Experiment
- Make sure you are in the folder run_IEEE, then run:
bash IEEE.sh
- Output is the file run_IEEE/results/sub_xgb_OpenFE_*_order.csv.
- Submit link: IEEE-CIS Fraud Detection | Kaggle
- BNP Experiment
- Make sure you are in the folder run_BNP, then run:
bash BNP.sh
- Outputs are in the folder run_BNP/result/. To evaluate them, submit them to the link below.
- Submit link: BNP Paribas Cardif Claims Management | Kaggle
- Part 2: other experiments (Table 3 in our paper)
- Reproduce results of OpenFE
- Run a single dataset (e.g. california_housing)
bash shell_inst/california_housing.sh
- You can find results in OpenFE-california_housing.log
- You can also find results in the folder runs/output/{dataset}/lightgbm/tuned
- There are two files in the folder: result shows the test value under the corresponding metric, and stats.json shows more details.
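The two output files can also be read programmatically. The sketch below rests on assumptions not stated in this guide: that result is a plain-text file, that stats.json is valid JSON, and that the output folder is addressed relative to the current working directory. The helper name read_results is hypothetical.

```python
import json
from pathlib import Path
from typing import Any, Tuple


def read_results(dataset: str, output_root: str = "runs/output") -> Tuple[str, Any]:
    """Return the raw text of `result` and the parsed content of `stats.json`."""
    folder = Path(output_root) / dataset / "lightgbm" / "tuned"
    # Assumption: `result` is plain text; it is returned as-is, minus surrounding whitespace.
    result_text = (folder / "result").read_text().strip()
    # Assumption: `stats.json` is valid JSON; its structure is not interpreted here.
    with open(folder / "stats.json") as f:
        stats = json.load(f)
    return result_text, stats


if __name__ == "__main__":
    folder = Path("runs/output") / "california_housing" / "lightgbm" / "tuned"
    if folder.is_dir():
        text, stats = read_results("california_housing")
        print(text)
        print(stats)
```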
- Reproduce results of baseline methods
- We run SAFE on the Diabetes dataset as an example. Running other methods on other datasets only requires changing the arguments.
python baseline/run_methods.py --method safe --data diabetes --task classification --n_new_features 10 --n_jobs 8
python eval.py --data diabetes --model lightgbm --model_type tuned --task_type classification --algorithm safe --n_saved_features 10
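Since only the arguments change between runs, the two commands above can be generated from a small helper. This is a sketch, not part of the repository: baseline_commands is a hypothetical name, the flags are copied from the example above, and it assumes that --n_new_features and --n_saved_features take the same value, as they do in that example. Valid values for the method, dataset, and task arguments other than those shown above are not listed here and must be taken from the repository.

```python
import shlex
from typing import List


def baseline_commands(method: str, data: str, task: str,
                      n_features: int = 10, n_jobs: int = 8) -> List[str]:
    """Build the feature-generation and evaluation commands for one baseline run."""
    generate = [
        "python", "baseline/run_methods.py",
        "--method", method, "--data", data, "--task", task,
        "--n_new_features", str(n_features), "--n_jobs", str(n_jobs),
    ]
    evaluate = [
        "python", "eval.py",
        "--data", data, "--model", "lightgbm", "--model_type", "tuned",
        "--task_type", task, "--algorithm", method,
        "--n_saved_features", str(n_features),
    ]
    # shlex.quote keeps arguments safe if they ever contain shell metacharacters.
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in (generate, evaluate)]


if __name__ == "__main__":
    # Prints the same two commands as the SAFE/Diabetes example.
    for command in baseline_commands("safe", "diabetes", "classification"):
        print(command)
```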
Acknowledgement
- rtdl: We use their code for model training.
Structure
root:[demo]
+--data The folder of data.
| +--BNP
| ...
+--FeatureGenerator.py Imported by OpenFE for calculating features.
+--OpenFE.py This is a bit different from the open-sourced package.
+--readme.md Guide.
+--requirements.txt
+--runs This folder is for the other experiments.
| +--bin
| +--clear.sh Remove all output files. (Including results.)
| +--eval.py Train models to evaluate new features.
| +--FE_first_order.py Generate first order features.
| +--FE_high_order.py Generate second order features.
| +--lib
| +--nn_utils.py
| +--run_all.py Automatically run all experiments.
| +--shell_inst Experiment for a specific dataset.
| | +--nomao.sh
| | ...
| +--tuned_parameters This folder contains the tuned parameters.
| +--tune_parameter.py
| +--baseline This folder contains all the baseline methods we reproduce.
+--run_BNP
| +--BNP.sh Automatically run BNP experiments.
| +--eval_first_order.py
| +--eval_high_order.py
| +--FE_first_order.py
| +--FE_high_order.py
| +--result
+--run_IEEE
| +--IEEE.sh Automatically run IEEE experiments.
| +--IEEE_utils.py
| +--main.py
| +--results
+--utils.py Utils imported by OpenFE.