selcf_paper

This repository contains the Python code to reproduce the results in our paper Selection by Prediction with Conformal p-values.

The simulations in the paper were run with Python 3. The following Python packages must be installed: numpy, pandas, sklearn (distributed on PyPI as scikit-learn).

Folders

simulations/: bash file for running the simulations in batch.
utils/: Python code for the simulations.
results/: stores all experiment outputs; created automatically if it does not exist.

Running simulations

Single run

Calling the file simu.py executes one run of the simulation. It takes five inputs:

--sig: an integer from 1 to 10, corresponding to the noise strength $\sigma$ in the paper (from 0.1 to 1).
--nt_id: an integer from 1 to 4, corresponding to test sample sizes 10, 100, 500, 1000.
--set_id: an integer from 1 to 8, corresponding to the eight data generating processes in the paper (Table 2).
--q: one of 1, 2, 5, corresponding to FDR levels 0.1, 0.2 and 0.5.
--seed: an integer from 1 to 1000, the random seed used in this run.
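The index-to-value mapping described above can be sketched in Python. This is only an illustration: decode_args is a hypothetical helper that is not part of the repository, and the linear rules sig / 10 and q / 10 are assumptions inferred from the ranges stated above.

```python
# Hypothetical helper (not part of the repository): turns the integer
# indices passed to simu.py into the quantities they stand for.
NT_SIZES = (10, 100, 500, 1000)  # test sample sizes for nt_id = 1..4
Q_CODES = (1, 2, 5)              # allowed codes for the FDR level


def decode_args(sig, nt_id, set_id, q, seed):
    """Validate the five indices and return the decoded settings."""
    if not 1 <= sig <= 10:
        raise ValueError("sig must be between 1 and 10")
    if not 1 <= nt_id <= len(NT_SIZES):
        raise ValueError("nt_id must be between 1 and 4")
    if not 1 <= set_id <= 8:
        raise ValueError("set_id must be between 1 and 8")
    if q not in Q_CODES:
        raise ValueError("q must be one of 1, 2, 5")
    if not 1 <= seed <= 1000:
        raise ValueError("seed must be between 1 and 1000")
    return {
        "sigma": sig / 10,              # assumed: 1..10 maps to 0.1..1
        "n_test": NT_SIZES[nt_id - 1],
        "setting": set_id,
        "fdr_level": q / 10,            # assumed: 1, 2, 5 map to 0.1, 0.2, 0.5
        "seed": seed,
    }


print(decode_args(4, 2, 7, 1, 53))
```

With the inputs 4 2 7 1 53 this yields noise strength 0.4, test sample size 100, setting 7, FDR level 0.1 and seed 53.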

Each run iterates over all three machine learning algorithms (gbr, rf and svm) and all three nonconformity scores (BH_res, BH_rel, BH_clip) used in the paper.
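As a rough illustration of what one run covers, the snippet below enumerates the nine algorithm and score pairs named above. The labels are taken from the text; how simu.py actually loops over them internally is not shown here and is an assumption.

```python
from itertools import product

# Labels as listed in this README; the loop structure is illustrative only.
MODELS = ("gbr", "rf", "svm")
SCORES = ("BH_res", "BH_rel", "BH_clip")

# Every (algorithm, nonconformity score) pair handled in a single run.
combos = list(product(MODELS, SCORES))

for model, score in combos:
    print(model, score)
```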

For example, to execute a single run of the experiment for noise strength 0.4 and test sample size 100 in setting 7, with FDR level 0.1 and random seed 53, simply run the following commands:

cd simulations 
python3 simu.py 4 2 7 1 53

Batch submission

The simulations can also be submitted in batch mode on computing clusters, using the bash file in the bash/ folder (it may need to be modified to match the configuration of your cluster).

The current bash file runs --sig from 1 to 10, --nt_id from 1 to 4, --q in {1,2,5}, and --seed from 1 to 100. To submit these jobs, go to the bash/ folder and run

sh bash.sh

These parameter ranges can be changed by editing the bash file.
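To get a sense of the size of the batch, the grid described above can be enumerated in Python. This is a sketch, not the contents of bash.sh: batch_grid is a hypothetical helper, the positional argument order (sig, nt_id, set_id, q, seed) is taken from the single-run example, and set_id is left as a parameter because the ranges listed for the bash file do not mention it.

```python
from itertools import product

# Ranges as described for the current bash file.
SIGS = range(1, 11)     # --sig from 1 to 10
NT_IDS = range(1, 5)    # --nt_id from 1 to 4
QS = (1, 2, 5)          # --q in {1, 2, 5}
SEEDS = range(1, 101)   # --seed from 1 to 100


def batch_grid(set_id):
    """Yield one simu.py command (as an argument list) per grid point."""
    for sig, nt_id, q, seed in product(SIGS, NT_IDS, QS, SEEDS):
        yield ["python3", "simu.py",
               str(sig), str(nt_id), str(set_id), str(q), str(seed)]


jobs = list(batch_grid(set_id=7))
print(len(jobs))          # 10 * 4 * 3 * 100 = 12000 runs for one setting
print(" ".join(jobs[0]))  # python3 simu.py 1 1 7 1 1
```

Each argument list could be passed to subprocess.run for a local dry run, although on a cluster the bash file remains the intended entry point.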