Awesome
Oblique Decision Trees from Derivatives of ReLU Networks (Locally Constant Networks)
This repository is for the paper
- "Oblique Decision Trees from Derivatives of ReLU Networks" by Guang-He Lee and Tommi S. Jaakkola in ICLR 2020.
- The old title for this paper is "Locally Constant Networks"
Package version
- linux (we only tested the codes on Ubuntu)
- python3.6
- pytorch1.1.0
- scikit-learn (for computing AUC)
Repo structure
- scripts/: scripts for reproducing the experiments
- data/: datasets used in the paper. Training/validation/testing splits are provided.
- log/: the logs of the training results will be stored here.
- checkpoint/: the checkpoint of the learned model will be stored here.
Main files
- run_net_training.py: main python file for running LCNs/ALCNs/LLNs
- run_elcn_training.py: main python file for running ELCN
Quick start and reproducing experiments
Example scripts are available in scripts/. For example,
- scripts/bace_split/, scripts/HIV_split/, and scripts/PDBbind/ contain the scripts for reproducing the Bace, HIV, and PDBbind experiments with the tuned parameters. For example, you may run
sh scripts/bace_split/run_LCN.sh
which will run the LCN model with the Bace dataset.
- scripts/tox21_split/ and scripts/sider_split/ contain the scripts for reproducing the Tox21 and Sider experiments with the parameter tuning procedure. Since the two datasets are multi-label, you have to specify the task number when you run the script. For example, you may run
sh scripts/sider_split/run_LCN.sh 1
which will run the LCN model with the the 1st label in the Sider dataset. The Tox21 contains 12 labels, and the Sider dataset contains 27 labels.
- The results will be stored in the log/ directory. The last 3 columns record the training, validation, and testing performance, respectively (from left to right).
How to set hyper-parameters
Switching among LCN, ALCN, and LLN
The default model in run_net_training.py is LCN.
- Switching from LCN to ALCN: setting
--anneal
toapprox
. - Switching from LCN to LLN: setting
--net_type
tolocally_linear
.
Some suggestions for hyper-parameter tuning
If you would like to apply the codes for other datasets, we suggest to tune the following hyper-parameters. You can see the complete list of hyper-parameters in arg_utils.py.
- Depth of the network (
--depth
). - We suggest to use DropConnect (set
--drop_type
tonode_dropconnect
) and mildly tune the dropping probability (e.g., try--p
in{0.25, 0.5, 0.75}
). - You can start trying the model by setting
--back_n
(the depth of the network g<sub>φ</sub>) to0
. If it doesn't work, please try to increase it. In our experiments, we found that we need to increase it for regression tasks, and we can simply keep it to0
for classification tasks. - You may want to tune the learning iterations (
--epochs
), learning rate (--lr
), and optimizer (--optimizer
) for your tasks. If you change the learning iterations (--epochs
), you probably should also change the--lr_step_size
and--gamma
(see their meaning in thehelp
descriptions in arg_utils.py). - You may enlarge the
--batch-size
to accelerate training.
ELCN
If you would like to try the ensemble version (ELCN), you can just specify the maximum ensemble number as the --ensemble_n
, and the codes will automatically stores all the logs for each ensemble iteration. For example, if you would like to tune the ensemble number in {1,2,4,8}
, you can just run --ensemble_n 8
once, and check the logs for the results with ensemble numbers {1,2,4,8}
.
You may also tune the shrinkage parameter. We suggest to try --shrinkage
in {1, 0.1}
.
Contact
Guang-He Lee (guanghe@csail.mit.edu)
Citation:
If you find this repo useful for your research, please cite the paper
@inproceedings{lee2020oblique,
title={Oblique Decision Trees from Derivatives of ReLU Networks},
author={Guang-He Lee and Tommi S. Jaakkola},
booktitle={International Conference on Learning Representations},
year={2020},
url={https://openreview.net/forum?id=Bke8UR4FPB}
}