Optimal Sparse Decision Trees (OSDT)


This code accompanies the paper "Optimal Sparse Decision Trees" by Xiyang Hu, Cynthia Rudin, and Margo Seltzer, which appeared at the 2019 NeurIPS conference.

Use OSDT

from osdt import OSDT

# initialize an OSDT object
model = OSDT()
# fit the model
model.fit(x_train, y_train)
# make prediction and get the prediction accuracy
prediction, accuracy = model.predict(x_test, y_test)
# make prediction only
prediction = model.predict(x_test)

Documentation

All code is in the ./src folder. The OSDT class is defined in the osdt.py file.


CLASS osdt.OSDT(lamb=0.1, prior_metric="curiosity", MAXDEPTH=float('Inf'), MAX_NLEAVES=float('Inf'), niter=float('Inf'),
                logon=False, support=True, incre_support=True, accu_support=True, equiv_points=True,
                lookahead=True, lenbound=True, R_c0=1, timelimit=float('Inf'), init_cart=True,
                saveTree=False, readTree=False)
<details><summary> <b>PARAMETERS</b>: </summary> <p> </p> </details>
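The most important hyperparameter is `lamb`, the regularization coefficient. As described in the paper, OSDT searches for the tree minimizing the regularized risk: the misclassification rate plus `lamb` times the number of leaves, so larger values of `lamb` yield sparser trees. A minimal sketch of that objective (the `osdt_objective` helper below is illustrative, not part of the package):

```python
def osdt_objective(y_true, y_pred, n_leaves, lamb=0.1):
    """Regularized risk: misclassification rate + lamb * (number of leaves)."""
    errors = sum(yt != yp for yt, yp in zip(y_true, y_pred))
    return errors / len(y_true) + lamb * n_leaves

# One misclassified sample out of four, with a two-leaf tree and lamb = 0.1:
# R = 1/4 + 0.1 * 2 = 0.45
risk = osdt_objective([0, 1, 1, 0], [0, 1, 0, 0], n_leaves=2, lamb=0.1)
```

A tree is only worth growing if the accuracy gained by an extra leaf exceeds `lamb`, which is what drives the sparsity of the optimal solution.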

fit(x, y)

    Fit the model with input data.

    PARAMETERS:

    RETURNS:

predict(x, y=None)

    Predict labels for the input samples. If y is provided, the prediction accuracy is returned as well.

    PARAMETERS:

    RETURNS:


Installation

git clone https://github.com/xiyanghu/OSDT.git
cd OSDT
conda env create -f environment.yml
conda activate osdt

Dependencies

All dependencies are installed automatically when the conda environment is created from environment.yml (see Installation above). Manual installation instructions for the GMP/MPFR/libmpc/gmpy2 stack are kept below for reference.

<!---
1. Install GMP
   * Run command `sudo apt install libgmp3-dev` (Ubuntu) OR `brew install gmp` (macOS)
   * If the command above does not work, try manual installation:
     * Download `gmp-6.2.1.tar.xz` from [gmplib.org](https://gmplib.org/)
     * Run command `tar -xvf gmp-6.2.1.tar.xz`
     * Run command `cd gmp-6.2.1`
     * Run command `./configure`
     * Run command `make`
     * Run command `make check`
     * Run command `sudo make install`
2. Install MPFR
   * Run command `sudo apt install libmpfr-dev` (Ubuntu) OR `brew install mpfr` (macOS)
3. Install libmpc
   * Run command `sudo apt install libmpc-dev` (Ubuntu) OR `brew install libmpc` (macOS)
4. Install gmpy2
   * Run command `pip install gmpy2`
-->

Datasets

See data/preprocessed/.

We used seven datasets. Five are from the UCI Machine Learning Repository (tic-tac-toe, car evaluation, monk1, monk2, monk3). The other two are the ProPublica recidivism dataset and the Fair Isaac (FICO) credit risk dataset. For the recidivism dataset (N = 7,215), we predict which individuals are arrested within two years of release; for the FICO dataset, we predict whether an individual will default on a loan.
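OSDT operates on binary feature matrices, which is why the categorical columns of datasets like these are one-hot encoded in data/preprocessed/. A minimal sketch of that preprocessing step (the `binarize` helper is illustrative, not part of the repository):

```python
import numpy as np

def binarize(column):
    """One-hot encode a categorical column into 0/1 indicator features."""
    values = sorted(set(column))  # one indicator feature per distinct value
    encoded = np.array([[1 if v == val else 0 for val in values]
                        for v in column])
    return encoded, values

# e.g. a tic-tac-toe board cell taking the values 'x', 'o', or 'b':
cells = ['x', 'o', 'b', 'x']
X_bin, names = binarize(cells)
# names == ['b', 'o', 'x']; each row of X_bin has exactly one 1
```

Applying this to every column and stacking the results yields the binary matrix that `fit(x, y)` expects.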

Example test code

We provide our test code in test_accuracy.py.

Citing OSDT

The OSDT paper was published at Neural Information Processing Systems (NeurIPS) 2019. If you use OSDT in a scientific publication, we would appreciate a citation to the following paper:

@inproceedings{NEURIPS2019_ac52c626,
 author = {Hu, Xiyang and Rudin, Cynthia and Seltzer, Margo},
 booktitle = {Advances in Neural Information Processing Systems},
 editor = {H. Wallach and H. Larochelle and A. Beygelzimer and F. d\textquotesingle Alch\'{e}-Buc and E. Fox and R. Garnett},
 pages = {7265-7273},
 publisher = {Curran Associates, Inc.},
 title = {Optimal Sparse Decision Trees},
 url = {https://proceedings.neurips.cc/paper_files/paper/2019/file/ac52c626afc10d4075708ac4c778ddfc-Paper.pdf},
 volume = {32},
 year = {2019}
}

or:

Hu, X., Rudin, C., and Seltzer, M. (2019). Optimal sparse decision trees. In Advances in Neural Information Processing Systems, pp. 7265–7273.