Efficient Inference of Optimal Decision Trees

Paper link: http://florent.avellaneda.free.fr/dl/AAAI20.pdf

The source code is designed to be compiled and executed on GNU/Linux.

Dependencies

Example of installing dependencies for Ubuntu

The following instructions have been tested on Ubuntu 19.04

sudo apt install g++ cmake minisat2 xdot libzip-dev libboost-dev

Build

cmake .
make

Running

Infer a decision tree with the algorithm DT_depth for the dataset "mouse":

./InferDT -d data/mouse.csv infer

Infer a decision tree with the algorithm DT_size for the dataset "car":

./InferDT data/car.csv infer
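
For reference, a decision tree over Boolean features classifies an example by testing one feature per internal node and branching until it reaches a labelled leaf. The sketch below illustrates this with a hypothetical `Node` representation; it is not InferDT's output format.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    # Hypothetical tree node: internal nodes test one Boolean feature,
    # leaves carry a class label.
    feature: Optional[int] = None       # index of the feature tested (internal node)
    label: Optional[int] = None         # class label (leaf)
    low: Optional["Node"] = None        # subtree taken when the feature is 0
    high: Optional["Node"] = None       # subtree taken when the feature is 1

def classify(node: Node, x) -> int:
    """Walk from the root to a leaf, branching on one Boolean feature per node."""
    while node.label is None:
        node = node.high if x[node.feature] else node.low
    return node.label
```

For example, a depth-2 tree computing XOR of features 0 and 1 has 7 nodes: a root testing feature 0 and two subtrees testing feature 1.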

Run a 10-fold cross-validation on the dataset "mouse" with the algorithm DT_size:

./InferDT data/mouse.csv bench
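
The bench mode performs the fold splitting and averaging internally. As an illustration of the procedure only (not InferDT's actual implementation), a randomized 10-fold cross-validation can be sketched as:

```python
import random

def ten_fold_cv(examples, train_and_score, seed=0):
    """Average score over 10 folds: each fold serves once as the test set
    while the other nine folds form the training set.
    `train_and_score` is a stand-in for any learner returning an accuracy."""
    idx = list(range(len(examples)))
    random.Random(seed).shuffle(idx)           # randomized split
    folds = [idx[i::10] for i in range(10)]    # 10 nearly equal folds
    scores = []
    for i in range(10):
        test = [examples[j] for j in folds[i]]
        train = [examples[j] for f in folds[:i] + folds[i + 1:] for j in f]
        scores.append(train_and_score(train, test))
    return sum(scores) / len(scores)
```

Each example lands in exactly one test fold, so the ten train/test pairs together cover the whole dataset.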

Infer a decision tree with the algorithm DT_size for the training set balance-scale.csv.train1 and evaluate the model on the test set balance-scale.csv.test1:

./InferDT data/balance-scale.csv.train1.csv infer -t data/balance-scale.csv.test1.csv

Print a help message:

./InferDT --help

Benchmarks

We ran the experiments on Ubuntu with an Intel Core i7-2600K CPU @ 3.40 GHz.

Verwer and Zhang Datasets

The datasets we used are taken from the paper by Verwer and Zhang and are available at https://github.com/SiccoVerwer/binoct.

| Dataset | S | B | Time DT_depth | Accuracy DT_depth | k | n for DT_size | Time DT_size | Accuracy DT_size | Accuracy BinOCT* |
|---|---|---|---|---|---|---|---|---|---|
| iris | 150 | 114 | 18 ms | 92.9 % | 3 | 10.6 | 30 ms | 93.2 % | 98.4 % |
| monks1 | 124 | 17 | 24 ms | 90.3 % | 4.4 | 17 | 80 ms | 95.5 % | 87.1 % |
| monks2 | 169 | 17 | 190 ms | 70.2 % | 5.8 | 47.8 | 9.1 sec | 74.0 % | 63.3 % |
| monks3 | 122 | 17 | 30 ms | 78.1 % | 4.8 | 23.4 | 210 ms | 82.6 % | 93.5 % |
| wine | 178 | 1276 | 600 ms | 89.3 % | 3 | 7.8 | 1.2 sec | 92.0 % | 88.9 % |
| balance-scale | 625 | 20 | 50 sec | 93.0 % | 8 | 268 | 183 sec | 92.6 % | 77.5 % |

S: Number of examples in the dataset

B: Number of Boolean features in the dataset

Time DT_depth: Time used by our algorithm with parameter "-d"

Accuracy DT_depth: Accuracy of our algorithm with parameter "-d"

k: Depth of inferred decision trees

n: Number of nodes in inferred decision trees

Time DT_size: Time used by our algorithm without parameter "-d"

Accuracy DT_size: Accuracy of our algorithm without parameter "-d"

Accuracy BinOCT: Accuracy of the algorithm from https://github.com/SiccoVerwer/binoct

Mouse

We used the dataset Mouse that Bessiere, Hebrard, and O'Sullivan shared with us. Each entry in the DT_size and DT_depth rows is an average over 100 runs. The first column gives the name of the algorithm used; the next three columns correspond to inferring a decision tree from the whole dataset, and the last column corresponds to 10-fold cross-validation.

| Algorithm | Time | k | n | Accuracy |
|---|---|---|---|---|
| DT2 from the paper "Minimising Decision Tree Size as Combinatorial Optimisation" | 577 sec | 4 | 15 | 83.8 % |
| DT1 from the paper "Learning Optimal Decision Trees with SAT" | 12.9 sec | 4 | 15 | 83.8 % |
| DT_size: our algorithm without parameter "-d" | 70 ms | 4 | 15 | 83.5 % |
| DT_depth: our algorithm with parameter "-d" | 20 ms | 4 | 31 | 85.8 % |

Other

In this section we perform x randomized 10-fold cross-validations and record the average.

| Dataset | S | B | Time DT_depth | Accuracy DT_depth | k | n for DT_size | Time DT_size | Accuracy DT_size | x |
|---|---|---|---|---|---|---|---|---|---|
| zoo | 101 | 136 | 47 ms | 91.7 % | 4 | 20 | 200 ms | 91 % | 200 |
| BodyMassIndex | 500 | 172 | 53 sec | 85 % | 6.6 | 109 | 6.4 h | 85.4 % | 1 |
| lungCancerDataset | 59 | 72 | 3.8 ms | 89.6 % | 2.6 | 7 | 7.5 ms | 90.6 % | 200 |
| iris | 113 | 114 | 30 ms | 93.6 % | 3.6 | 14 | 120 ms | 94.0 % | 200 |
| monks1 | 124 | 17 | 30 ms | 97.0 % | 4.9 | 15 | 140 ms | 99.5 % | 200 |
| monks2 | 169 | 17 | 330 ms | 88.0 % | 6 | 64 | 9.3 sec | 88.7 % | 100 |
| monks3 | 92 | 17 | 125 ms | 79.6 % | 5.8 | 35.5 | 2.5 sec | 81.5 % | 800 |
| balance-scale | 625 | 20 | 200 sec | 71.8 % | 9 | 276 | 11 h | 73.6 % | 1 |
| wine | 178 | 1276 | 4.5 sec | 91.5 % | 3 | 12.2 | 14.5 sec | 91.2 % | 200 |

S: Number of examples in the dataset

B: Number of Boolean features in the dataset

k: Average depth of inferred decision trees

n: Average number of nodes in inferred decision trees

x: Number of 10-fold cross-validations performed