FastTreeSHAP


The FastTreeSHAP package is built on the paper Fast TreeSHAP: Accelerating SHAP Value Computation for Trees, published in the NeurIPS 2021 XAI4Debugging Workshop. It is a fast implementation of the TreeSHAP algorithm in the SHAP package.

For a more detailed introduction to the FastTreeSHAP package, please check out this blog post.

Introduction

SHAP (SHapley Additive exPlanation) values are one of the leading tools for interpreting machine learning models. Even though computing SHAP values takes exponential time in general, TreeSHAP takes polynomial time on tree-based models (e.g., decision trees, random forest, gradient boosted trees). While the speedup is significant, TreeSHAP can still dominate the computation time of industry-level machine learning solutions on datasets with millions or more entries.
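To see where the exponential cost comes from, the Shapley definition can be evaluated directly by enumerating all feature subsets. The sketch below is a toy illustration (not code from the SHAP or FastTreeSHAP packages), using a hypothetical additive value function for which the exact answer is known:

```python
from itertools import combinations
from math import factorial

def shapley_values(n, value):
    """Exact Shapley values by brute-force subset enumeration.

    Requires 2^(n-1) evaluations of `value` per feature, which is why
    exact computation is exponential in the number of features.
    """
    phi = [0.0] * n
    for i in range(n):
        others = [p for p in range(n) if p != i]
        for k in range(len(others) + 1):
            for S in combinations(others, k):
                # Shapley weight: |S|! (n - |S| - 1)! / n!
                w = factorial(k) * factorial(n - k - 1) / factorial(n)
                phi[i] += w * (value(set(S) | {i}) - value(set(S)))
    return phi

# Toy additive "model": v(S) is the sum of the contributions of features in S.
contrib = {0: 2.0, 1: -1.0, 2: 0.5}
phi = shapley_values(3, lambda S: sum(contrib[j] for j in S))
# For an additive game, each Shapley value equals that feature's own contribution.
```

TreeSHAP avoids this blow-up by exploiting tree structure, replacing the exponential subset enumeration with a polynomial-time traversal.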

In the FastTreeSHAP package we implement two new algorithms, FastTreeSHAP v1 and FastTreeSHAP v2, designed to improve the computational efficiency of TreeSHAP for large datasets. We empirically find that FastTreeSHAP v1 is 1.5x faster than TreeSHAP while keeping the memory cost unchanged, and FastTreeSHAP v2 is 2.5x faster than TreeSHAP, at the cost of slightly higher memory usage (performance measured on a single core).

The table below summarizes the time and space complexities of each variant of the TreeSHAP algorithm (<img src="https://latex.codecogs.com/svg.latex?M"/> is the number of samples to be explained, <img src="https://latex.codecogs.com/svg.latex?N"/> is the number of features, <img src="https://latex.codecogs.com/svg.latex?T"/> is the number of trees, <img src="https://latex.codecogs.com/svg.latex?L"/> is the maximum number of leaves in any tree, and <img src="https://latex.codecogs.com/svg.latex?D"/> is the maximum depth of any tree). Note that the (theoretical) average running time of FastTreeSHAP v1 is reduced to 25% of that of TreeSHAP.

| TreeSHAP Version | Time Complexity | Space Complexity |
|---|---|---|
| TreeSHAP | <img src="https://latex.codecogs.com/svg.latex?O(MTLD^2)"/> | <img src="https://latex.codecogs.com/svg.latex?O(D^2+N)"/> |
| FastTreeSHAP v1 | <img src="https://latex.codecogs.com/svg.latex?O(MTLD^2)"/> | <img src="https://latex.codecogs.com/svg.latex?O(D^2+N)"/> |
| FastTreeSHAP v2 (general case) | <img src="https://latex.codecogs.com/svg.latex?O(TL2^DD+MTLD)"/> | <img src="https://latex.codecogs.com/svg.latex?O(L2^D)"/> |
| FastTreeSHAP v2 (balanced trees) | <img src="https://latex.codecogs.com/svg.latex?O(TL^2D+MTLD)"/> | <img src="https://latex.codecogs.com/svg.latex?O(L^2)"/> |
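To make the trade-off concrete, here is a back-of-the-envelope sketch that plugs representative values into the leading terms from the balanced-tree row above (illustrative operation counts only, ignoring constant factors; not wall-clock predictions):

```python
# Leading-term operation counts from the complexity table
# (constants ignored; illustrative only).
M, T, D = 10_000, 500, 8   # samples, trees, max depth
L = 2 ** D                 # leaves, assuming perfectly balanced trees

treeshap      = M * T * L * D**2   # O(MTLD^2)
fast_v2_pre   = T * L**2 * D       # one-time precomputation, O(TL^2 D)
fast_v2_score = M * T * L * D      # per-sample cost drops to O(MTLD)
fast_v2       = fast_v2_pre + fast_v2_score

# Per-sample cost ratio is exactly D; the overall ratio approaches D
# as M grows and the precomputation amortizes away.
speedup = treeshap / fast_v2
```

This is the essence of FastTreeSHAP v2: pay a one-time, per-tree precomputation cost (and the extra space in the table above) to shave a factor of <img src="https://latex.codecogs.com/svg.latex?D"/> off the per-sample scoring cost.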

Performance with Parallel Computing

Parallel computing is fully enabled in the FastTreeSHAP package. In comparison, the SHAP package enables parallel computing only via its "shortcut", which calls the TreeSHAP algorithms embedded in the XGBoost, LightGBM, and CatBoost packages specifically for these three model types.

The table below compares the execution times of FastTreeSHAP v1 and FastTreeSHAP v2 in the FastTreeSHAP package against the TreeSHAP algorithm (or "shortcut") in the SHAP package on two datasets: Adult (binary classification) and Superconductor (regression). All evaluations were run in parallel on all available cores of an Azure Virtual Machine of size Standard_D8_v3 (8 cores and 32GB memory), except for scikit-learn models in the SHAP package (see the footnotes below). We ran each evaluation on 10,000 samples and averaged the results over 3 runs.

| Model | # Trees | Tree Depth | Dataset | SHAP (s) | FastTreeSHAP v1 (s) | Speedup | FastTreeSHAP v2 (s) | Speedup |
|---|---|---|---|---|---|---|---|---|
| sklearn random forest | 500 | 8 | Adult | 318.44\* | 43.89 | 7.26 | 27.06 | 11.77 |
| sklearn random forest | 500 | 8 | Super | 466.04 | 58.28 | 8.00 | 36.56 | 12.75 |
| sklearn random forest | 500 | 12 | Adult | 2446.12 | 293.75 | 8.33 | 158.93 | 15.39 |
| sklearn random forest | 500 | 12 | Super | 5282.52 | 585.85 | 9.02 | 370.09 | 14.27 |
| XGBoost | 500 | 8 | Adult | 17.35\*\* | 12.31 | 1.41 | 6.53 | 2.66 |
| XGBoost | 500 | 8 | Super | 35.31 | 21.09 | 1.67 | 13.00 | 2.72 |
| XGBoost | 500 | 12 | Adult | 62.19 | 40.31 | 1.54 | 21.34 | 2.91 |
| XGBoost | 500 | 12 | Super | 152.23 | 82.46 | 1.85 | 51.47 | 2.96 |
| LightGBM | 500 | 8 | Adult | 7.64\*\*\* | 7.20 | 1.06 | 3.24 | 2.36 |
| LightGBM | 500 | 8 | Super | 8.73 | 7.11 | 1.23 | 3.58 | 2.44 |
| LightGBM | 500 | 12 | Adult | 9.95 | 7.96 | 1.25 | 4.02 | 2.48 |
| LightGBM | 500 | 12 | Super | 14.02 | 11.14 | 1.26 | 4.81 | 2.91 |

\* Parallel computing is not enabled in the SHAP package for scikit-learn models, thus the TreeSHAP algorithm runs on a single core.
\*\* The SHAP package calls the TreeSHAP algorithm in the XGBoost package, which by default enables parallel computing on all cores.
\*\*\* The SHAP package calls the TreeSHAP algorithm in the LightGBM package, which by default enables parallel computing on all cores.

Installation

The FastTreeSHAP package is available on PyPI and can be installed with pip:

pip install fasttreeshap

Installation troubleshooting: on macOS, if installation fails because the OpenMP runtime (libomp) is missing, install it with Homebrew:

brew install libomp

Usage

The following screenshot shows a typical use case of FastTreeSHAP on the Census Income data. Note that the usage of FastTreeSHAP is exactly the same as that of SHAP, except for four additional arguments in the class TreeExplainer: algorithm, n_jobs, memory_tolerance, and shortcut.

algorithm: This argument specifies the TreeSHAP algorithm used to run FastTreeSHAP. It can take values "v0", "v1", "v2" or "auto", and its default value is "auto": "v0" runs the original TreeSHAP algorithm, "v1" runs FastTreeSHAP v1, "v2" runs FastTreeSHAP v2, and "auto" automatically selects the most appropriate algorithm for the given model and data.

n_jobs: This argument specifies the number of parallel threads used to run FastTreeSHAP. It can take values -1 or a positive integer. Its default value is -1, which means utilizing all available cores in parallel computing.

memory_tolerance: This argument specifies the upper limit of memory allocation (in GB) to run FastTreeSHAP v2. It can take values -1 or a positive number. Its default value is -1, which means allocating a maximum of 0.25 * total memory of the machine to run FastTreeSHAP v2.

shortcut: This argument determines whether to use the TreeSHAP algorithm embedded in the XGBoost, LightGBM, and CatBoost packages directly when computing SHAP values for XGBoost, LightGBM, and CatBoost models and when computing SHAP interaction values for XGBoost models. Its default value is False, which means bypassing the "shortcut" and using the code in the FastTreeSHAP package directly to compute SHAP values for XGBoost, LightGBM, and CatBoost models. Note that currently shortcut is automatically set to True for CatBoost models, as we are still working on the CatBoost component in the FastTreeSHAP package. More details on the usage of "shortcut" can be found in the notebooks Census Income, Superconductor, and Crop Mapping.

FastTreeSHAP Adult Screenshot1

The code in the following screenshot was run on all available cores of a MacBook Pro (2.4 GHz 8-Core Intel Core i9 and 32GB memory). We see that both "v1" and "v2" produce exactly the same SHAP value results as "v0". Meanwhile, "v2" has the shortest execution time, followed by "v1", and then "v0". "auto" selects "v2" as the most appropriate algorithm in this use case, as expected. For more detailed comparisons between FastTreeSHAP v1, FastTreeSHAP v2 and the original TreeSHAP, check the notebooks Census Income, Superconductor, and Crop Mapping.

FastTreeSHAP Adult Screenshot2

Notes

Notebooks

The notebooks below contain more detailed comparisons between FastTreeSHAP v1, FastTreeSHAP v2 and the original TreeSHAP in classification and regression problems using scikit-learn, XGBoost and LightGBM:

- Census Income
- Superconductor
- Crop Mapping

Citation

Please cite FastTreeSHAP in your publications if it helps your research:

@article{yang2021fast,
  title={Fast TreeSHAP: Accelerating SHAP Value Computation for Trees},
  author={Yang, Jilei},
  journal={arXiv preprint arXiv:2109.09847},
  year={2021}
}

License

Copyright (c) LinkedIn Corporation. All rights reserved. Licensed under the BSD 2-Clause License.