GPUTreeShap

GPUTreeShap is a CUDA implementation of the TreeShap algorithm by Lundberg et al. [1] for NVIDIA GPUs. It is a header-only library designed to be included in decision tree libraries as a fast backend for model interpretability using SHAP values. GPUTreeShap also implements variants of TreeShap based on the Shapley-Taylor interaction index [2], and on interventional rather than conditional probabilities [3].
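To make concrete what is being computed (this illustrates the definition, not GPUTreeShap's API), the brute-force interventional Shapley value [3] for a toy two-feature tree can be sketched in plain Python; the tree and background data below are invented for the example, and GPUTreeShap exists precisely because this subset enumeration is intractable for real ensembles:

```python
from itertools import combinations
from math import factorial

# Toy stand-in for a decision tree over two features.
def tree(x):
    return 10.0 if x[0] > 0.5 else (5.0 if x[1] > 0.5 else 0.0)

# Background (reference) dataset for the interventional expectation.
background = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

def value(subset, x):
    """Features in `subset` are fixed to x; the rest are averaged
    over the background dataset (interventional expectation)."""
    total = 0.0
    for b in background:
        z = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        total += tree(z)
    return total / len(background)

def shapley(x, i, n=2):
    """Exact Shapley value of feature i for input x, by enumerating
    all subsets of the other features (exponential in n)."""
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for k in range(len(others) + 1):
        for s in combinations(others, k):
            s = set(s)
            w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            phi += w * (value(s | {i}, x) - value(s, x))
    return phi

x = (1.0, 1.0)
phis = [shapley(x, i) for i in range(2)]
base = value(set(), x)  # expected prediction over the background
# Efficiency property: base value plus attributions equals the prediction.
print(base, phis, base + sum(phis), tree(x))  # 6.25 [3.125, 0.625] 10.0 10.0
```

The printed identity (base value plus per-feature attributions equals the model output) is the property that makes SHAP values attractive for interpretability, and it holds exactly for the tree-based algorithms GPUTreeShap accelerates.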

See the associated publication (arXiv:2010.13972):

@misc{mitchell2022gputreeshap,
      title={GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles},
      author={Rory Mitchell and Eibe Frank and Geoffrey Holmes},
      year={2022},
      eprint={2010.13972},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Using GPUTreeShap

GPUTreeShap is integrated with XGBoost from version 1.3 onwards; see the XGBoost documentation for details and for a demo notebook.

GPUTreeShap is integrated with the Python shap package.

GPUTreeShap is integrated with the cuML project.

For usage in C++, see the example directory.

Performance

Using the benchmark script benchmark/benchmark.py, we run GPUTreeShap as a backend for XGBoost and compare its performance against XGBoost's multithreaded CPU-based implementation. Test models are generated on four different datasets at a range of sizes. The comparison below was run on an NVIDIA DGX-1 system, pitting a single V100 GPU against two 20-core Intel Xeon E5-2698 CPUs (40 physical cores in total).

| model | trees | leaves | max_depth | average_depth |
| --- | ---: | ---: | ---: | ---: |
| covtype-small | 80 | 560 | 3 | 2.929 |
| covtype-med | 800 | 113533 | 8 | 7.696 |
| covtype-large | 8000 | 6702132 | 16 | 13.654 |
| cal_housing-small | 10 | 80 | 3 | 3.000 |
| cal_housing-med | 100 | 21641 | 8 | 7.861 |
| cal_housing-large | 1000 | 3370373 | 16 | 14.024 |
| fashion_mnist-small | 100 | 800 | 3 | 3.000 |
| fashion_mnist-med | 1000 | 144211 | 8 | 7.525 |
| fashion_mnist-large | 10000 | 2929303 | 16 | 11.437 |
| adult-small | 10 | 80 | 3 | 3.000 |
| adult-med | 100 | 13067 | 8 | 7.637 |
| adult-large | 1000 | 642883 | 16 | 13.202 |

| model | test_rows | cpu_time(s) | cpu_std | gpu_time(s) | gpu_std | speedup |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| covtype-small | 10000 | 0.03719 | 0.016989 | 0.01637 | 0.006701 | 2.2713 |
| covtype-med | 10000 | 8.24571 | 0.065573 | 0.45239 | 0.026825 | 18.2271 |
| covtype-large | 10000 | 930.22357 | 0.555459 | 50.88014 | 0.205488 | 18.2826 |
| cal_housing-small | 10000 | 0.00708 | 0.005291 | 0.00737 | 0.005849 | 0.9597 |
| cal_housing-med | 10000 | 1.27267 | 0.021711 | 0.08722 | 0.019198 | 14.5912 |
| cal_housing-large | 10000 | 315.20877 | 0.298429 | 16.91054 | 0.343210 | 18.6398 |
| fashion_mnist-small | 10000 | 0.35401 | 0.142973 | 0.16965 | 0.039150 | 2.0866 |
| fashion_mnist-med | 10000 | 15.10363 | 0.073838 | 1.13051 | 0.084911 | 13.3600 |
| fashion_mnist-large | 10000 | 621.13735 | 0.144418 | 47.53092 | 0.174141 | 13.0681 |
| adult-small | 10000 | 0.00667 | 0.003201 | 0.00620 | 0.005009 | 1.0765 |
| adult-med | 10000 | 1.13609 | 0.004031 | 0.07788 | 0.010203 | 14.5882 |
| adult-large | 10000 | 88.12258 | 0.198140 | 4.66934 | 0.004628 | 18.8726 |

Memory usage

GPUTreeShap uses very little working GPU memory, only allocating space proportional to the model size. An application is far more likely to be constrained by the size of the dataset.

References

[1] Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. "Consistent individualized feature attribution for tree ensembles." arXiv preprint arXiv:1802.03888 (2018).

[2] Sundararajan, Mukund, Kedar Dhamdhere, and Ashish Agarwal. "The Shapley Taylor Interaction Index." International Conference on Machine Learning. PMLR, 2020.

[3] https://hughchen.github.io/its_blog/index.html