GPUTreeShap

GPUTreeShap is a CUDA implementation of the TreeShap algorithm by Lundberg et al. [1] for NVIDIA GPUs. It is a header-only library designed to be included in decision tree libraries as a fast backend for model interpretability using SHAP values. GPUTreeShap also implements variants of TreeShap based on the Shapley-Taylor interaction index [2], and on interventional rather than conditional probabilities [3].
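To make concrete what is being computed (this illustrates the definition, not GPUTreeShap's API), the brute-force interventional Shapley value [3] for a toy two-feature tree can be sketched in plain Python; the tree and background data below are invented for the example, and GPUTreeShap exists precisely because this subset enumeration is intractable for real ensembles:

```python
from itertools import combinations
from math import factorial

# Toy stand-in for a decision tree over two features.
def tree(x):
    return 10.0 if x[0] > 0.5 else (5.0 if x[1] > 0.5 else 0.0)

# Background (reference) dataset for the interventional expectation.
background = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0)]

def value(subset, x):
    """Features in `subset` are fixed to x; the rest are averaged
    over the background dataset (interventional expectation)."""
    total = 0.0
    for b in background:
        z = tuple(x[i] if i in subset else b[i] for i in range(len(x)))
        total += tree(z)
    return total / len(background)

def shapley(x, i, n=2):
    """Exact Shapley value of feature i for input x, by enumerating
    all subsets of the other features (exponential in n)."""
    others = [j for j in range(n) if j != i]
    phi = 0.0
    for k in range(len(others) + 1):
        for s in combinations(others, k):
            s = set(s)
            w = factorial(len(s)) * factorial(n - len(s) - 1) / factorial(n)
            phi += w * (value(s | {i}, x) - value(s, x))
    return phi

x = (1.0, 1.0)
phis = [shapley(x, i) for i in range(2)]
base = value(set(), x)  # expected prediction over the background
# Efficiency property: base value plus attributions equals the prediction.
print(base, phis, base + sum(phis), tree(x))  # 6.25 [3.125, 0.625] 10.0 10.0
```

The printed identity (base value plus per-feature attributions equals the model output) is the property that makes SHAP values attractive for interpretability, and it holds exactly for the tree-based algorithms GPUTreeShap accelerates.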

See the associated publication (arXiv:2010.13972):

@misc{mitchell2022gputreeshap,
      title={GPUTreeShap: Massively Parallel Exact Calculation of SHAP Scores for Tree Ensembles},
      author={Rory Mitchell and Eibe Frank and Geoffrey Holmes},
      year={2022},
      eprint={2010.13972},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Using GPUTreeShap

GPUTreeShap is integrated with XGBoost from version 1.3 onwards; see the XGBoost documentation for details and for a demo notebook.

GPUTreeShap is integrated with the Python shap package.

GPUTreeShap is integrated with the cuML project.

For usage in C++, see the example directory.

Performance

Using the benchmark script benchmark/benchmark.py, we run GPUTreeShap as a backend for XGBoost and compare its performance against XGBoost's multithreaded CPU-based implementation. Test models are generated on four different datasets at a range of sizes. The comparison below was run on an NVIDIA DGX-1 system, pitting a single V100 GPU against two 20-core Intel Xeon E5-2698 CPUs (40 physical cores in total).

| model | trees | leaves | max_depth | average_depth |
| --- | ---: | ---: | ---: | ---: |
| covtype-small | 80 | 560 | 3 | 2.929 |
| covtype-med | 800 | 113533 | 8 | 7.696 |
| covtype-large | 8000 | 6702132 | 16 | 13.654 |
| cal_housing-small | 10 | 80 | 3 | 3.000 |
| cal_housing-med | 100 | 21641 | 8 | 7.861 |
| cal_housing-large | 1000 | 3370373 | 16 | 14.024 |
| fashion_mnist-small | 100 | 800 | 3 | 3.000 |
| fashion_mnist-med | 1000 | 144211 | 8 | 7.525 |
| fashion_mnist-large | 10000 | 2929303 | 16 | 11.437 |
| adult-small | 10 | 80 | 3 | 3.000 |
| adult-med | 100 | 13067 | 8 | 7.637 |
| adult-large | 1000 | 642883 | 16 | 13.202 |

| model | test_rows | cpu_time(s) | cpu_std | gpu_time(s) | gpu_std | speedup |
| --- | ---: | ---: | ---: | ---: | ---: | ---: |
| covtype-small | 10000 | 0.03719 | 0.016989 | 0.01637 | 0.006701 | 2.2713 |
| covtype-med | 10000 | 8.24571 | 0.065573 | 0.45239 | 0.026825 | 18.2271 |
| covtype-large | 10000 | 930.22357 | 0.555459 | 50.88014 | 0.205488 | 18.2826 |
| cal_housing-small | 10000 | 0.00708 | 0.005291 | 0.00737 | 0.005849 | 0.9597 |
| cal_housing-med | 10000 | 1.27267 | 0.021711 | 0.08722 | 0.019198 | 14.5912 |
| cal_housing-large | 10000 | 315.20877 | 0.298429 | 16.91054 | 0.343210 | 18.6398 |
| fashion_mnist-small | 10000 | 0.35401 | 0.142973 | 0.16965 | 0.039150 | 2.0866 |
| fashion_mnist-med | 10000 | 15.10363 | 0.073838 | 1.13051 | 0.084911 | 13.3600 |
| fashion_mnist-large | 10000 | 621.13735 | 0.144418 | 47.53092 | 0.174141 | 13.0681 |
| adult-small | 10000 | 0.00667 | 0.003201 | 0.00620 | 0.005009 | 1.0765 |
| adult-med | 10000 | 1.13609 | 0.004031 | 0.07788 | 0.010203 | 14.5882 |
| adult-large | 10000 | 88.12258 | 0.198140 | 4.66934 | 0.004628 | 18.8726 |

Memory usage

GPUTreeShap uses very little working GPU memory, only allocating space proportional to the model size. An application is far more likely to be constrained by the size of the dataset.

References

[1] Lundberg, Scott M., Gabriel G. Erion, and Su-In Lee. "Consistent individualized feature attribution for tree ensembles." arXiv preprint arXiv:1802.03888 (2018).

[2] Sundararajan, Mukund, Kedar Dhamdhere, and Ashish Agarwal. "The Shapley Taylor Interaction Index." International Conference on Machine Learning. PMLR, 2020.

[3] https://hughchen.github.io/its_blog/index.html