<div align="center">
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoPiML.png" alt="drawing" width="314.15926"/>

An integrated Python toolbox for interpretable machine learning
pip install PiML
🎄 December 1, 2023: V0.6.0 is released with enhanced data handling and model analytics.
:rocket: May 4, 2023: V0.5.0 is released together with PiML user guide.
:rocket: October 31, 2022: V0.4.0 is released with enriched models and enhanced diagnostics.
:rocket: July 26, 2022: V0.3.0 is released with classic statistical models.
:rocket: June 26, 2022: V0.2.0 is released with high-code APIs.
:loudspeaker: May 4, 2022: V0.1.0 is launched with low-code UI/UX.
</div>

PiML (or π-ML, /ˈpaɪ·ˈem·ˈel/) is a new Python toolbox for interpretable machine learning model development and validation. Through its low-code interface and high-code APIs, PiML supports a growing list of inherently interpretable ML models:
- GLM: Linear/Logistic Regression with L1 ∨ L2 Regularization
- GAM: Generalized Additive Models using B-splines
- Tree: Decision Tree for Classification and Regression
- FIGS: Fast Interpretable Greedy-Tree Sums (Tan, et al. 2022)
- XGB1: Extreme Gradient Boosted Trees of Depth 1, with optimal binning (Chen and Guestrin, 2016; Navas-Palencia, 2020)
- XGB2: Extreme Gradient Boosted Trees of Depth 2, with effect purification (Chen and Guestrin, 2016; Lengerich, et al. 2020)
- EBM: Explainable Boosting Machine (Nori, et al. 2019; Lou, et al. 2013)
- GAMI-Net: Generalized Additive Model with Structured Interactions (Yang, Zhang and Sudjianto, 2021)
- ReLU-DNN: Deep ReLU Networks using Aletheia Unwrapper and Sparsification (Sudjianto, et al. 2020)
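Depth limits are what keep XGB1 and XGB2 interpretable: a depth-1 tree (a stump) splits on a single feature, so a sum of stumps is a sum of one-dimensional main effects, while a depth-2 tree involves at most two features and so adds at most pairwise interactions. The sketch below illustrates the depth-1 case with plain least-squares gradient boosting in standard-library Python. It is not PiML's XGB1 (which builds on XGBoost and adds optimal binning), and the helper names `fit_stump`, `boost_stumps` and `main_effects` are hypothetical.

```python
from collections import defaultdict


def fit_stump(X, r):
    """Return the (feature, threshold, left_value, right_value) stump that
    minimizes squared error against residuals r."""
    best = None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            left = [ri for row, ri in zip(X, r) if row[j] <= t]
            right = [ri for row, ri in zip(X, r) if row[j] > t]
            if not left or not right:
                continue  # a split must leave data on both sides
            lv, rv = sum(left) / len(left), sum(right) / len(right)
            sse = sum((v - lv) ** 2 for v in left) + sum((v - rv) ** 2 for v in right)
            if best is None or sse < best[0]:
                best = (sse, j, t, lv, rv)
    return best[1:]


def boost_stumps(X, y, n_rounds=300, lr=0.1):
    """Least-squares gradient boosting with depth-1 trees."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    stumps = []
    for _ in range(n_rounds):
        # For squared loss the negative gradient is (up to a constant) the residual.
        r = [yi - pi for yi, pi in zip(y, pred)]
        j, t, lv, rv = fit_stump(X, r)
        stumps.append((j, t, lr * lv, lr * rv))
        pred = [p + (lr * lv if row[j] <= t else lr * rv) for p, row in zip(pred, X)]
    return base, stumps


def main_effects(base, stumps, x):
    """Split one prediction into the intercept plus per-feature contributions."""
    contrib = defaultdict(float)
    for j, t, lv, rv in stumps:
        contrib[j] += lv if x[j] <= t else rv
    return base, dict(contrib)


# Toy additive target: y = 2*x0 + 1[x1 > 2]
X = [[x0, x1] for x0 in range(5) for x1 in range(5)]
y = [2 * x0 + (1 if x1 > 2 else 0) for x0, x1 in X]
base, stumps = boost_stumps(X, y)
intercept, parts = main_effects(base, stumps, [3, 4])
print(intercept, parts)  # parts maps feature index -> contribution
```

Because every stump touches exactly one feature, the accumulated contributions per feature form one-dimensional shape functions that can be plotted and inspected directly; that additivity is the property depth-1 boosting preserves.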
PiML also works for arbitrary supervised ML models in regression and binary classification settings. It supports a wide range of outcome tests, including but not limited to:
- Accuracy: popular metrics like MSE, MAE for regression tasks and ACC, AUC, Recall, Precision, F1-score for binary classification tasks.
- Explainability: post-hoc global explainers (PFI, PDP, ALE) and local explainers (LIME, SHAP).
- Fairness: disparity test and segmented analysis by integrating the solas-ai package.
- WeakSpot: identification of weak regions with high residuals by slicing techniques.
- Overfit: identification of overfitting regions according to train-test performance gap.
- Reliability: assessment of prediction uncertainty by split conformal prediction techniques.
- Robustness: evaluation of performance degradation under covariate noise perturbation.
- Resilience: evaluation of performance degradation under different out-of-distribution scenarios.
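The Reliability test above is based on split conformal prediction. A minimal standard-library sketch of the general technique follows; it is not PiML's implementation, and `conformal_quantile` and `split_conformal_interval` are hypothetical names. Absolute residuals are computed on a held-out calibration set, their ⌈(n+1)(1−α)⌉-th smallest value is taken, and each new point prediction is widened by that amount. Under exchangeability of calibration and test data, the resulting intervals cover the true outcome with probability at least 1 − α.

```python
import math


def conformal_quantile(cal_residuals, alpha=0.1):
    """The ceil((n + 1) * (1 - alpha))-th smallest absolute calibration residual."""
    scores = sorted(abs(r) for r in cal_residuals)
    n = len(scores)
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return math.inf  # too few calibration points for this coverage level
    return scores[k - 1]


def split_conformal_interval(y_pred, cal_residuals, alpha=0.1):
    """Symmetric interval around a point prediction with >= 1 - alpha coverage."""
    q = conformal_quantile(cal_residuals, alpha)
    return y_pred - q, y_pred + q


# Residuals y - y_hat from a calibration set the model was NOT trained on
cal_residuals = [0.5, -1.2, 0.3, 2.0, -0.7, 0.9, -0.1, 1.5, -0.4]
print(split_conformal_interval(10.0, cal_residuals, alpha=0.2))
```

Wider intervals point to less reliable predictions. Note that the guarantee is marginal (averaged over test points), not conditional on any particular input.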
[Installation](#Install) | [Examples](#Example) | [Usage](#Usage) | [Citations](#Cite)
Installation<a name="Install"></a>
```bash
pip install PiML
```
Low-code Examples<a name="Example"></a>
Click the ipynb links to run examples in Google Colab:
- BikeSharing data: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_BikeSharing.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- CaliforniaHousing data: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_CaliforniaHousing.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- TaiwanCredit data: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_TaiwanCredit.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- Fairness_SimuStudy1 data: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_Fairness_SimuStudy1.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- Fairness_SimuStudy2 data: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_Fairness_SimuStudy2.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- Upload custom data in two ways: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_CustomDataLoad_Two_Ways.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- Work with external models: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_ExternalModels.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
Begin your own PiML journey with <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/PiML%20Low-code%20Example%20Run.ipynb">this demo notebook</a>.
High-code Examples<a name="Example"></a>
The same examples can also be run with the high-code APIs:
- BikeSharing data: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_BikeSharing_HighCode.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- CaliforniaHousing data: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_CaliforniaHousing_HighCode.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- TaiwanCredit data: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_TaiwanCredit_HighCode.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- Model saving: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_ModelSaving.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
- Results return: <a style="text-align: center" target="_blank" href="https://colab.research.google.com/github/SelfExplainML/PiML-Toolbox/blob/main/examples/Example_ResultsReturn.ipynb"><img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/LogoColab.png" width="20"> ipynb</a>
Low-code Usage on Google Colab<a name="Usage"></a>
Stage 1: Initialize an Experiment, Load and Prepare Data
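```python
from piml import Experiment

exp = Experiment()  # container for data, models and test results
exp.data_loader()   # panel: pick a built-in dataset or upload your own
```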
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/data_loader.png">
```python
exp.data_summary()  # summary statistics of the loaded data
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/data_summary.png">
```python
exp.data_prepare()  # set target and task type, split into training and test sets
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/data_prepare.png">
```python
exp.data_quality()  # data quality checks
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/data_quality.png">
```python
exp.feature_select()  # feature selection
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/feature_select.png">
```python
exp.eda()  # exploratory data analysis
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/data_eda.png">
Stage 2: Train Interpretable Models
```python
exp.model_train()  # select and train models from the list above
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/model_train.png">
Stage 3: Explain and Interpret
```python
exp.model_explain()  # post-hoc explainers such as PFI, PDP, ALE, LIME and SHAP
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/model_explain.png">
```python
exp.model_interpret()  # inherent interpretation of the interpretable models
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/model_interpret.png">
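Permutation feature importance (PFI), one of the post-hoc global explainers listed earlier, is simple enough to sketch without any library: shuffle one feature column, re-score the model, and record how much the loss increases. The code below is a model-agnostic illustration using MSE, not PiML's implementation; `permutation_importance` and `mse` are hypothetical helpers, and `predict` can be any function mapping a feature row to a number.

```python
import random


def mse(y_true, y_pred):
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=5, seed=0):
    """Mean increase in MSE when each feature column is shuffled.
    X is a list of rows, each row a list of feature values."""
    rng = random.Random(seed)
    baseline = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            col = [row[j] for row in X]
            rng.shuffle(col)  # break the link between feature j and the target
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, col)]
            increases.append(mse(y, [predict(row) for row in X_perm]) - baseline)
        importances.append(sum(increases) / n_repeats)
    return importances


# Toy data where only feature 0 drives the target
X = [[float(i), float(i % 3)] for i in range(30)]
y = [3 * a for a, _ in X]


def model(row):
    return 3 * row[0]  # ignores feature 1 entirely


print(permutation_importance(model, X, y))  # feature 1 scores exactly 0.0
```

Averaging over several shuffles (`n_repeats`) reduces the randomness of a single permutation; a feature the model never uses always scores zero.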
Stage 4: Diagnose and Compare
```python
exp.model_diagnose()  # diagnostic tests for a single model
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/model_diagnose.png">
```python
exp.model_compare()  # compare diagnostics across trained models
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/model_compare.png">
```python
exp.model_fairness()  # fairness tests for a single model
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/model_fairness.png">
```python
exp.model_fairness_compare()  # compare fairness results across models
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/model_fairness_compare.png">
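Diagnostics such as the WeakSpot test described earlier look for regions of the feature space where residuals are unusually large. A simplified one-feature version of the slicing idea is sketched below; the equal-width bins, the `ratio` threshold relative to the overall mean absolute residual, and the function name `weak_slices` are illustrative assumptions, not PiML's defaults or API.

```python
def weak_slices(x, residuals, n_bins=5, ratio=1.5):
    """Return (lower, upper, mean_abs_residual) for each equal-width bin of x whose
    mean |residual| exceeds `ratio` times the overall mean |residual|."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / n_bins or 1.0  # guard against a constant feature
    bins = [[] for _ in range(n_bins)]
    for xi, ri in zip(x, residuals):
        k = min(int((xi - lo) / width), n_bins - 1)  # the maximum falls in the last bin
        bins[k].append(abs(ri))
    overall = sum(abs(r) for r in residuals) / len(residuals)
    flagged = []
    for k, vals in enumerate(bins):
        if vals and sum(vals) / len(vals) > ratio * overall:
            flagged.append((lo + k * width, lo + (k + 1) * width, sum(vals) / len(vals)))
    return flagged


# Small residuals everywhere except for x >= 80
x = list(range(100))
residuals = [0.1] * 80 + [2.0] * 20
print(weak_slices(x, residuals))  # flags only the last bin
```

In practice the residuals would come from held-out data, and the same comparison can be repeated feature by feature to locate weak regions.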
Arbitrary Black-Box Modeling
For example, train a complex LightGBM model with a maximum depth of 7 and register it with the experiment:
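```python
# `exp` is the Experiment from Stage 1; LGBMClassifier assumes a binary classification task
from lightgbm import LGBMClassifier

exp.model_train(LGBMClassifier(max_depth=7), name="LGBM-7")
```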
Then, compare it to inherently interpretable models (e.g. XGB2 and GAMI-Net):
```python
exp.model_compare()
```
<img src="https://github.com/SelfExplainML/PiML-Toolbox/blob/main/examples/results/model_compare_2.png">
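When a black-box model is compared with interpretable ones, accuracy on clean test data is only part of the picture; the Robustness test described earlier checks how much performance degrades when covariates are perturbed. A bare-bones sketch of that idea follows, assuming Gaussian noise scaled by each feature's standard deviation and MSE as the metric; these choices and the name `robustness_curve` are illustrative, not PiML's perturbation scheme or API.

```python
import random
import statistics


def robustness_curve(predict, X, y, noise_levels=(0.0, 0.1, 0.2, 0.5), seed=0):
    """MSE of `predict` after adding N(0, (level * std_j)^2) noise to every feature j."""
    rng = random.Random(seed)
    stds = [statistics.pstdev(col) for col in zip(*X)]
    curve = []
    for level in noise_levels:
        X_noisy = [[v + rng.gauss(0.0, level * s) for v, s in zip(row, stds)] for row in X]
        preds = [predict(row) for row in X_noisy]
        curve.append(sum((a - b) ** 2 for a, b in zip(y, preds)) / len(y))
    return curve


# A model that fits the clean data perfectly still degrades under input noise
X = [[float(i)] for i in range(50)]
y = [2 * row[0] for row in X]


def model(row):
    return 2 * row[0]


print(robustness_curve(model, X, y))  # first entry (no noise) is 0.0
```

Running the same function with the same seed on several models gives side-by-side degradation curves, so a model that is slightly more accurate but much more fragile stands out.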
Citations<a name="Cite"></a>
<details open>
<summary><strong>PiML, ReLU-DNN Aletheia and GAMI-Net</strong></summary><hr/>

"PiML Toolbox for Interpretable Machine Learning Model Development and Diagnostics" (A. Sudjianto, A. Zhang, Z. Yang, Y. Su and N. Zeng, 2023) <a href="https://arxiv.org/abs/2305.04214">arXiv link</a>
```bibtex
@article{sudjianto2023piml,
  title={PiML Toolbox for Interpretable Machine Learning Model Development and Diagnostics},
  author={Sudjianto, Agus and Zhang, Aijun and Yang, Zebin and Su, Yu and Zeng, Ningzhou},
  journal={arXiv preprint:2305.04214},
  year={2023}
}
```
"Designing Inherently Interpretable Machine Learning Models" (A. Sudjianto and A. Zhang, 2021) <a href="https://arxiv.org/abs/2111.01743">arXiv link</a>
```bibtex
@article{sudjianto2021designing,
  title={Designing Inherently Interpretable Machine Learning Models},
  author={Sudjianto, Agus and Zhang, Aijun},
  journal={arXiv preprint:2111.01743},
  year={2021}
}
```
"Unwrapping The Black Box of Deep ReLU Networks: Interpretability, Diagnostics, and Simplification" (A. Sudjianto, W. Knauth, R. Singh, Z. Yang and A. Zhang, 2020) <a href="https://arxiv.org/abs/2011.04041">arXiv link</a>
```bibtex
@article{sudjianto2020unwrapping,
  title={Unwrapping the black box of deep ReLU networks: interpretability, diagnostics, and simplification},
  author={Sudjianto, Agus and Knauth, William and Singh, Rahul and Yang, Zebin and Zhang, Aijun},
  journal={arXiv preprint:2011.04041},
  year={2020}
}
```
"GAMI-Net: An Explainable Neural Network based on Generalized Additive Models with Structured Interactions" (Z. Yang, A. Zhang, and A. Sudjianto, 2021) <a href="https://arxiv.org/abs/2003.07132">arXiv link</a>
```bibtex
@article{yang2021gami,
  title={GAMI-Net: An explainable neural network based on generalized additive models with structured interactions},
  author={Yang, Zebin and Zhang, Aijun and Sudjianto, Agus},
  journal={Pattern Recognition},
  volume={120},
  pages={108192},
  year={2021}
}
```
</details>
<details open>
<summary><strong>Other Interpretable ML Models</strong></summary><hr/>

"Fast Interpretable Greedy-Tree Sums (FIGS)" (Y.S. Tan, C. Singh, K. Nasseri, A. Agarwal and B. Yu, 2022)
```bibtex
@article{tan2022fast,
  title={Fast interpretable greedy-tree sums (FIGS)},
  author={Tan, Yan Shuo and Singh, Chandan and Nasseri, Keyan and Agarwal, Abhineet and Yu, Bin},
  journal={arXiv preprint arXiv:2201.11931},
  year={2022}
}
```
"Accurate intelligible models with pairwise interactions" (Y. Lou, R. Caruana, J. Gehrke, and G. Hooker, 2013)
```bibtex
@inproceedings{lou2013accurate,
  title={Accurate intelligible models with pairwise interactions},
  author={Lou, Yin and Caruana, Rich and Gehrke, Johannes and Hooker, Giles},
  booktitle={Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining},
  pages={623--631},
  year={2013},
  organization={ACM}
}
```
"Purifying Interaction Effects with the Functional ANOVA: An Efficient Algorithm for Recovering Identifiable Additive Models" (B. Lengerich, S. Tan, C.H. Chang, G. Hooker and R. Caruana, 2020)
```bibtex
@inproceedings{lengerich2020purifying,
  title={Purifying interaction effects with the functional anova: An efficient algorithm for recovering identifiable additive models},
  author={Lengerich, Benjamin and Tan, Sarah and Chang, Chun-Hao and Hooker, Giles and Caruana, Rich},
  booktitle={International Conference on Artificial Intelligence and Statistics},
  pages={2402--2412},
  year={2020},
  organization={PMLR}
}
```
"InterpretML: A Unified Framework for Machine Learning Interpretability" (H. Nori, S. Jenkins, P. Koch, and R. Caruana, 2019)
```bibtex
@article{nori2019interpretml,
  title={InterpretML: A Unified Framework for Machine Learning Interpretability},
  author={Nori, Harsha and Jenkins, Samuel and Koch, Paul and Caruana, Rich},
  journal={arXiv preprint:1909.09223},
  year={2019}
}
```
</details>