Awesome Machine Learning Interpretability

A maintained and curated list of practical and awesome responsible machine learning resources.

If you want to contribute to this list (and please do!), read over the contribution guidelines, send a pull request, or file an issue.

If something you contributed or found here is missing after our September 2023 reboot, please check the archive.

Community and Official Guidance Resources
Education Resources
AI Incidents, Critiques, and Research Resources
Technical Resources
Citing Awesome Machine Learning Interpretability
- Citation

Community and Official Guidance Resources

Community Frameworks and Guidance

This section is for responsible ML guidance put forward by organizations or individuals, not for official government guidance.

Infographics and Cheat Sheets

AI Red-Teaming Resources

Papers

Tools and Guidance

Generative AI Explainability

University Policies and Guidance

Conferences and Workshops

This section is for conferences, workshops and other major events related to responsible ML.

Official Policy, Frameworks, and Guidance

This section serves as a repository for policy documents, regulations, guidelines, and recommendations that govern the ethical and responsible use of artificial intelligence and machine learning technologies. From international legal frameworks to specific national laws, the resources cover a broad spectrum of topics such as fairness, privacy, ethics, and governance.

Australia

Canada

Finland

Ministry of Economic Affairs and Employment, Finland's Age of Artificial Intelligence: Turning Finland into a leading country in the application of artificial intelligence. Objective and recommendations for measures

France

Gouvernance des algorithmes d’intelligence artificielle dans le secteur financier (France)

Germany

Japan

Malaysia

The National Guidelines on AI Governance & Ethics

Netherlands

New Zealand

Singapore

Switzerland

Digital Switzerland Strategy 2025

United Kingdom

United States (Federal Government)

Consumer Financial Protection Bureau (CFPB)

Commodity Futures Trading Commission (CFTC)

Congressional Budget Office

H.R. 9720, AI Incident Reporting and Security Enhancement Act

Congressional Research Service

Copyright Office

Data.gov

Defense Advanced Research Projects Agency (DARPA)

Explainable Artificial Intelligence (XAI) (Archived)

Defense Technical Information Center

Computer Security Technology Planning Study, October 1, 1972

Department of Commerce

Department of Defense

Department of Education

Office of Educational Technology
- Designing for Education with Artificial Intelligence: An Essential Guide for Developers
- Empowering Education Leaders: A Toolkit for Safe, Ethical, and Equitable AI Integration, October 2024

Department of Energy

Artificial Intelligence and Technology Office

Department of Homeland Security

Department of Justice

Department of the Treasury

Managing Artificial Intelligence-Specific Cybersecurity Risks in the Financial Services Sector, March 2024

Equal Employment Opportunity Commission (EEOC)

Executive Office of the President of the United States

Federal Deposit Insurance Corporation (FDIC)

Supervisory Guidance on Model Risk Management

Federal Housing Finance Agency (FHFA)

Advisory Bulletin AB 2013-07 Model Risk Management Guidance

Federal Reserve

Supervisory Guidance on Model Risk Management

Federal Trade Commission (FTC)

Government Accountability Office (GAO)

National Security Agency (NSA)

Central Security Service, Artificial Intelligence Security Center

National Security Commission on Artificial Intelligence

Final Report

Office of the Comptroller of the Currency (OCC)

2021 Model Risk Management Handbook

Office of the Director of National Intelligence (ODNI)

Securities and Exchange Commission (SEC)

SEC Charges Two Investment Advisers with Making False and Misleading Statements About Their Use of Artificial Intelligence

United States Patent and Trademark Office (USPTO)

Public Views on Artificial Intelligence and Intellectual Property Policy

United States Senate

Committee on Commerce, Science, and Transportation, 2024.11.21 Letter to DOJ Re FARA AI Violation (Senator Ted Cruz to Attorney General Merrick Garland)

United States Web Design System (USWDS)

Design principles

United States (State Governments)

California

Kentucky

Legislative Research Commission, Research Report No. 491, Executive Branch Use of Artificial Intelligence Technology

Mississippi

Mississippi Department of Education, Artificial Intelligence Guidance for K-12 Classrooms

New York

North Carolina

North Carolina State Government Responsible Use of Artificial Intelligence Framework, August 2024

Texas

Federal Reserve Bank of Dallas, Regulation B, Equal Credit Opportunity, Credit Scoring Interpretations: Withdrawal of Proposed Business Credit Amendments, June 3, 1982

Utah

Questions from the Commission on Protecting Privacy and Preventing Discrimination

International and Multilateral Frameworks

European Union Policies and Regulations

Council of Europe

European Commission and Parliament

European Council

Artificial intelligence act: Council and Parliament strike a deal on the first rules for AI in the world

European Data Protection Authorities

OECD

OSCE

NATO

United Nations

Law Texts and Drafts

This section is a collection of law texts and drafts pertaining to responsible AI.

Education Resources

Comprehensive Software Examples and Tutorials

This section is a curated collection of guides and tutorials that simplify responsible ML implementation. It spans from basic model interpretability to advanced fairness techniques. Suitable for both novices and experts, the resources cover topics like COMPAS fairness analyses and explainable machine learning via counterfactuals.
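
To give a flavor of the kind of calculation the fairness tutorials in this section walk through, here is a minimal sketch of a group selection-rate comparison (often called the adverse impact ratio). It is not taken from any listed resource. The function names and the toy data are hypothetical, and the 0.8 cutoff is the conventional four-fifths rule of thumb, used here only as an illustrative flag.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Return the fraction of favorable (1) predictions for each group."""
    counts = defaultdict(int)
    favorable = defaultdict(int)
    for group, pred in zip(groups, predictions):
        counts[group] += 1
        favorable[group] += int(pred == 1)
    return {g: favorable[g] / counts[g] for g in counts}


def adverse_impact_ratio(groups, predictions, protected, reference):
    """Ratio of the protected group's selection rate to the reference group's."""
    rates = selection_rates(groups, predictions)
    if rates[reference] == 0:
        raise ValueError("reference group has no favorable outcomes")
    return rates[protected] / rates[reference]


# Hypothetical toy data: group labels and binary model decisions (1 = favorable).
groups = ["a", "a", "a", "a", "b", "b", "b", "b"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, predictions)
air = adverse_impact_ratio(groups, predictions, protected="b", reference="a")

print(rates)
print(f"adverse impact ratio: {air:.2f}")
if air < 0.8:  # four-fifths rule of thumb, an illustrative threshold only
    print("selection rates differ enough to warrant a closer look")
```

A single ratio like this is only a starting point. The listed tutorials cover richer analyses, such as error-rate comparisons across groups and counterfactual explanations.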

Free-ish Books

This section contains books that can be reasonably described as free, including some "historical" books dealing broadly with ethical and responsible tech.

Glossaries and Dictionaries

This section features a collection of glossaries and dictionaries that are geared toward defining terms in ML, including some "historical" dictionaries.

Open-ish Classes

This section features a selection of educational courses focused on ethical considerations and best practices in ML. The classes range from introductory courses on data ethics to specialized training in fairness and trustworthy deep learning.

Podcasts and Channels

This section features podcasts and channels (such as on YouTube) that offer insightful commentary and explanations on responsible AI and machine learning interpretability.

AI Incidents, Critiques, and Research Resources

AI Incident Information Sharing Resources

This section houses initiatives, networks, repositories, and publications that facilitate collective and interdisciplinary efforts to enhance AI safety. It includes platforms where experts and practitioners come together to share insights, identify potential vulnerabilities, and collaborate on developing robust safeguards for AI systems, including AI incident trackers.

Bibliography of Papers on AI Incidents and Failures

AI Law, Policy, and Guidance Trackers

This section contains trackers, databases, and repositories of laws, policies, and guidance pertaining to AI.

Challenges and Competitions

This section contains challenges and competitions related to responsible ML.

Critiques of AI

This section contains an assortment of papers, articles, essays, and general resources that take critical stances toward generative AI.

Environmental Costs of AI

Groups and Organizations

Curated Bibliographies

We are seeking curated bibliographies related to responsible ML across various topics; see issue 115.

List of Lists

This section links to other lists of responsible ML or related resources.

Technical Resources

Benchmarks

This section contains benchmarks or datasets used for benchmarks for ML systems, particularly those related to responsible ML desiderata.

Resource	Description
benchm-ml	"A minimal benchmark for scalability, speed and accuracy of commonly used open source implementations (R packages, Python scikit-learn, H2O, xgboost, Spark MLlib etc.) of the top machine learning algorithms for binary classification (random forests, gradient boosted trees, deep neural networks etc.)."
Bias Benchmark for QA dataset (BBQ)	"Repository for the Bias Benchmark for QA dataset."
Cataloguing LLM Evaluations	"This repository stems from our paper, 'Cataloguing LLM Evaluations,' and serves as a living, collaborative catalogue of LLM evaluation frameworks, benchmarks and papers."
DecodingTrust	"A Comprehensive Assessment of Trustworthiness in GPT Models."
EleutherAI, Language Model Evaluation Harness	"A framework for few-shot evaluation of language models."
GEM	"GEM is a benchmark environment for Natural Language Generation with a focus on its Evaluation, both through human annotations and automated Metrics."
HELM	"A holistic framework for evaluating foundation models."
Hugging Face, evaluate	"Evaluate: A library for easily evaluating machine learning models and datasets."
i-gallegos, Fair-LLM-Benchmark	Benchmark from "Bias and Fairness in Large Language Models: A Survey"
MLCommons, MLCommons AI Safety v0.5 Proof of Concept	"The MLCommons AI Safety Benchmark aims to assess the safety of AI systems in order to guide development, inform purchasers and consumers, and support standards bodies and policymakers."
MLCommons, Introducing v0.5 of the AI Safety Benchmark from MLCommons	A paper about the MLCommons AI Safety Benchmark v0.5.
Nvidia MLPerf	"MLPerf™ benchmarks—developed by MLCommons, a consortium of AI leaders from academia, research labs, and industry—are designed to provide unbiased evaluations of training and inference performance for hardware, software, and services."
OpenML Benchmarking Suites	OpenML's collection of over two dozen benchmarking suites.
Real Toxicity Prompts (Allen Institute for AI)	"A dataset of 100k sentence snippets from the web for researchers to further address the risk of neural toxic degeneration in models."
SafetyPrompts.com	"A Living Catalogue of Open Datasets for LLM Safety."
Sociotechnical Safety Evaluation Repository	An extensive spreadsheet of sociotechnical safety evaluations.
TrustLLM-Benchmark	"A Comprehensive Study of Trustworthiness in Large Language Models."
Trust-LLM-Benchmark Leaderboard	A series of sortable leaderboards of LLMs based on different trustworthiness criteria.
TruthfulQA	"TruthfulQA: Measuring How Models Imitate Human Falsehoods."
WAVES: Benchmarking the Robustness of Image Watermarks	"This paper investigates the weaknesses of image watermarking techniques."
Wild-Time: A Benchmark of in-the-Wild Distribution Shifts over Time	"Benchmark for Natural Temporal Distribution Shift (NeurIPS 2022)."
Winogender Schemas	"Data for evaluating gender bias in coreference resolution systems."
yandex-research / tabred	"A Benchmark of Tabular Machine Learning in-the-Wild with real-world industry-grade tabular datasets."

Common or Useful Datasets

This section contains datasets that are commonly used in responsible ML evaluations, as well as repositories of interesting or important data sources.

Domain-specific Software

This section curates specialized software tools aimed at responsible ML within specific domains, such as in healthcare, finance, or social sciences.

Machine Learning Environment Management Tools

This section contains open source or open access ML environment management software.

Resource	Description
dvc	"Manage and version images, audio, video, and text files in storage and organize your ML modeling process into a reproducible workflow."
gigantum	"Building a better way to create, collaborate, and share data-driven science."
mlflow	"An open source platform for the machine learning lifecycle."
mlmd	"For recording and retrieving metadata associated with ML developer and data scientist workflows."
modeldb	"Open Source ML Model Versioning, Metadata, and Experiment Management."
neptune	"A single place to manage all your model metadata."
Opik	"Evaluate, test, and ship LLM applications across your dev and production lifecycles."

Personal Data Protection Tools

This section contains tools for personal data protection.

Name	Description
LLM Dataset Inference: Did you train on my dataset?	"Official Repository for Dataset Inference for LLMs"

Open Source/Access Responsible AI Software Packages

This section contains open source or open access software used to implement responsible ML. As much as possible, descriptions are quoted verbatim from the respective repositories themselves. In rare instances, we provide our own descriptions (unmarked by quotes).

Browser

Name	Description
DiscriLens	"Discrimination in Machine Learning."
Hugging Face, BiasAware: Dataset Bias Detection	"BiasAware is a specialized tool for detecting and quantifying biases within datasets used for Natural Language Processing (NLP) tasks."
manifold	"A model-agnostic visual debugging tool for machine learning."
PAIR-code / datacardsplaybook	"The Data Cards Playbook helps dataset producers and publishers adopt a people-centered approach to transparency in dataset documentation."
PAIR-code / facets	"Visualizations for machine learning datasets."
PAIR-code / knowyourdata	"A tool to help researchers and product teams understand datasets with the goal of improving data quality, and mitigating fairness and bias issues."
TensorBoard Projector	"Using the TensorBoard Embedding Projector, you can graphically represent high dimensional embeddings. This can be helpful in visualizing, examining, and understanding your embedding layers."
What-if Tool	"Visually probe the behavior of trained machine learning models, with minimal coding."

C/C++

Name	Description
Born-again Tree Ensembles	"Born-Again Tree Ensembles: Transforms a random forest into a single, minimal-size, tree with exactly the same prediction function in the entire feature space (ICML 2020)."
Certifiably Optimal RulE ListS	"CORELS is a custom discrete optimization technique for building rule lists over a categorical feature space."
Secure-ML	"Secure Linear Regression in the Semi-Honest Two-Party Setting."

JavaScript

Name	Description
LDNOOBW	"List of Dirty, Naughty, Obscene, and Otherwise Bad Words"

Python

Name	Description
acd	"Produces hierarchical interpretations for a single prediction made by a pytorch neural network. Official code for Hierarchical interpretations for neural network predictions.”
aequitas	"Aequitas is an open-source bias audit toolkit for data scientists, machine learning researchers, and policymakers to audit machine learning models for discrimination and bias, and to make informed and equitable decisions around developing and deploying predictive tools.”
AI Fairness 360	"A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.”
AI Explainability 360	"Interpretability and explainability of data and machine learning models.”
ALEPython	"Python Accumulated Local Effects package.”
Aletheia	"A Python package for unwrapping ReLU DNNs.”
allennlp	"An open-source NLP research library, built on PyTorch.”
algofairness	See Algorithmic Fairness (http://fairness.haverford.edu/).
Alibi	"Alibi is an open source Python library aimed at machine learning model inspection and interpretation. The focus of the library is to provide high-quality implementations of black-box, white-box, local and global explanation methods for classification and regression models.”
anchor	"Code for 'High-Precision Model-Agnostic Explanations' paper.”
Bayesian Case Model
Bayesian Ors-Of-Ands	"This code implements the Bayesian or-of-and algorithm as described in the BOA paper. We include the tictactoe dataset in the correct formatting to be used by this code.”
Bayesian Rule List (BRL)	Rudin group at Duke Bayesian case model implementation
BlackBoxAuditing	"Research code for auditing and exploring black box machine-learning models.”
CalculatedContent, WeightWatcher	"The WeightWatcher tool for predicting the accuracy of Deep Neural Networks."
casme	"contains the code originally forked from the ImageNet training in PyTorch that is modified to present the performance of classifier-agnostic saliency map extraction, a practical algorithm to train a classifier-agnostic saliency mapping by simultaneously training a classifier and a saliency mapping.”
Causal Discovery Toolbox	"Package for causal inference in graphs and in the pairwise settings. Tools for graph structure recovery and dependencies are included.”
captum	"Model interpretability and understanding for PyTorch.”
causalml	"Uplift modeling and causal inference with machine learning algorithms.”
cdt15, Causal Discovery Lab., Shiga University	"LiNGAM is a new method for estimating structural equation models or linear causal Bayesian networks. It is based on using the non-Gaussianity of the data."
checklist	"Beyond Accuracy: Behavioral Testing of NLP models with CheckList.”
cleverhans	"An adversarial example library for constructing attacks, building defenses, and benchmarking both.”
contextual-AI	"Contextual AI adds explainability to different stages of machine learning pipelines."
ContrastiveExplanation (Foil Trees)	"provides an explanation for why an instance had the current outcome (fact) rather than a targeted outcome of interest (foil). These counterfactual explanations limit the explanation to the features relevant in distinguishing fact from foil, thereby disregarding irrelevant features.”
counterfit	"a CLI that provides a generic automation layer for assessing the security of ML models.”
dalex	"moDel Agnostic Language for Exploration and eXplanation.”
debiaswe	"Remove problematic gender bias from word embeddings.”
DeepExplain	"provides a unified framework for state-of-the-art gradient and perturbation-based attribution methods. It can be used by researchers and practitioners for better understanding the recommended existing models, as well for benchmarking other attribution methods."
DeepLIFT	"This repository implements the methods in 'Learning Important Features Through Propagating Activation Differences' by Shrikumar, Greenside & Kundaje, as well as other commonly-used methods such as gradients, gradient-times-input (equivalent to a version of Layerwise Relevance Propagation for ReLU networks), guided backprop and integrated gradients.”
deepvis	"the code required to run the Deep Visualization Toolbox, as well as to generate the neuron-by-neuron visualizations using regularized optimization.”
DIANNA	"DIANNA is a Python package that brings explainable AI (XAI) to your research project. It wraps carefully selected XAI methods in a simple, uniform interface. It's built by, with and for (academic) researchers and research software engineers working on machine learning projects.”
DiCE	"Generate Diverse Counterfactual Explanations for any machine learning model.”
DoWhy	"DoWhy is a Python library for causal inference that supports explicit modeling and testing of causal assumptions. DoWhy is based on a unified language for causal inference, combining causal graphical models and potential outcomes frameworks.”
dtreeviz	"A python library for decision tree visualization and model interpretation.”
ecco	"Explain, analyze, and visualize NLP language models. Ecco creates interactive visualizations directly in Jupyter notebooks explaining the behavior of Transformer-based language models (like GPT2, BERT, RoBERTA, T5, and T0).”
eli5	"A library for debugging/inspecting machine learning classifiers and explaining their predictions.”
explabox	"aims to support data scientists and machine learning (ML) engineers in explaining, testing and documenting AI/ML models, developed in-house or acquired externally. The explabox turns your ingestibles (AI/ML model and/or dataset) into digestibles (statistics, explanations or sensitivity insights).”
Explainable Boosting Machine (EBM)/GA2M	"an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof. With this package, you can train interpretable glassbox models and explain blackbox systems. InterpretML helps you understand your model's global behavior, or understand the reasons behind individual predictions.”
ExplainaBoard	"a tool that inspects your system outputs, identifies what is working and what is not working, and helps inspire you with ideas of where to go next.”
explainerdashboard	"Quickly build Explainable AI dashboards that show the inner workings of so-called 'blackbox' machine learning models."
explainX	"Explainable AI framework for data scientists. Explain & debug any blackbox machine learning model with a single line of code.”
fair-classification	"Python code for training fair logistic regression classifiers.”
fairml	"a python toolbox auditing the machine learning models for bias.”
fairlearn	"a Python package that empowers developers of artificial intelligence (AI) systems to assess their system's fairness and mitigate any observed unfairness issues. Fairlearn contains mitigation algorithms as well as metrics for model assessment. Besides the source code, this repository also contains Jupyter notebooks with examples of Fairlearn usage.”
fairness-comparison	"meant to facilitate the benchmarking of fairness aware machine learning algorithms.”
fairness_measures_code	"contains implementations of measures used to quantify discrimination.”
Falling Rule List (FRL)	Rudin group at Duke falling rule list implementation
foolbox	"A Python toolbox to create adversarial examples that fool neural networks in PyTorch, TensorFlow, and JAX.”
Giskard	"The testing framework dedicated to ML models, from tabular to LLMs. Scan AI models to detect risks of biases, performance issues and errors. In 4 lines of code.”
Grad-CAM (GitHub topic)	Grad-CAM is a technique for making convolutional neural networks more transparent by visualizing the regions of input that are important for predictions in computer vision models.
gplearn	"implements Genetic Programming in Python, with a scikit-learn inspired and compatible API.”
H2O-3 Penalized Generalized Linear Models	"Fits a generalized linear model, specified by a response variable, a set of predictors, and a description of the error distribution."
H2O-3 Monotonic GBM	"Builds gradient boosted classification trees and gradient boosted regression trees on a parsed data set."
H2O-3 Sparse Principal Components (GLRM)	"Builds a generalized low rank decomposition of an H2O data frame."
h2o-LLM-eval	"Large-language Model Evaluation framework with Elo Leaderboard and A-B testing."
hate-functional-tests	HateCheck: A dataset and test suite from an ACL 2021 paper, offering functional tests for hate speech detection models, including extensive case annotations and testing functionalities.
imodels	"Python package for concise, transparent, and accurate predictive modeling. All sklearn-compatible and easy to use.”
iNNvestigate neural nets	A comprehensive Python library to analyze and interpret neural network behaviors in Keras, featuring a variety of methods like Gradient, LRP, and Deep Taylor.
Integrated-Gradients	"a variation on computing the gradient of the prediction output w.r.t. features of the input. It requires no modification to the original network, is simple to implement, and is applicable to a variety of deep models (sparse and dense, text and vision).”
interpret	"an open-source package that incorporates state-of-the-art machine learning interpretability techniques under one roof.”
interpret_with_rules	"induces rules to explain the predictions of a trained neural network, and optionally also to explain the patterns that the model captures from the training data, and the patterns that are present in the original dataset.”
InterpretME	"integrates knowledge graphs (KG) with machine learning methods to generate interesting meaningful insights. It helps to generate human- and machine-readable decisions to provide assistance to users and enhance efficiency.”
Keras-vis	"a high-level toolkit for visualizing and debugging your trained keras neural net models.”
keract	Keract is a tool for visualizing activations and gradients in Keras models; it's meant to support a wide range of TensorFlow versions and to offer an intuitive API with Python examples.
L2X	"Code for replicating the experiments in the paper Learning to Explain: An Information-Theoretic Perspective on Model Interpretation at ICML 2018, by Jianbo Chen, Mitchell Stern, Martin J. Wainwright, Michael I. Jordan.”
LangFair	"LangFair is a Python library for conducting use-case level LLM bias and fairness assessments"
langtest	"LangTest: Deliver Safe & Effective Language Models"
learning-fair-representations	"Python numba implementation of Zemel et al. 2013 http://www.cs.toronto.edu/~toni/Papers/icml-final.pdf"
leeky: Leakage/contamination testing for black box language models	"leeky - training data contamination techniques for blackbox models"
leondz / garak, LLM vulnerability scanner	"LLM vulnerability scanner"
lilac	"Curate better data for LLMs."
lime	"explaining what machine learning classifiers (or models) are doing. At the moment, we support explaining individual predictions for text classifiers or classifiers that act on tables (numpy arrays of numerical or categorical data) or images, with a package called lime (short for local interpretable model-agnostic explanations).”
LiFT	"The LinkedIn Fairness Toolkit (LiFT) is a Scala/Spark library that enables the measurement of fairness and the mitigation of bias in large-scale machine learning workflows. The measurement module includes measuring biases in training data, evaluating fairness metrics for ML models, and detecting statistically significant differences in their performance across different subgroups.”
lit	"The Learning Interpretability Tool (LIT, formerly known as the Language Interpretability Tool) is a visual, interactive ML model-understanding tool that supports text, image, and tabular data. It can be run as a standalone server, or inside of notebook environments such as Colab, Jupyter, and Google Cloud Vertex AI notebooks.”
LLM Dataset Inference: Did you train on my dataset?	"Official Repository for Dataset Inference for LLMs"
lofo-importance	"LOFO (Leave One Feature Out) Importance calculates the importances of a set of features based on a metric of choice, for a model of choice, by iteratively removing each feature from the set, and evaluating the performance of the model, with a validation scheme of choice, based on the chosen metric.”
lrp_toolbox	"The Layer-wise Relevance Propagation (LRP) algorithm explains a classifier's prediction specific to a given data point by attributing relevance scores to important components of the input by using the topology of the learned model itself."
MindsDB	"enables developers to build AI tools that need access to real-time data to perform their tasks.”
mlxtend	"Mlxtend (machine learning extensions) is a Python library of useful tools for the day-to-day data science tasks."
ml-fairness-gym	"a set of components for building simple simulations that explore the potential long-run impacts of deploying machine learning-based decision systems in social environments.”
ml_privacy_meter	"an open-source library to audit data privacy in statistical and machine learning algorithms. The tool can help in the data protection impact assessment process by providing a quantitative analysis of the fundamental privacy risks of a (machine learning) model.”
mllp	"This is a PyTorch implementation of Multilayer Logical Perceptrons (MLLP) and Random Binarization (RB) method to learn Concept Rule Sets (CRS) for transparent classification tasks, as described in our paper: Transparent Classification with Multilayer Logical Perceptrons and Random Binarization.”
Monotonic Constraints	Guide on implementing and understanding monotonic constraints in XGBoost models to enhance predictive performance with practical Python examples.
XGBoost	"an optimized distributed gradient boosting library designed to be highly efficient, flexible and portable.”
Multilayer Logical Perceptron (MLLP)	"This is a PyTorch implementation of Multilayer Logical Perceptrons (MLLP) and Random Binarization (RB) method to learn Concept Rule Sets (CRS) for transparent classification tasks, as described in our paper: Transparent Classification with Multilayer Logical Perceptrons and Random Binarization.”
OptBinning	"a library written in Python implementing a rigorous and flexible mathematical programming formulation to solve the optimal binning problem for a binary, continuous and multiclass target type, incorporating constraints not previously addressed.”
Optimal Sparse Decision Trees	"This accompanies the paper, 'Optimal Sparse Decision Trees' by Xiyang Hu, Cynthia Rudin, and Margo Seltzer."
parity-fairness	"This repository contains codes that demonstrate the use of fairness metrics, bias mitigations and explainability tool.”
PDPbox	"Python Partial Dependence Plot toolbox. Visualize the influence of certain features on model predictions for supervised machine learning algorithms, utilizing partial dependence plots.”
PiML-Toolbox	"a new Python toolbox for interpretable machine learning model development and validation. Through low-code interface and high-code APIs, PiML supports a growing list of inherently interpretable ML models.”
pjsaelin / Cubist	"A Python package for fitting Quinlan's Cubist regression model"
Privacy-Preserving-ML	"Implementation of privacy-preserving SVM assuming public model private data scenario (data is encrypted but model parameters are unencrypted) using adequate partial homomorphic encryption."
ProtoPNet	"This code package implements the prototypical part network (ProtoPNet) from the paper "This Looks Like That: Deep Learning for Interpretable Image Recognition" (to appear at NeurIPS 2019), by Chaofan Chen (Duke University), Oscar Li
pyBreakDown	See dalex.
PyCEbox	"Python Individual Conditional Expectation Plot Toolbox.”
pyGAM	"Generalized Additive Models in Python.”
pymc3	"PyMC (formerly PyMC3) is a Python package for Bayesian statistical modeling focusing on advanced Markov chain Monte Carlo (MCMC) and variational inference (VI) algorithms. Its flexibility and extensibility make it applicable to a large suite of problems.”
pySS3	"The SS3 text classifier is a novel and simple supervised machine learning model for text classification which is interpretable, that is, it has the ability to naturally (self)explain its rationale.”
pytorch-grad-cam	"a package with state of the art methods for Explainable AI for computer vision. This can be used for diagnosing model predictions, either in production or while developing models. The aim is also to serve as a benchmark of algorithms and metrics for research of new explainability methods.”
pytorch-innvestigate	"PyTorch implementation of Keras already existing project: https://github.com/albermax/innvestigate/.”
Quantus	"Quantus is an eXplainable AI toolkit for responsible evaluation of neural network explanations."
rationale	"This directory contains the code and resources of the following paper: 'Rationalizing Neural Predictions'. Tao Lei, Regina Barzilay and Tommi Jaakkola. EMNLP 2016. The method learns to provide justifications, i.e. rationales, as supporting evidence of neural networks' prediction."
responsibly	"Toolkit for Auditing and Mitigating Bias and Fairness of Machine Learning Systems.”
REVISE: REvealing VIsual biaSEs	"A tool that automatically detects possible forms of bias in a visual dataset along the axes of object-based, attribute-based, and geography-based patterns, and from which next steps for mitigation are suggested.”
robustness	"a package we (students in the MadryLab) created to make training, evaluating, and exploring neural networks flexible and easy.”
RISE	"contains source code necessary to reproduce some of the main results in the paper: Vitali Petsiuk, Abir Das, Kate Saenko (BMVC, 2018) [and] RISE: Randomized Input Sampling for Explanation of Black-box Models.”
Risk-SLIM	"a machine learning method to fit simple customized risk scores in python."
SAGE	"SAGE (Shapley Additive Global importancE) is a game-theoretic approach for understanding black-box machine learning models. It quantifies each feature's importance based on how much predictive power it contributes, and it accounts for complex feature interactions using the Shapley value."
SALib	"Python implementations of commonly used sensitivity analysis methods. Useful in systems modeling to calculate the effects of model inputs or exogenous factors on outputs of interest."
Scikit-Explain	"User-friendly Python module for machine learning explainability," featuring PD and ALE plots, LIME, SHAP, permutation importance and Friedman's H, among other methods.
Scikit-learn Decision Trees	"a non-parametric supervised learning method used for classification and regression."
Scikit-learn Generalized Linear Models	"a set of methods intended for regression in which the target value is expected to be a linear combination of the features."
Scikit-learn Sparse Principal Components	"a variant of [principal component analysis, PCA], with the goal of extracting the set of sparse components that best reconstruct the data."
scikit-fairness	Historical link. Merged with fairlearn.
scikit-multiflow	"a machine learning package for streaming data in Python."
shap	"a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions"
shapley	"a Python library for evaluating binary classifiers in a machine learning ensemble."
sklearn-expertsys	"a scikit-learn compatible wrapper for the Bayesian Rule List classifier developed by Letham et al., 2015, extended by a minimum description length-based discretizer (Fayyad & Irani, 1993) for continuous data, and by an approach to subsample large datasets for better performance."
skope-rules	"a Python machine learning module built on top of scikit-learn and distributed under the 3-Clause BSD license."
solas-ai-disparity	"a collection of tools that allows modelers, compliance, and business stakeholders to test outcomes for bias or discrimination using widely accepted fairness metrics."
Super-sparse Linear Integer models (SLIMs)	"a package to learn customized scoring systems for decision-making problems."
tensorflow/lattice	"a library that implements constrained and interpretable lattice based models. It is an implementation of Monotonic Calibrated Interpolated Look-Up Tables in TensorFlow."
tensorflow/lucid	"a collection of infrastructure and tools for research in neural network interpretability."
tensorflow/fairness-indicators	"designed to support teams in evaluating, improving, and comparing models for fairness concerns in partnership with the broader Tensorflow toolkit."
tensorflow/model-analysis	"a library for evaluating TensorFlow models. It allows users to evaluate their models on large amounts of data in a distributed manner, using the same metrics defined in their trainer. These metrics can be computed over different slices of data and visualized in Jupyter notebooks."
tensorflow/model-card-toolkit	"streamlines and automates generation of Model Cards, machine learning documents that provide context and transparency into a model's development and performance. Integrating the MCT into your ML pipeline enables you to share model metadata and metrics with researchers, developers, reporters, and more."
tensorflow/model-remediation	"a library that provides solutions for machine learning practitioners working to create and train models in a way that reduces or eliminates user harm resulting from underlying performance biases."
tensorflow/privacy	"the source code for TensorFlow Privacy, a Python library that includes implementations of TensorFlow optimizers for training machine learning models with differential privacy. The library comes with tutorials and analysis tools for computing the privacy guarantees provided."
tensorflow/tcav	"Testing with Concept Activation Vectors (TCAV) is a new interpretability method to understand what signals your neural network models [use] for prediction."
tensorfuzz	"a library for performing coverage guided fuzzing of neural networks."
TensorWatch	"a debugging and visualization tool designed for data science, deep learning and reinforcement learning from Microsoft Research. It works in Jupyter Notebook to show real-time visualizations of your machine learning training and perform several other key analysis tasks for your models and data."
TextFooler	"A Model for Natural Language Attack on Text Classification and Inference"
text_explainability	"text_explainability provides a generic architecture from which well-known state-of-the-art explainability approaches for text can be composed."
text_sensitivity	"Uses the generic architecture of text_explainability to also include tests of safety (how safe [is] the model in production, i.e. types of inputs it can handle), robustness (how generalizable the model is in production, e.g. stability when adding typos, or the effect of adding random unrelated data) and fairness (if equal individuals are treated equally by the model, e.g. subgroup fairness on sex and nationality)."
tf-explain	"Implements interpretability methods as Tensorflow 2.x callbacks to ease neural network's understanding."
Themis	"A testing-based approach for measuring discrimination in a software system."
themis-ml	"A Python library built on top of pandas and sklearn that implements fairness-aware machine learning algorithms."
TorchUncertainty	"A package designed to help you leverage uncertainty quantification techniques and make your deep neural networks more reliable."
treeinterpreter	"Package for interpreting scikit-learn's decision tree and random forest predictions."
TRIAGE	"This repository contains the implementation of TRIAGE, a 'Data-Centric AI' framework for data characterization tailored for regression."
woe	"Tools for WoE Transformation mostly used in ScoreCard Model for credit rating."
xai	"A Machine Learning library that is designed with AI explainability in its core."
xdeep	"An open source Python library for Interpretable Machine Learning."
xplique	"A Python toolkit dedicated to explainability. The goal of this library is to gather the state of the art of Explainable AI to help you understand your complex neural network models."
ydata-profiling	"Provide[s] a one-line Exploratory Data Analysis (EDA) experience in a consistent and fast solution."
yellowbrick	"A suite of visual diagnostic tools called 'Visualizers' that extend the scikit-learn API to allow human steering of the model selection process."
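
Several entries in the table above (shap, SAGE, shapley) build on the Shapley value. As a rough illustration of the underlying idea only, here is a minimal, standard-library sketch that computes exact Shapley values by averaging each feature's marginal contribution over every feature ordering. It does not use or reproduce the API of any listed package. The toy `model`, the `baseline` and `instance` vectors, and the helper names `exact_shapley` and `value_fn` are all hypothetical, and "removing" a feature is assumed to mean replacing it with its baseline value.

```python
from itertools import permutations
from math import factorial


def exact_shapley(value_fn, n_features):
    """Average each feature's marginal contribution over all orderings."""
    phi = [0.0] * n_features
    for order in permutations(range(n_features)):
        coalition = set()
        prev = value_fn(frozenset(coalition))
        for j in order:
            coalition.add(j)
            cur = value_fn(frozenset(coalition))
            phi[j] += cur - prev  # marginal contribution of feature j
            prev = cur
    n_orders = factorial(n_features)
    return [p / n_orders for p in phi]


# Hypothetical toy model with one interaction term (x0 * x2).
def model(x):
    return 2.0 * x[0] + 3.0 * x[1] + x[0] * x[2]


baseline = [0.0, 0.0, 0.0]  # assumed reference input
instance = [1.0, 2.0, 4.0]  # input to be explained


def value_fn(coalition):
    # Features in the coalition take the instance value; the rest stay at baseline.
    x = [instance[i] if i in coalition else baseline[i] for i in range(len(instance))]
    return model(x)


phi = exact_shapley(value_fn, len(instance))
print(phi)  # [4.0, 6.0, 2.0]
```

The attributions sum to `model(instance) - model(baseline)` (here 12.0), and the interaction term is split evenly between features 0 and 2. Enumerating all orderings costs n! model evaluations, which is why practical libraries rely on sampling or model-specific shortcuts.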

R

Name	Description
ALEPlot	"Visualizes the main effects of individual predictor variables and their second-order interaction effects in black-box supervised learning models."
arules	"Provides the infrastructure for representing, manipulating and analyzing transaction data and patterns (frequent itemsets and association rules). Also provides C implementations of the association mining algorithms Apriori and Eclat. Hahsler, Gruen and Hornik (2005)."
Causal SVM	"We present a new machine learning approach to estimate whether a treatment has an effect on an individual, in the setting of the classical potential outcomes framework with binary outcomes."
DALEX	"moDel Agnostic Language for Exploration and eXplanation."
DALEXtra: Extension for 'DALEX' Package	"Provides wrapper of various machine learning models."
DrWhyAI	"DrWhy is [a] collection of tools for eXplainable AI (XAI). It's based on shared principles and simple grammar for exploration, explanation and visualisation of predictive models."
elasticnet	"Provides functions for fitting the entire solution path of the Elastic-Net and also provides functions for doing sparse PCA."
ExplainPrediction	"Generates explanations for classification and regression models and visualizes them."
Explainable Boosting Machine (EBM)/GA2M	"Package for training interpretable machine learning models."
fairmodels	"Flexible tool for bias detection, visualization, and mitigation. Use models explained with DALEX and calculate fairness classification metrics based on confusion matrices using fairness_check() or try newly developed module for regression models using fairness_check_regression()."
fairness	"Offers calculation, visualization and comparison of algorithmic fairness metrics."
fastshap	"The goal of fastshap is to provide an efficient and speedy approach (at least relative to other implementations) for computing approximate Shapley values, which help explain the predictions from any machine learning model."
featureImportance	"An extension for the mlr package and allows to compute the permutation feature importance in a model-agnostic manner."
flashlight	"The goal of this package is [to] shed light on black box machine learning models."
forestmodel	"Produces forest plots using 'ggplot2' from models produced by functions such as stats::lm(), stats::glm() and survival::coxph()."
fscaret	"Automated feature selection using variety of models provided by 'caret' package."
gam	"Functions for fitting and working with generalized additive models, as described in chapter 7 of 'Statistical Models in S' (Chambers and Hastie (eds), 1991), and 'Generalized Additive Models' (Hastie and Tibshirani, 1990)."
glm2	"Fits generalized linear models using the same model specification as glm in the stats package, but with a modified default fitting method that provides greater stability for models that may fail to converge using glm."
glmnet	"Extremely efficient procedures for fitting the entire lasso or elastic-net regularization path for linear regression, logistic and multinomial regression models, Poisson regression, Cox model, multiple-response Gaussian, and the grouped multinomial regression."
H2O-3 Penalized Generalized Linear Models	"Fits a generalized linear model, specified by a response variable, a set of predictors, and a description of the error distribution."
H2O-3 Monotonic GBM	"Builds gradient boosted classification trees and gradient boosted regression trees on a parsed data set."
H2O-3 Sparse Principal Components (GLRM)	"Builds a generalized low rank decomposition of an H2O data frame."
iBreakDown	"A model agnostic tool for explanation of predictions from black boxes ML models."
ICEbox: Individual Conditional Expectation Plot Toolbox	"Implements Individual Conditional Expectation (ICE) plots, a tool for visualizing the model estimated by any supervised learning algorithm."
iml	"An R package that interprets the behavior and explains predictions of machine learning models."
ingredients	"A collection of tools for assessment of feature importance and feature effects."
interpret: Fit Interpretable Machine Learning Models	"Package for training interpretable machine learning models."
lightgbmExplainer	"An R package that makes LightGBM models fully interpretable."
lime	"R port of the Python lime package."
live	"Helps to understand key factors that drive the decision made by complicated predictive model (black box model)."
mcr	"An R package for Model Reliance and Model Class Reliance."
modelDown	"Website generator with HTML summaries for predictive models."
modelOriented	GitHub repositories of Warsaw-based MI².AI.
modelStudio	"Automates the explanatory analysis of machine learning predictive models."
Monotonic XGBoost	Enforces monotonic (consistently increasing or decreasing) relationships between selected features and predicted outcomes, so that the model's behavior aligns with prior domain expectations.
quantreg	"Estimation and inference methods for models for conditional quantile functions."
rpart	"Recursive partitioning for classification, regression and survival trees."
RuleFit	"Implements the learning method and interpretational tools described in Predictive Learning via Rule Ensembles."
Scalable Bayesian Rule Lists (SBRL)	A more scalable implementation of Bayesian rule lists from the Rudin group at Duke.
shapFlex	Computes stochastic Shapley values for machine learning models to interpret them and evaluate fairness, including causal constraints in the feature space.
shapleyR	"An R package that provides some functionality to use mlr tasks and models to generate shapley values."
shapper	"Provides SHAP explanations of machine learning models."
smbinning	"A set of functions to build a scoring model from beginning to end."
vip	"An R package for constructing variable importance plots (VIPs)."
xgboostExplainer	"An R package that makes xgboost models fully interpretable."
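
Permutation feature importance, mentioned in the featureImportance entry above, is model-agnostic and easy to sketch. The following is a minimal, standard-library Python illustration of the general technique, not the API of any R package listed here. The `predict` function, the synthetic data, and the helper names `mse` and `permutation_importance` are hypothetical: each feature column is shuffled in turn, and the average increase in mean squared error is reported as that feature's importance.

```python
import random


def mse(y_true, y_pred):
    """Mean squared error between two equal-length sequences."""
    return sum((a - b) ** 2 for a, b in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, n_repeats=10, seed=0):
    """Average increase in MSE after shuffling each feature column."""
    rng = random.Random(seed)
    base_error = mse(y, [predict(row) for row in X])
    importances = []
    for j in range(len(X[0])):
        increases = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and y
            X_perm = [row[:j] + [v] + row[j + 1:] for row, v in zip(X, column)]
            permuted_error = mse(y, [predict(row) for row in X_perm])
            increases.append(permuted_error - base_error)
        importances.append(sum(increases) / n_repeats)
    return importances


# Hypothetical model: relies heavily on feature 0, weakly on feature 1,
# and ignores feature 2 entirely.
def predict(row):
    return 3.0 * row[0] + 0.5 * row[1]


data_rng = random.Random(42)
X = [[data_rng.uniform(-1, 1) for _ in range(3)] for _ in range(200)]
y = [predict(row) for row in X]  # noise-free targets for simplicity

importances = permutation_importance(predict, X, y)
print(importances)
```

Because the toy model ignores feature 2, shuffling it leaves the error unchanged and its importance is exactly zero, while feature 0 receives the largest score.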

Citing Awesome Machine Learning Interpretability

Contributors with over 100 edits can be named as coauthors in the citation. All other contributors are included under "et al."

Bibtex

@misc{amli_repo,
  author={Patrick Hall and Daniel Atherton},
  title={Awesome Machine Learning Interpretability},
  year={2024},
  note={\url{https://github.com/jphall663/awesome-machine-learning-interpretability}}
}

ACM, APA, Chicago, and MLA

ACM (Association for Computing Machinery)

Hall, Patrick, Daniel Atherton, et al. 2024. Awesome Machine Learning Interpretability. GitHub. https://github.com/jphall663/awesome-machine-learning-interpretability.

APA (American Psychological Association) 7th Edition

Hall, Patrick, Daniel Atherton, et al. (2024). Awesome Machine Learning Interpretability [GitHub repository]. GitHub. https://github.com/jphall663/awesome-machine-learning-interpretability.

Chicago Manual of Style 17th Edition

Hall, Patrick, Daniel Atherton, et al. "Awesome Machine Learning Interpretability." GitHub. Last modified 2024. https://github.com/jphall663/awesome-machine-learning-interpretability.

MLA (Modern Language Association) 9th Edition

Hall, Patrick, Daniel Atherton, et al. "Awesome Machine Learning Interpretability." GitHub, 2024, https://github.com/jphall663/awesome-machine-learning-interpretability. Accessed 5 March 2024.