Debiasing concepts | Debiasing Concept Bottleneck Models with Instrumental Variables | ICLR 2021 submissions page - Accepted as Poster | | causality |
Prototype Trajectory | Interpretable Sequence Classification Via Prototype Trajectory | ICLR 2021 submissions page | | This-Looks-Like-That-styled RNN |
Shapley dependence assumption | Shapley explainability on the data manifold | ICLR 2021 submissions page | | |
High dimension Shapley | Human-interpretable model explainability on high-dimensional data | ICLR 2021 submissions page | | |
L2x like paper | A Learning Theoretic Perspective on Local Explainability | ICLR 2021 submissions page | | |
Evaluation | Evaluation of Similarity-based Explanations | ICLR 2021 submissions page | | Like the Adebayo et al. paper, but for This-Looks-Like-That-styled methods |
Model correction | Defuse: Debugging Classifiers Through Distilling Unrestricted Adversarial Examples | ICLR 2021 submissions page | | |
Subspace explanation | Constraint-Driven Explanations of Black-Box ML Models | ICLR 2021 submissions page | | To see how close this is to MUSE by Hima Lakkaraju (2019) |
Catastrophic forgetting | Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting | ICLR 2021 submissions page | Code available in their Supplementary zip file | |
Non trivial counterfactual explanations | Beyond Trivial Counterfactual Generations with Diverse Valuable Explanations | ICLR 2021 submissions page | | |
Explainable by Design | Interpretability Through Invertibility: A Deep Convolutional Network With Ideal Counterfactuals And Isosurfaces | ICLR 2021 submissions page | | |
Gradient attribution | Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability | ICLR 2021 submissions page | | Looks like an extension of the Sixt et al. paper |
Mask based Explainable by Design | Investigating and Simplifying Masking-based Saliency Methods for Model Interpretability | ICLR 2021 submissions page | | |
NBDT - Explainable by Design | NBDT: Neural-Backed Decision Trees | ICLR 2021 submissions page | | |
Variational Saliency Maps | Variational saliency maps for explaining model's behavior | ICLR 2021 submissions page | | |
Network dissection with coherency or stability metric | Importance and Coherence: Methods for Evaluating Modularity in Neural Networks | ICLR 2021 submissions page | | |
Modularity | Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks | ICLR 2021 submissions page | Code made anonymous for review, link given in paper | |
Explainable by design | A self-explanatory method for the black problem on discrimination part of CNN | ICLR 2021 submissions page | | Seems to apply concepts from game theory |
Attention not Explanation | Why is Attention Not So Interpretable? | ICLR 2021 submissions page | | |
Ablation Saliency | Ablation Path Saliency | ICLR 2021 submissions page | | |
Explainable Outlier Detection | Explainable Deep One-Class Classification | ICLR 2021 submissions page | | |
XAI without approximation | Explainable AI Without Interpretable Model | Arxiv | | |
Learning theoretic Local Interpretability | A Learning Theoretic Perspective on Local Explainability | Arxiv | | |
GANMEX | GANMEX: One-vs-One Attributions Using GAN-based Model Explainability | Arxiv | | |
Evaluating Local Explanations | Evaluating local explanation methods on ground truth | Artificial Intelligence Journal Elsevier | sklearn | |
Structured Attention Graphs | Structured Attention Graphs for Understanding Deep Image Classifications | AAAI 2021 | PyTorch | see how close to MACE |
Ground truth explanations | Data Representing Ground-Truth Explanations to Evaluate XAI Methods | AAAI 2021 | sklearn | Trained models available in their GitHub repository |
AGF | Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization | AAAI 2021 | PyTorch | |
RSP | Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations | AAAI 2021 | | |
HyDRA | HYDRA: Hypergradient Data Relevance Analysis for Interpreting Deep Neural Networks | AAAI 2021 | PyTorch | |
SWAG | SWAG: Superpixels Weighted by Average Gradients for Explanations of CNNs | WACV 2021 | | |
FastIF | FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging | Arxiv | PyTorch | |
EVET | EVET: Enhancing Visual Explanations of Deep Neural Networks Using Image Transformations | WACV 2021 | | |
Local Attribution Baselines | On Baselines for Local Feature Attributions | AAAI 2021 | PyTorch | |
Differentiated Explanations | Differentiated Explanation of Deep Neural Networks with Skewed Distributions | IEEE - TPAMI journal | PyTorch | |
Human game based survey | Explainable AI and Adoption of Algorithmic Advisors: an Experimental Study | Arxiv | | |
Explainable by design | Learning Semantically Meaningful Features for Interpretable Classifications | Arxiv | | |
Expred | Explain and Predict, and then Predict again | ACM WSDM 2021 | PyTorch | |
Progressive Interpretation | An Information-theoretic Progressive Framework for Interpretation | Arxiv | PyTorch | |
UCAM | Uncertainty Class Activation Map (U-CAM) using Gradient Certainty method | IEEE - TIP | Project Page | PyTorch |
Progressive GAN explainability - smiling dataset - ICLR 2020 group | Explaining the Black-box Smoothly - A Counterfactual Approach | Arxiv | | |
Head pasted in another image - experimented | What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space | Arxiv | | |
Model correction | ExplOrs Explanation Oracles and the architecture of explainability | Paper | | |
Explanations - Knowledge Representation | A Basic Framework for Explanations in Argumentation | IEEE | | |
Eigen CAM | Eigen-CAM: Visual Explanations for Deep Convolutional Neural Networks | Springer | | |
Evaluation of Posthoc | How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations | ACM | | |
GLocalX | GLocalX - From Local to Global Explanations of Black Box AI Models | Arxiv | | |
Consistent Interpretations | Explainable Models with Consistent Interpretations | AAAI 2021 | | |
SIDU | Introducing and assessing the explainable AI (XAI) method: SIDU | Arxiv | | |
cites This looks like that | Explaining black-box classifiers using post-hoc explanations-by-example: The effect of explanations and error-rates in XAI user studies | AIJ | | |
i-Algebra | i-Algebra: Towards Interactive Interpretability of Deep Neural Networks | AAAI 2021 | | |
Shape texture bias | Shape or Texture: Understanding Discriminative Features in CNNs | ICLR 2021 | | |
Class agnostic features | The Mind's Eye: Visualizing Class-Agnostic Features of CNNs | Arxiv | | |
IBEX | A Multi-layered Approach for Tailored Black-box Explanations | Paper | Code | |
Relevant explanations | Learning Relevant Explanations | Paper | | |
Guided Zoom | Guided Zoom: Zooming into Network Evidence to Refine Fine-grained Model Decisions | IEEE | | |
XAI survey | A Survey on Understanding, Visualizations, and Explanation of Deep Neural Networks | Arxiv | | |
Pattern theory | Convolutional Neural Network Interpretability with General Pattern Theory | Arxiv | PyTorch | |
Gaussian Process based explanations | Bandits for Learning to Explain from Explanations | AAAI 2021 | sklearn | |
LIFT CAM | LIFT-CAM: Towards Better Explanations for Class Activation Mapping | Arxiv | | |
ObAIEx | Right for the Right Reasons: Making Image Classification Intuitively Explainable | Paper | tensorflow | |
VAE based explainer | Combining an Autoencoder and a Variational Autoencoder for Explaining the Machine Learning Model Predictions | IEEE | | |
Segmentation based explanation | Deep Co-Attention Network for Multi-View Subspace Learning | Arxiv | PyTorch | |
Integrated CAM | Integrated Grad-CAM: Sensitivity-Aware Visual Explanation of Deep Convolutional Networks via Integrated Gradient-Based Scoring | ICASSP 2021 | PyTorch | |
Human study | VitrAI - Applying Explainable AI in the Real World | Arxiv | | |
Attribution Mask | Attribution Mask: Filtering Out Irrelevant Features By Recursively Focusing Attention on Inputs of DNNs | Arxiv | PyTorch | |
LIME faithfulness | What does LIME really see in images? | Arxiv | Tensorflow 1.x | |
Assess model reliability | Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs | Arxiv | | |
Perturbation + Gradient unification | Towards the Unification and Robustness of Perturbation and Gradient Based Explanations | Arxiv | | Hima Lakkaraju's group |
Gradients faithful? | Do Input Gradients Highlight Discriminative Features? | Arxiv | PyTorch | |
Untrustworthy predictions | Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis | Arxiv | | |
Explaining misclassification | Explaining Inaccurate Predictions of Models through k-Nearest Neighbors | Paper | | cites Oscar Li AAAI 2018 prototypes paper |
Explanations inside predictions | Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations | AISTATS 2021 | | |
Layerwise interpretation | Layer-wise Interpretation of Deep Neural Networks Using Identity Initialization | Arxiv | | |
Visualizing Rule Sets | Visualizing Rule Sets: Exploration and Validation of a Design Space | Arxiv | PyTorch | |
Human experiments | Are Explanations Helpful? A Comparative Study of the Effects of Explanations in AI-Assisted Decision-Making | IUI 2021 | | |
Attention fine-grained classification | Interpretable Attention Guided Network for Fine-grained Visual Classification | Arxiv | | |
Concept construction | Explaining Classifiers by Constructing Familiar Concepts | Paper | PyTorch | |
EbD | Human-Understandable Decision Making for Visual Recognition | Arxiv | | |
Bridging XAI algorithm , Human needs | Towards Connecting Use Cases and Methods in Interpretable Machine Learning | Arxiv | | |
Generative trustworthy classifiers | Generative Classifiers as a Basis for Trustworthy Image Classification | Paper | Github | |
Counterfactual explanations | Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties | AISTATS 2021 | PyTorch | |
Role categorization of CNN units | Quantitative Effectiveness Assessment and Role Categorization of Individual Units in Convolutional Neural Networks | ICML 2021 | | |
Non-trivial counterfactual explanations | Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations | Arxiv | | |
NP-ProtoPNet | These do not Look Like Those: An Interpretable Deep Learning Model for Image Recognition | IEEE | | |
Correcting neural networks based on explanations | Refining Neural Networks with Compositional Explanations | Arxiv | Code link given in paper, but page not found | |
Contrastive reasoning | Contrastive Reasoning in Neural Networks | Arxiv | | |
Concept based | Intersection Regularization for Extracting Semantic Attributes | Arxiv | | |
Boundary explanations | Boundary Attributions Provide Normal (Vector) Explanations | Arxiv | PyTorch | |
Generative Counterfactuals | ECINN: Efficient Counterfactuals from Invertible Neural Networks | Arxiv | | |
ICE | Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors | AAAI 2021 | | |
Group CAM | Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks | Arxiv | PyTorch | |
HMM interpretability | Towards interpretability of Mixtures of Hidden Markov Models | AAAI 2021 | sklearn | |
Empirical Explainers | Efficient Explanations from Empirical Explainers | Arxiv | PyTorch | |
FixNorm | FixNorm: Dissecting Weight Decay for Training Deep Neural Networks | Arxiv | | |
CoDA-Net | Convolutional Dynamic Alignment Networks for Interpretable Classifications | CVPR 2021 | Code link given in paper. Repository not yet created | |
Like Dr. Chandru sir's (IITPKD) XAI work | Neural Response Interpretation through the Lens of Critical Pathways | Arxiv | PyTorch - Pathway Grad; PyTorch - ROAR | |
Inaugment | InAugment: Improving Classifiers via Internal Augmentation | Arxiv | Code yet to be updated | |
Gradual Grad CAM | Enhancing Deep Neural Network Saliency Visualizations with Gradual Extrapolation | Arxiv | PyTorch | |
A-FMI | A-FMI: Learning Attributions from Deep Networks via Feature Map Importance | Arxiv | | |
Trust - Regression | To Trust or Not to Trust a Regressor: Estimating and Explaining Trustworthiness of Regression Predictions | AAAI 2021 | sklearn | |
Concept based explanations - study | Is Disentanglement All You Need? Comparing Concept-based & Disentanglement Approaches | ICLR 2021 workshop | tensorflow 2.3 | |
Faithful attribution | Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution | Arxiv | | |
Counterfactual explanation | Counterfactual attribute-based visual explanations for classification | Springer | | |
User based explanations | “That's (not) the output I expected!” On the role of end user expectations in creating explanations of AI systems | AIJ | | |
Human understandable concept based explanations | Towards Human-Understandable Visual Explanations: Imperceptible High-frequency Cues Can Better Be Removed | Arxiv | | |
Improved attribution | Improving Attribution Methods by Learning Submodular Functions | Arxiv | | |
SHAP tractability | On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results | Arxiv | | |
SHAP explanation network | Shapley Explanation Networks | ICLR 2021 | PyTorch | |
Concept based dataset shift explanation | Failing Conceptually: Concept-based Explanations of Dataset Shift | ICLR 2021 workshop | tensorflow 2 | |
Evaluating CAM | Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis | Arxiv | | |
EFC-CAM | Exclusive Feature Constrained Class Activation Mapping for Better Visual Explanation | IEEE | | |
Causal Interpretation | Instance-wise Causal Feature Selection for Model Interpretation | Arxiv | PyTorch | |
Fairness in Learning | Learning to Learn to be Right for the Right Reasons | Arxiv | | |
Feature attribution correctness | Do Feature Attribution Methods Correctly Attribute Features? | Arxiv | Code not yet updated | |
NICE | NICE: An Algorithm for Nearest Instance Counterfactual Explanations | Arxiv | Own Python Package | |
SCG | A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts | Arxiv | | |
This looks like that - drawback | This Looks Like That... Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks | Arxiv | PyTorch | |
Exemplar based classification | Visualizing Association in Exemplar-Based Classification | ICASSP 2021 | | |
Correcting classification | Correcting Classification: A Bayesian Framework Using Explanation Feedback to Improve Classification Abilities | Arxiv | | |
Concept Bottleneck Networks | Do Concept Bottleneck Models Learn as Intended? | ICLR 2021 workshop | | |
Sanity for saliency | Sanity Simulations for Saliency Methods | Arxiv | | |
Concept based explanations | Cause and Effect: Concept-based Explanation of Neural Networks | Arxiv | | |
CLIMEP | How to Explain Neural Networks: A perspective of data space division | Arxiv | | |
Sufficient explanations | Probabilistic Sufficient Explanations | Arxiv | Empty Repository | |
SHAP baseline | Learning Baseline Values for Shapley Values | Arxiv | | |
Explainable by Design | EXoN: EXplainable encoder Network | Arxiv | tensorflow 2.4.0 | explainable VAE |
Concept based explanations | Aligning Artificial Neural Networks and Ontologies towards Explainable AI | AAAI 2021 | | |
XAI via Bayesian teaching | Abstraction, Validation, and Generalization for Explainable Artificial Intelligence | Arxiv | | |
Explanation blind spots | Do Not Explain Without Context: Addressing the Blind Spot of Model Explanations | Arxiv | | |
BLA | Bounded logit attention: Learning to explain image classifiers | Arxiv | tensorflow | L2X++ |
Interpretability - mathematical model | The Definitions of Interpretability and Learning of Interpretable Models | Arxiv | | |
Similar to our ICML workshop 2021 work | The effectiveness of feature attribution methods and its correlation with automatic evaluation scores | Arxiv | | |
EDDA | EDDA: Explanation-driven Data Augmentation to Improve Model and Explanation Alignment | Arxiv | | |
Relevant set explanations | Efficient Explanations With Relevant Sets | Arxiv | | |
Model transfer | Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning | Arxiv | | |
Model correction | Finding and Fixing Spurious Patterns with Explanations | Arxiv | | |
Neuron graph communities | On the Evolution of Neuron Communities in a Deep Learning Architecture | Arxiv | | |
Mid level features explanations | A general approach for Explanations in terms of Middle Level Features | Arxiv | | See how this differs from MUSE by Hima Lakkaraju's group |
Concept based knowledge distillation | Towards Black-Box Explainability with Gaussian Discriminant Knowledge Distillation | CVPR 2021 workshop | | compare and contrast with network dissection |
CNN high frequency bias | Dissecting the High-Frequency Bias in Convolutional Neural Networks | CVPR 2021 workshop | Tensorflow | |
Explainable by design | Entropy-based Logic Explanations of Neural Networks | Arxiv | PyTorch | concept based |
CALM | Keep CALM and Improve Visual Feature Attribution | Arxiv | PyTorch | |
Relevance CAM | Relevance-CAM: Your Model Already Knows Where to Look | CVPR 2021 | PyTorch | |
S-LIME | S-LIME: Stabilized-LIME for Model Explanation | Arxiv | sklearn | |
Local + Global | Best of both worlds: local and global explanations with human-understandable concepts | Arxiv | | Been Kim's group |
Guided integrated gradients | Guided Integrated Gradients: an Adaptive Path Method for Removing Noise | CVPR 2021 | | |
Concept based | Meaningfully Explaining a Model’s Mistakes | Arxiv | | |
Explainable by design | It’s FLAN time! Summing feature-wise latent representations for interpretability | Arxiv | | |
SimAM | SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks | ICML 2021 | PyTorch | |
DANCE | DANCE: Enhancing saliency maps using decoys | ICML 2021 | Tensorflow 1.x | |
EbD Concept formation | Explore Visual Concept Formation for Image Classification | ICML 2021 | PyTorch | |
Explainable by design | Interpretable Compositional Convolutional Neural Networks | Arxiv | | |
Attribution aggregation | Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation | AAAI 2021 - pdf | | |
Perturbation based activation | A Novel Visual Interpretability for Deep Neural Networks by Optimizing Activation Maps with Perturbation | AAAI 2021 | | |
Global explanations | Feature Synergy, Redundancy, and Independence in Global Model Explanations using SHAP Vector Decomposition | Arxiv | Github package | |
L2E | Learning to Explain: Generating Stable Explanations Fast | ACL 2021 | PyTorch | NLE |
Joint Shapley | Joint Shapley values: a measure of joint feature importance | Arxiv | | |
Explainable by design | Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment | Arxiv | | |
Explainable by design | SONG: Self-Organizing Neural Graphs | Arxiv | | |
Explainable by design | Designing Shapelets for Interpretable Data-Agnostic Classification | AIES 2021 | sklearn | Interpretable blocks for time series, extended to other data modalities like image, text, and tabular |
Global explanations + Model correction | Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability | Arxiv | PyTorch | |
HIL- Model correction | Human-in-the-loop Extraction of Interpretable Concepts in Deep Learning Models | Arxiv | | |
Activation based Cause Analysis | Activation-Based Cause Analysis Method for Neural Networks | IEEE Access 2021 | | |
Local explanations | Leveraging Latent Features for Local Explanations | ACM SIGKDD 2021 | | Amit Dhurandhar group |
Fairness | Adequate and fair explanations | Arxiv - Accepted in CD-MAKE 2021 | | |
Global explanations | Finding Representative Interpretations on Convolutional Neural Networks | ICCV 2021 | | |
Groupwise explanations | Learning Groupwise Explanations for Black-Box Models | IJCAI 2021 | PyTorch | |
Mathematical | On Smoother Attributions using Neural Stochastic Differential Equations | IJCAI 2021 | | |
AGI | Explaining Deep Neural Network Models with Adversarial Gradient Integration | IJCAI 2021 | PyTorch | |
Accountable attribution | Longitudinal Distance: Towards Accountable Instance Attribution | Arxiv | Tensorflow Keras | |
Global explanation | Understanding of Kernels in CNN Models by Suppressing Irrelevant Visual Features in Images | Arxiv | | |
Concepts based - Explainable by design | Inducing Semantic Grouping of Latent Concepts for Explanations: An Ante-Hoc Approach | Arxiv | | IITH Vineeth sir group |
Explainable by design | This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation | Arxiv | | |
MIL | ProtoMIL: Multiple Instance Learning with Prototypical Parts for Fine-Grained Interpretability | Arxiv | | |
Concept based explanations | Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation | Arxiv | | |
Counterfactual explanation + Theory of Mind | CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models | Arxiv | | |
Evaluation metric | Counterfactual Evaluation for Explainable AI | Arxiv | | |
CIM - FSC | CIM: Class-Irrelevant Mapping for Few-Shot Classification | Arxiv | | |
Causal Concepts | Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation | Arxiv | | |
ECE | Ensemble of Counterfactual Explainers | Paper | Code - seems to be a hybrid of TensorFlow and PyTorch | |
Structured Explanations | From Heatmaps to Structured Explanations of Image Classifiers | Arxiv | | |
XAI metric | An Objective Metric for Explainable AI - How and Why to Estimate the Degree of Explainability | Arxiv | | |
DisCERN | DisCERN: Discovering Counterfactual Explanations using Relevance Features from Neighbourhoods | Arxiv | | |
PSEM | Towards Better Model Understanding with Path-Sufficient Explanations | Arxiv | | Amit Dhurandhar sir group |
Evaluation traps | The Logic Traps in Evaluating Post-hoc Interpretations | Arxiv | | |
Interactive explanations | Explainability Requires Interactivity | Arxiv | PyTorch | |
CounterNet | CounterNet: End-to-End Training of Counterfactual Aware Predictions | Arxiv | PyTorch | |
Evaluation metric - Concept based explanation | Detection Accuracy for Evaluating Compositional Explanations of Units | Arxiv | | |
Explanation - Uncertainity | Effects of Uncertainty on the Quality of Feature Importance Explanations | Arxiv | | |
Survey Paper | Towards User-Centric Explanations for Explainable Models: A Review | JISTM Journal Paper | | |
Feature attribution | The Struggles and Subjectivity of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets | AAAI 2021 workshop | | |
Contextual explanation | Context-based image explanations for deep neural networks | Image and Vision Computing Journal | | |
Causal + Counterfactual | Counterfactual Instances Explain Little | Arxiv | | |
Case based Posthoc | Explaining Deep Learning using examples: Optimal feature weighting methods for twin systems using post-hoc, explanation-by-example in XAI | Elsevier | | |
Debugging gray box model | Toward a Unified Framework for Debugging Gray-box Models | Arxiv | | |
Explainable by design | Optimising for Interpretability: Convolutional Dynamic Alignment Networks | Arxiv | | |
XAI negative effect | Explainability Pitfalls: Beyond Dark Patterns in Explainable AI | Arxiv | | |
Evaluate attributions | Who Explains the Explanation? Quantitatively Assessing Feature Attribution Methods | Arxiv | | |
Counterfactual explanations | Designing Counterfactual Generators using Deep Model Inversion | Arxiv | | |
Model correction using explanation | Consistent Explanations by Contrastive Learning | Arxiv | | |
Visualize feature maps | Visualizing Feature Maps for Model Selection in Convolutional Neural Networks | ICCV 2021 Workshop | Tensorflow 1.15 | |
SPS | Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-grained Recognition | ICCV 2021 | PyTorch | |
DMBP | Generating Attribution Maps with Disentangled Masked Backpropagation | ICCV 2021 | | |
Better CAM | Towards Better Explanations of Class Activation Mapping | ICCV 2021 | | |
LEG | Statistically Consistent Saliency Estimation | ICCV 2021 | Keras | |
IBA | Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information | NeurIPS 2021 | PyTorch | |
Looks similar to This Looks Like That | Interpretable Image Recognition by Constructing Transparent Embedding Space | ICCV 2021 | Code not yet publicly released | |
Causal Imagenet | Causal ImageNet: How to Discover Spurious Features in Deep Learning? | Arxiv | | |
Model correction | Logic Constraints to Feature Importances | Arxiv | | |
Receptive field Misalignment CAM | On the Receptive Field Misalignment in CAM-based Visual Explanations | Pattern Recognition Letters | PyTorch | |
Simplex | Explaining Latent Representations with a Corpus of Examples | Arxiv | PyTorch | |
Sanity checks | Revisiting Sanity Checks for Saliency Maps | Arxiv - NeurIPS 2021 workshop | | |
Model correction | Debugging the Internals of Convolutional Networks | PDF | | |
SITE | Self-Interpretable Model with Transformation Equivariant Interpretation | Arxiv - Accepted at NeurIPS 2021 | | EbD |
Influential examples | Revisiting Methods for Finding Influential Examples | Arxiv | | |
SOBOL | Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis | NeurIPS 2021 | Tensorflow and PyTorch | |
Feature vectors | Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature Semantics | Arxiv | | global interpretability |
OOD in explainability | The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations | NeurIPS 2021 | sklearn | |
RPS LJE | Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models | NeurIPS 2021 | PyTorch | |
Model correction | Editing a Classifier by Rewriting Its Prediction Rules | NeurIPS 2021 | Code | |
suppressor variable litmus test | Scrutinizing XAI using linear ground-truth data with suppressor variables | Arxiv | | |
Explainable knowledge distillation | Learning Interpretation with Explainable Knowledge Distillation | Arxiv | | |
STEEX | STEEX: Steering Counterfactual Explanations with Semantics | Arxiv | Code | |
Binary counterfactual explanation | Counterfactual Explanations via Latent Space Projection and Interpolation | Arxiv | | |
ECLAIRE | Efficient Decompositional Rule Extraction for Deep Neural Networks | Arxiv | R | |
CartoonX | Cartoon Explanations of Image Classifiers | ResearchGate | | |
concept based explanation | Explanations in terms of Hierarchically organised Middle Level Features | Paper | | see how close to MACE and PACE |
Concept ball | Ontology-based n-ball Concept Embeddings Informing Few-shot Image Classification | Paper | | |
SPARROW | SPARROW: Semantically Coherent Prototypes for Image Classification | BMVC 2021 | | |
XAI evaluation criteria | Objective criteria for explanations of machine learning models | Paper | | |
Code inversion with human perception | Exploring Alignment of Representations with Human Perception | Arxiv | | |
Deformable ProtoPNet | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes | Arxiv | | |
ICSN | Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations | Arxiv | | |
HIVE | HIVE: Evaluating the Human Interpretability of Visual Explanations | Arxiv | Project Page | |
Jitter CAM | Jitter-CAM: Improving the Spatial Resolution of CAM-Based Explanations | BMVC 2021 | PyTorch | |
Interpreting last layer | Identifying Class Specific Filters with L1 Norm Frequency Histograms in Deep CNNs | Arxiv | | |
FCP | Forward Composition Propagation for Explainable Neural Reasoning | Arxiv | | |
ProtoPool | Interpretable Image Classification with Differentiable Prototypes Assignment | Arxiv | | |
PRELIM | Pedagogical Rule Extraction for Learning Interpretable Models | Arxiv | | |
Fair correction vectors | Fair Interpretable Learning via Correction Vectors | ICLR 2021 | | |
Smooth LRP | SmoothLRP: Smoothing LRP by Averaging over Stochastic Input Variations | ESANN 2021 | | |
Causal CAM | Extracting Causal Visual Features for Limited Label Classification | ICIP 2021 | | |
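Many of the rows above are CAM-style attribution methods (Eigen-CAM, Integrated Grad-CAM, Group-CAM, Relevance-CAM, Jitter-CAM, and others). As a reference point for that family only, here is a minimal vanilla Grad-CAM sketch in PyTorch; the backbone (a torchvision ResNet-50), the target layer (`model.layer4[-1]`), and all variable names are assumptions for illustration, and the code does not reproduce any specific paper listed here.

```python
# Minimal Grad-CAM sketch, for reference only; NOT the implementation of any
# paper in the table above. Assumes a recent torchvision with the Weights API.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

feats, grads = {}, {}
target_layer = model.layer4[-1]  # assumed target conv block
target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o.detach()))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0].detach()))

def grad_cam(x, class_idx=None):
    """Return a [0, 1] heatmap of shape (1, 1, H, W) for input x of shape (1, 3, H, W)."""
    logits = model(x)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()
    # Channel weights = gradients global-average-pooled over the spatial dims.
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Usage (hypothetical preprocessed image tensor `img` of shape (1, 3, 224, 224)):
# heatmap = grad_cam(img)[0, 0]   # (224, 224) tensor in [0, 1]
```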