Debiasing concepts | Debiasing Concept Bottleneck Models with Instrumental Variables | ICLR 2021 submissions page - Accepted as Poster | | causality |
Prototype Trajectory | Interpretable Sequence Classification Via Prototype Trajectory | ICLR 2021 submissions page | | This-Looks-Like-That-styled RNN |
Shapley dependence assumption | Shapley explainability on the data manifold | ICLR 2021 submissions page | | |
High dimension Shapley | Human-interpretable model explainability on high-dimensional data | ICLR 2021 submissions page | | |
L2x like paper | A Learning Theoretic Perspective on Local Explainability | ICLR 2021 submissions page | | |
Evaluation | Evaluation of Similarity-based Explanations | ICLR 2021 submissions page | | Like the Adebayo et al. paper, but for This-Looks-Like-That-styled methods |
Model correction | Defuse: Debugging Classifiers Through Distilling Unrestricted Adversarial Examples | ICLR 2021 submissions page | | |
Subspace explanation | Constraint-Driven Explanations of Black-Box ML Models | ICLR 2021 submissions page | | To see how close this is to MUSE by Hima Lakkaraju (2019) |
Catastrophic forgetting | Remembering for the Right Reasons: Explanations Reduce Catastrophic Forgetting | ICLR 2021 submissions page | Code available in their Supplementary zip file | |
Non trivial counterfactual explanations | Beyond Trivial Counterfactual Generations with Diverse Valuable Explanations | ICLR 2021 submissions page | | |
Explainable by Design | Interpretability Through Invertibility: A Deep Convolutional Network With Ideal Counterfactuals And Isosurfaces | ICLR 2021 submissions page | | |
Gradient attribution | Rethinking the Role of Gradient-based Attribution Methods for Model Interpretability | ICLR 2021 submissions page | | Looks like an extension of the Sixt et al. paper |
Mask based Explainable by Design | Investigating and Simplifying Masking-based Saliency Methods for Model Interpretability | ICLR 2021 submissions page | | |
NBDT - Explainable by Design | NBDT: Neural-Backed Decision Trees | ICLR 2021 submissions page | | |
Variational Saliency Maps | Variational saliency maps for explaining model's behavior | ICLR 2021 submissions page | | |
Network dissection with coherency or stability metric | Importance and Coherence: Methods for Evaluating Modularity in Neural Networks | ICLR 2021 submissions page | | |
Modularity | Are Neural Nets Modular? Inspecting Functional Modularity Through Differentiable Weight Masks | ICLR 2021 submissions page | Code made anonymous for review, link given in paper | |
Explainable by design | A self-explanatory method for the black problem on discrimination part of CNN | ICLR 2021 submissions page | | Seems to apply concepts from game theory |
Attention not Explanation | Why is Attention Not So Interpretable? | ICLR 2021 submissions page | | |
Ablation Saliency | Ablation Path Saliency | ICLR 2021 submissions page | | |
Explainable Outlier Detection | Explainable Deep One-Class Classification | ICLR 2021 submissions page | | |
XAI without approximation | Explainable AI Without Interpretable Model | Arxiv | | |
Learning theoretic Local Interpretability | A Learning Theoretic Perspective on Local Explainability | Arxiv | | |
GANMEX | GANMEX: One-vs-One Attributions Using GAN-based Model Explainability | Arxiv | | |
Evaluating Local Explanations | Evaluating local explanation methods on ground truth | Artificial Intelligence Journal Elsevier | sklearn | |
Structured Attention Graphs | Structured Attention Graphs for Understanding Deep Image Classifications | AAAI 2021 | PyTorch | see how close to MACE |
Ground truth explanations | Data Representing Ground-Truth Explanations to Evaluate XAI Methods | AAAI 2021 | sklearn | Trained models available in their GitHub repository |
AGF | Visualization of Supervised and Self-Supervised Neural Networks via Attribution Guided Factorization | AAAI 2021 | PyTorch | |
RSP | Interpreting Deep Neural Networks with Relative Sectional Propagation by Analyzing Comparative Gradients and Hostile Activations | AAAI 2021 | | |
HyDRA | HYDRA: Hypergradient Data Relevance Analysis for Interpreting Deep Neural Networks | AAAI 2021 | PyTorch | |
SWAG | SWAG: Superpixels Weighted by Average Gradients for Explanations of CNNs | WACV 2021 | | |
FastIF | FastIF: Scalable Influence Functions for Efficient Model Interpretation and Debugging | Arxiv | PyTorch | |
EVET | EVET: Enhancing Visual Explanations of Deep Neural Networks Using Image Transformations | WACV 2021 | | |
Local Attribution Baselines | On Baselines for Local Feature Attributions | AAAI 2021 | PyTorch | |
Differentiated Explanations | Differentiated Explanation of Deep Neural Networks with Skewed Distributions | IEEE - TPAMI journal | PyTorch | |
Human game based survey | Explainable AI and Adoption of Algorithmic Advisors: an Experimental Study | Arxiv | | |
Explainable by design | Learning Semantically Meaningful Features for Interpretable Classifications | Arxiv | | |
Expred | Explain and Predict, and then Predict again | ACM WSDM 2021 | PyTorch | |
Progressive Interpretation | An Information-theoretic Progressive Framework for Interpretation | Arxiv | PyTorch | |
UCAM | Uncertainty Class Activation Map (U-CAM) using Gradient Certainty method | IEEE - TIP | Project Page | PyTorch |
Progressive GAN explainability - smiling dataset - ICLR 2020 group | Explaining the Black-box Smoothly - A Counterfactual Approach | Arxiv | | |
Head pasted in another image - experimented | What Do Deep Nets Learn? Class-wise Patterns Revealed in the Input Space | Arxiv | | |
Model correction | ExplOrs Explanation Oracles and the architecture of explainability | Paper | | |
Explanations - Knowledge Representation | A Basic Framework for Explanations in Argumentation | IEEE | | |
Eigen CAM | Eigen-CAM: Visual Explanations for Deep Convolutional Neural Networks | Springer | | |
Evaluation of Posthoc | How can I choose an explainer? An Application-grounded Evaluation of Post-hoc Explanations | ACM | | |
GLocalX | GLocalX - From Local to Global Explanations of Black Box AI Models | Arxiv | | |
Consistent Interpretations | Explainable Models with Consistent Interpretations | AAAI 2021 | | |
SIDU | Introducing and assessing the explainable AI (XAI) method: SIDU | Arxiv | | |
cites This looks like that | Explaining black-box classifiers using post-hoc explanations-by-example: The effect of explanations and error-rates in XAI user studies | AIJ | | |
i-Algebra | i-Algebra: Towards Interactive Interpretability of Deep Neural Networks | AAAI 2021 | | |
Shape texture bias | Shape or Texture: Understanding Discriminative Features in CNNs | ICLR 2021 | | |
Class agnostic features | The Mind's Eye: Visualizing Class-Agnostic Features of CNNs | Arxiv | | |
IBEX | A Multi-layered Approach for Tailored Black-box Explanations | Paper | Code | |
Relevant explanations | Learning Relevant Explanations | Paper | | |
Guided Zoom | Guided Zoom: Zooming into Network Evidence to Refine Fine-grained Model Decisions | IEEE | | |
XAI survey | A Survey on Understanding, Visualizations, and Explanation of Deep Neural Networks | Arxiv | | |
Pattern theory | Convolutional Neural Network Interpretability with General Pattern Theory | Arxiv | PyTorch | |
Gaussian Process based explanations | Bandits for Learning to Explain from Explanations | AAAI 2021 | sklearn | |
LIFT CAM | LIFT-CAM: Towards Better Explanations for Class Activation Mapping | Arxiv | | |
ObAIEx | Right for the Right Reasons: Making Image Classification Intuitively Explainable | Paper | tensorflow | |
VAE based explainer | Combining an Autoencoder and a Variational Autoencoder for Explaining the Machine Learning Model Predictions | IEEE | | |
Segmentation based explanation | Deep Co-Attention Network for Multi-View Subspace Learning | Arxiv | PyTorch | |
Integrated CAM | Integrated Grad-CAM: Sensitivity-Aware Visual Explanation of Deep Convolutional Networks via Integrated Gradient-Based Scoring | ICASSP 2021 | PyTorch | |
Human study | VitrAI - Applying Explainable AI in the Real World | Arxiv | | |
Attribution Mask | Attribution Mask: Filtering Out Irrelevant Features By Recursively Focusing Attention on Inputs of DNNs | Arxiv | PyTorch | |
LIME faithfulness | What does LIME really see in images? | Arxiv | Tensorflow 1.x | |
Assess model reliability | Intuitively Assessing ML Model Reliability through Example-Based Explanations and Editing Model Inputs | Arxiv | | |
Perturbation + Gradient unification | Towards the Unification and Robustness of Perturbation and Gradient Based Explanations | Arxiv | | Hima Lakkaraju's group |
Gradients faithful? | Do Input Gradients Highlight Discriminative Features? | Arxiv | PyTorch | |
Untrustworthy predictions | Identifying Untrustworthy Predictions in Neural Networks by Geometric Gradient Analysis | Arxiv | | |
Explaining misclassification | Explaining Inaccurate Predictions of Models through k-Nearest Neighbors | Paper | | cites Oscar Li AAAI 2018 prototypes paper |
Explanations inside predictions | Have We Learned to Explain?: How Interpretability Methods Can Learn to Encode Predictions in their Interpretations | AISTATS 2021 | | |
Layerwise interpretation | Layer-wise Interpretation of Deep Neural Networks Using Identity Initialization | Arxiv | | |
Visualizing Rule Sets | Visualizing Rule Sets: Exploration and Validation of a Design Space | Arxiv | PyTorch | |
Human experiments | Are Explanations Helpful? A Comparative Study of the Effects of Explanations in AI-Assisted Decision-Making | IUI 2021 | | |
Attention fine-grained classification | Interpretable Attention Guided Network for Fine-grained Visual Classification | Arxiv | | |
Concept construction | Explaining Classifiers by Constructing Familiar Concepts | Paper | PyTorch | |
EbD | Human-Understandable Decision Making for Visual Recognition | Arxiv | | |
Bridging XAI algorithm , Human needs | Towards Connecting Use Cases and Methods in Interpretable Machine Learning | Arxiv | | |
Generative trustworthy classifiers | Generative Classifiers as a Basis for Trustworthy Image Classification | Paper | Github | |
Counterfactual explanations | Generating Interpretable Counterfactual Explanations By Implicit Minimisation of Epistemic and Aleatoric Uncertainties | AISTATS 2021 | PyTorch | |
Role categorization of CNN units | Quantitative Effectiveness Assessment and Role Categorization of Individual Units in Convolutional Neural Networks | ICML 2021 | | |
Non-trivial counterfactual explanations | Beyond Trivial Counterfactual Explanations with Diverse Valuable Explanations | Arxiv | | |
NP-ProtoPNet | These do not Look Like Those: An Interpretable Deep Learning Model for Image Recognition | IEEE | | |
Correcting neural networks based on explanations | Refining Neural Networks with Compositional Explanations | Arxiv | Code link given in paper, but page not found | |
Contrastive reasoning | Contrastive Reasoning in Neural Networks | Arxiv | | |
Concept based | Intersection Regularization for Extracting Semantic Attributes | Arxiv | | |
Boundary explanations | Boundary Attributions Provide Normal (Vector) Explanations | Arxiv | PyTorch | |
Generative Counterfactuals | ECINN: Efficient Counterfactuals from Invertible Neural Networks | Arxiv | | |
ICE | Invertible Concept-based Explanations for CNN Models with Non-negative Concept Activation Vectors | AAAI 2021 | | |
Group CAM | Group-CAM: Group Score-Weighted Visual Explanations for Deep Convolutional Networks | Arxiv | PyTorch | |
HMM interpretability | Towards interpretability of Mixtures of Hidden Markov Models | AAAI 2021 | sklearn | |
Empirical Explainers | Efficient Explanations from Empirical Explainers | Arxiv | PyTorch | |
FixNorm | FixNorm: Dissecting Weight Decay for Training Deep Neural Networks | Arxiv | | |
CoDA-Net | Convolutional Dynamic Alignment Networks for Interpretable Classifications | CVPR 2021 | Code link given in paper. Repository not yet created | |
Like Dr. Chandru sir's (IITPKD) XAI work | Neural Response Interpretation through the Lens of Critical Pathways | Arxiv | PyTorch - Pathway Grad; PyTorch - ROAR | |
Inaugment | InAugment: Improving Classifiers via Internal Augmentation | Arxiv | Code yet to be updated | |
Gradual Grad CAM | Enhancing Deep Neural Network Saliency Visualizations with Gradual Extrapolation | Arxiv | PyTorch | |
A-FMI | A-FMI: Learning Attributions from Deep Networks via Feature Map Importance | Arxiv | | |
Trust - Regression | To Trust or Not to Trust a Regressor: Estimating and Explaining Trustworthiness of Regression Predictions | AAAI 2021 | sklearn | |
Concept based explanations - study | Is Disentanglement All You Need? Comparing Concept-based & Disentanglement Approaches | ICLR 2021 workshop | tensorflow 2.3 | |
Faithful attribution | Mutual Information Preserving Back-propagation: Learn to Invert for Faithful Attribution | Arxiv | | |
Counterfactual explanation | Counterfactual attribute-based visual explanations for classification | Springer | | |
User based explanations | “That's (not) the output I expected!” On the role of end user expectations in creating explanations of AI systems | AIJ | | |
Human understandable concept based explanations | Towards Human-Understandable Visual Explanations: Imperceptible High-frequency Cues Can Better Be Removed | Arxiv | | |
Improved attribution | Improving Attribution Methods by Learning Submodular Functions | Arxiv | | |
SHAP tractability | On the Complexity of SHAP-Score-Based Explanations: Tractability via Knowledge Compilation and Non-Approximability Results | Arxiv | | |
SHAP explanation network | Shapley Explanation Networks | ICLR 2021 | PyTorch | |
Concept based dataset shift explanation | Failing Conceptually: Concept-based Explanations of Dataset Shift | ICLR 2021 workshop | tensorflow 2 | |
Evaluating CAM | Revisiting The Evaluation of Class Activation Mapping for Explainability: A Novel Metric and Experimental Analysis | Arxiv | | |
EFC-CAM | Exclusive Feature Constrained Class Activation Mapping for Better Visual Explanation | IEEE | | |
Causal Interpretation | Instance-wise Causal Feature Selection for Model Interpretation | Arxiv | PyTorch | |
Fairness in Learning | Learning to Learn to be Right for the Right Reasons | Arxiv | | |
Feature attribution correctness | Do Feature Attribution Methods Correctly Attribute Features? | Arxiv | Code not yet updated | |
NICE | NICE: An Algorithm for Nearest Instance Counterfactual Explanations | Arxiv | Own Python Package | |
SCG | A Peek Into the Reasoning of Neural Networks: Interpreting with Structural Visual Concepts | Arxiv | | |
This looks like that - drawback | This Looks Like That... Does it? Shortcomings of Latent Space Prototype Interpretability in Deep Networks | Arxiv | PyTorch | |
Exemplar based classification | Visualizing Association in Exemplar-Based Classification | ICASSP 2021 | | |
Correcting classification | Correcting Classification: A Bayesian Framework Using Explanation Feedback to Improve Classification Abilities | Arxiv | | |
Concept Bottleneck Networks | Do Concept Bottleneck Models Learn as Intended? | ICLR 2021 workshop | | |
Sanity for saliency | Sanity Simulations for Saliency Methods | Arxiv | | |
Concept based explanations | Cause and Effect: Concept-based Explanation of Neural Networks | Arxiv | | |
CLIMEP | How to Explain Neural Networks: A perspective of data space division | Arxiv | | |
Sufficient explanations | Probabilistic Sufficient Explanations | Arxiv | Empty Repository | |
SHAP baseline | Learning Baseline Values for Shapley Values | Arxiv | | |
Explainable by Design | EXoN: EXplainable encoder Network | Arxiv | tensorflow 2.4.0 | explainable VAE |
Concept based explanations | Aligning Artificial Neural Networks and Ontologies towards Explainable AI | AAAI 2021 | | |
XAI via Bayesian teaching | Abstraction, Validation, and Generalization for Explainable Artificial Intelligence | Arxiv | | |
Explanation blind spots | Do Not Explain Without Context: Addressing the Blind Spot of Model Explanations | Arxiv | | |
BLA | Bounded logit attention: Learning to explain image classifiers | Arxiv | tensorflow | L2X++ |
Interpretability - mathematical model | The Definitions of Interpretability and Learning of Interpretable Models | Arxiv | | |
Similar to our ICML workshop 2021 work | The effectiveness of feature attribution methods and its correlation with automatic evaluation scores | Arxiv | | |
EDDA | EDDA: Explanation-driven Data Augmentation to Improve Model and Explanation Alignment | Arxiv | | |
Relevant set explanations | Efficient Explanations With Relevant Sets | Arxiv | | |
Model transfer | Making CNNs Interpretable by Building Dynamic Sequential Decision Forests with Top-down Hierarchy Learning | Arxiv | | |
Model correction | Finding and Fixing Spurious Patterns with Explanations | Arxiv | | |
Neuron graph communities | On the Evolution of Neuron Communities in a Deep Learning Architecture | Arxiv | | |
Mid level features explanations | A general approach for Explanations in terms of Middle Level Features | Arxiv | | See how this differs from MUSE by Hima Lakkaraju's group |
Concept based knowledge distillation | Towards Black-Box Explainability with Gaussian Discriminant Knowledge Distillation | CVPR 2021 workshop | | compare and contrast with network dissection |
CNN high frequency bias | Dissecting the High-Frequency Bias in Convolutional Neural Networks | CVPR 2021 workshop | Tensorflow | |
Explainable by design | Entropy-based Logic Explanations of Neural Networks | Arxiv | PyTorch | concept based |
CALM | Keep CALM and Improve Visual Feature Attribution | Arxiv | PyTorch | |
Relevance CAM | Relevance-CAM: Your Model Already Knows Where to Look | CVPR 2021 | PyTorch | |
S-LIME | S-LIME: Stabilized-LIME for Model Explanation | Arxiv | sklearn | |
Local + Global | Best of both worlds: local and global explanations with human-understandable concepts | Arxiv | | Been Kim's group |
Guided integrated gradients | Guided Integrated Gradients: an Adaptive Path Method for Removing Noise | CVPR 2021 | | |
Concept based | Meaningfully Explaining a Model’s Mistakes | Arxiv | | |
Explainable by design | It’s FLAN time! Summing feature-wise latent representations for interpretability | Arxiv | | |
SimAM | SimAM: A Simple, Parameter-Free Attention Module for Convolutional Neural Networks | ICML 2021 | PyTorch | |
DANCE | DANCE: Enhancing saliency maps using decoys | ICML 2021 | Tensorflow 1.x | |
EbD Concept formation | Explore Visual Concept Formation for Image Classification | ICML 2021 | PyTorch | |
Explainable by design | Interpretable Compositional Convolutional Neural Networks | Arxiv | | |
Attribution aggregation | Explaining Convolutional Neural Networks through Attribution-Based Input Sampling and Block-Wise Feature Aggregation | AAAI 2021 - pdf | | |
Perturbation based activation | A Novel Visual Interpretability for Deep Neural Networks by Optimizing Activation Maps with Perturbation | AAAI 2021 | | |
Global explanations | Feature Synergy, Redundancy, and Independence in Global Model Explanations using SHAP Vector Decomposition | Arxiv | Github package | |
L2E | Learning to Explain: Generating Stable Explanations Fast | ACL 2021 | PyTorch | NLE |
Joint Shapley | Joint Shapley values: a measure of joint feature importance | Arxiv | | |
Explainable by design | Align Yourself: Self-supervised Pre-training for Fine-grained Recognition via Saliency Alignment | Arxiv | | |
Explainable by design | SONG: Self-Organizing Neural Graphs | Arxiv | | |
Explainable by design | Designing Shapelets for Interpretable Data-Agnostic Classification | AIES 2021 | sklearn | Interpretable blocks for time series, extended to other data modalities like image, text, and tabular |
Global explanations + Model correction | Where do Models go Wrong? Parameter-Space Saliency Maps for Explainability | Arxiv | PyTorch | |
HIL- Model correction | Human-in-the-loop Extraction of Interpretable Concepts in Deep Learning Models | Arxiv | | |
Activation based Cause Analysis | Activation-Based Cause Analysis Method for Neural Networks | IEEE Access 2021 | | |
Local explanations | Leveraging Latent Features for Local Explanations | ACM SIGKDD 2021 | | Amit Dhurandhar group |
Fairness | Adequate and fair explanations | Arxiv - Accepted in CD-MAKE 2021 | | |
Global explanations | Finding Representative Interpretations on Convolutional Neural Networks | ICCV 2021 | | |
Groupwise explanations | Learning Groupwise Explanations for Black-Box Models | IJCAI 2021 | PyTorch | |
Mathematical | On Smoother Attributions using Neural Stochastic Differential Equations | IJCAI 2021 | | |
AGI | Explaining Deep Neural Network Models with Adversarial Gradient Integration | IJCAI 2021 | PyTorch | |
Accountable attribution | Longitudinal Distance: Towards Accountable Instance Attribution | Arxiv | Tensorflow Keras | |
Global explanation | Understanding of Kernels in CNN Models by Suppressing Irrelevant Visual Features in Images | Arxiv | | |
Concepts based - Explainable by design | Inducing Semantic Grouping of Latent Concepts for Explanations: An Ante-Hoc Approach | Arxiv | | IITH Vineeth sir group |
Explainable by design | This looks more like that: Enhancing Self-Explaining Models by Prototypical Relevance Propagation | Arxiv | | |
MIL | ProtoMIL: Multiple Instance Learning with Prototypical Parts for Fine-Grained Interpretability | Arxiv | | |
Concept based explanations | Instance-wise or Class-wise? A Tale of Neighbor Shapley for Concept-based Explanation | Arxiv | | |
Counterfactual explanation + Theory of Mind | CX-ToM: Counterfactual Explanations with Theory-of-Mind for Enhancing Human Trust in Image Recognition Models | Arxiv | | |
Evaluation metric | Counterfactual Evaluation for Explainable AI | Arxiv | | |
CIM - FSC | CIM: Class-Irrelevant Mapping for Few-Shot Classification | Arxiv | | |
Causal Concepts | Unsupervised Causal Binary Concepts Discovery with VAE for Black-box Model Explanation | Arxiv | | |
ECE | Ensemble of Counterfactual Explainers | Paper | Code - seems to be a hybrid of TensorFlow and PyTorch | |
Structured Explanations | From Heatmaps to Structured Explanations of Image Classifiers | Arxiv | | |
XAI metric | An Objective Metric for Explainable AI - How and Why to Estimate the Degree of Explainability | Arxiv | | |
DisCERN | DisCERN: Discovering Counterfactual Explanations using Relevance Features from Neighbourhoods | Arxiv | | |
PSEM | Towards Better Model Understanding with Path-Sufficient Explanations | Arxiv | | Amit Dhurandhar sir group |
Evaluation traps | The Logic Traps in Evaluating Post-hoc Interpretations | Arxiv | | |
Interactive explanations | Explainability Requires Interactivity | Arxiv | PyTorch | |
CounterNet | CounterNet: End-to-End Training of Counterfactual Aware Predictions | Arxiv | PyTorch | |
Evaluation metric - Concept based explanation | Detection Accuracy for Evaluating Compositional Explanations of Units | Arxiv | | |
Explanation - Uncertainity | Effects of Uncertainty on the Quality of Feature Importance Explanations | Arxiv | | |
Survey Paper | Towards User-Centric Explanations for Explainable Models: A Review | JISTM Journal Paper | | |
Feature attribution | The Struggles and Subjectivity of Feature-Based Explanations: Shapley Values vs. Minimal Sufficient Subsets | AAAI 2021 workshop | | |
Contextual explanation | Context-based image explanations for deep neural networks | Image and Vision Computing Journal | | |
Causal + Counterfactual | Counterfactual Instances Explain Little | Arxiv | | |
Case based Posthoc | Explaining Deep Learning using examples: Optimal feature weighting methods for twin systems using post-hoc, explanation-by-example in XAI | Elsevier | | |
Debugging gray box model | Toward a Unified Framework for Debugging Gray-box Models | Arxiv | | |
Explainable by design | Optimising for Interpretability: Convolutional Dynamic Alignment Networks | Arxiv | | |
XAI negative effect | Explainability Pitfalls: Beyond Dark Patterns in Explainable AI | Arxiv | | |
Evaluate attributions | Who Explains the Explanation? Quantitatively Assessing Feature Attribution Methods | Arxiv | | |
Counterfactual explanations | Designing Counterfactual Generators using Deep Model Inversion | Arxiv | | |
Model correction using explanation | Consistent Explanations by Contrastive Learning | Arxiv | | |
Visualize feature maps | Visualizing Feature Maps for Model Selection in Convolutional Neural Networks | ICCV 2021 Workshop | Tensorflow 1.15 | |
SPS | Stochastic Partial Swap: Enhanced Model Generalization and Interpretability for Fine-grained Recognition | ICCV 2021 | PyTorch | |
DMBP | Generating Attribution Maps with Disentangled Masked Backpropagation | ICCV 2021 | | |
Better CAM | Towards Better Explanations of Class Activation Mapping | ICCV 2021 | | |
LEG | Statistically Consistent Saliency Estimation | ICCV 2021 | Keras | |
IBA | Fine-Grained Neural Network Explanation by Identifying Input Features with Predictive Information | NeurIPS 2021 | PyTorch | |
Looks similar to This Looks Like That | Interpretable Image Recognition by Constructing Transparent Embedding Space | ICCV 2021 | Code not yet publicly released | |
Causal Imagenet | Causal ImageNet: How to Discover Spurious Features in Deep Learning? | Arxiv | | |
Model correction | Logic Constraints to Feature Importances | Arxiv | | |
Receptive field Misalignment CAM | On the Receptive Field Misalignment in CAM-based Visual Explanations | Pattern Recognition Letters | PyTorch | |
Simplex | Explaining Latent Representations with a Corpus of Examples | Arxiv | PyTorch | |
Sanity checks | Revisiting Sanity Checks for Saliency Maps | Arxiv - NeurIPS 2021 workshop | | |
Model correction | Debugging the Internals of Convolutional Networks | PDF | | |
SITE | Self-Interpretable Model with Transformation Equivariant Interpretation | Arxiv - Accepted at NeurIPS 2021 | | EbD |
Influential examples | Revisiting Methods for Finding Influential Examples | Arxiv | | |
SOBOL | Look at the Variance! Efficient Black-box Explanations with Sobol-based Sensitivity Analysis | NeurIPS 2021 | Tensorflow and PyTorch | |
Feature vectors | Beyond Importance Scores: Interpreting Tabular ML by Visualizing Feature Semantics | Arxiv | | global interpretability |
OOD in explainability | The Out-of-Distribution Problem in Explainability and Search Methods for Feature Importance Explanations | NeurIPS 2021 | sklearn | |
RPS LJE | Representer Point Selection via Local Jacobian Expansion for Post-hoc Classifier Explanation of Deep Neural Networks and Ensemble Models | NeurIPS 2021 | PyTorch | |
Model correction | Editing a Classifier by Rewriting Its Prediction Rules | NeurIPS 2021 | Code | |
suppressor variable litmus test | Scrutinizing XAI using linear ground-truth data with suppressor variables | Arxiv | | |
Explainable knowledge distillation | Learning Interpretation with Explainable Knowledge Distillation | Arxiv | | |
STEEX | STEEX: Steering Counterfactual Explanations with Semantics | Arxiv | Code | |
Binary counterfactual explanation | Counterfactual Explanations via Latent Space Projection and Interpolation | Arxiv | | |
ECLAIRE | Efficient Decompositional Rule Extraction for Deep Neural Networks | Arxiv | R | |
CartoonX | Cartoon Explanations of Image Classifiers | ResearchGate | | |
concept based explanation | Explanations in terms of Hierarchically organised Middle Level Features | Paper | | see how close to MACE and PACE |
Concept ball | Ontology-based n-ball Concept Embeddings Informing Few-shot Image Classification | Paper | | |
SPARROW | SPARROW: Semantically Coherent Prototypes for Image Classification | BMVC 2021 | | |
XAI evaluation criteria | Objective criteria for explanations of machine learning models | Paper | | |
Code inversion with human perception | Exploring Alignment of Representations with Human Perception | Arxiv | | |
Deformable ProtoPNet | Deformable ProtoPNet: An Interpretable Image Classifier Using Deformable Prototypes | Arxiv | | |
ICSN | Interactive Disentanglement: Learning Concepts by Interacting with their Prototype Representations | Arxiv | | |
HIVE | HIVE: Evaluating the Human Interpretability of Visual Explanations | Arxiv | Project Page | |
Jitter CAM | Jitter-CAM: Improving the Spatial Resolution of CAM-Based Explanations | BMVC 2021 | PyTorch | |
Interpreting last layer | Identifying Class Specific Filters with L1 Norm Frequency Histograms in Deep CNNs | Arxiv | | |
FCP | Forward Composition Propagation for Explainable Neural Reasoning | Arxiv | | |
ProtoPool | Interpretable Image Classification with Differentiable Prototypes Assignment | Arxiv | | |
PRELIM | Pedagogical Rule Extraction for Learning Interpretable Models | Arxiv | | |
Fair correction vectors | Fair Interpretable Learning via Correction Vectors | ICLR 2021 | | |
Smooth LRP | SmoothLRP: Smoothing LRP by Averaging over Stochastic Input Variations | ESANN 2021 | | |
Causal CAM | Extracting Causal Visual Features for Limited Label Classification | ICIP 2021 | | |
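Many of the rows above are CAM-style attribution methods (Eigen-CAM, Integrated Grad-CAM, Group-CAM, Relevance-CAM, Jitter-CAM, and others). As a reference point for that family only, here is a minimal vanilla Grad-CAM sketch in PyTorch; the backbone (a torchvision ResNet-50), the target layer (`model.layer4[-1]`), and all variable names are assumptions for illustration, and the code does not reproduce any specific paper listed here.

```python
# Minimal Grad-CAM sketch, for reference only; NOT the implementation of any
# paper in the table above. Assumes a recent torchvision with the Weights API.
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT).eval()

feats, grads = {}, {}
target_layer = model.layer4[-1]  # assumed target conv block
target_layer.register_forward_hook(lambda m, i, o: feats.update(a=o.detach()))
target_layer.register_full_backward_hook(lambda m, gi, go: grads.update(a=go[0].detach()))

def grad_cam(x, class_idx=None):
    """Return a [0, 1] heatmap of shape (1, 1, H, W) for input x of shape (1, 3, H, W)."""
    logits = model(x)
    if class_idx is None:
        class_idx = int(logits.argmax(dim=1))
    model.zero_grad()
    logits[0, class_idx].backward()
    # Channel weights = gradients global-average-pooled over the spatial dims.
    weights = grads["a"].mean(dim=(2, 3), keepdim=True)
    cam = F.relu((weights * feats["a"]).sum(dim=1, keepdim=True))
    cam = F.interpolate(cam, size=x.shape[-2:], mode="bilinear", align_corners=False)
    return (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)

# Usage (hypothetical preprocessed image tensor `img` of shape (1, 3, 224, 224)):
# heatmap = grad_cam(img)[0, 0]   # (224, 224) tensor in [0, 1]
```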