Failing Conceptually: Concept-Based Explanations of Dataset Shift

License: MIT Python 3.6

Description

Despite their remarkable performance on a wide range of visual tasks, machine learning technologies often succumb to data distribution shifts. Consequently, a range of recent work explores techniques for detecting these shifts. Unfortunately, current techniques offer no explanations about what triggers the detection of shifts, thus limiting their utility to provide actionable insights. In this work, we present Concept Bottleneck Shift Detection (CBSD): a novel explainable shift detection method. CBSD provides explanations by identifying and ranking the degree to which high-level human-understandable concepts are affected by shifts. Using two case studies (dSprites and 3dshapes), we demonstrate how CBSD can accurately detect underlying concepts that are affected by shifts and achieve higher detection accuracy compared to state-of-the-art shift detection methods.

This repository contains source code of the system and experimentation results.

Pipeline

<img src="pipeline.jpg" alt="pipeline" width="750"/>

Our shift detection pipeline comprises four steps:

<ol type="A"> <li>The source and target data are fed to a dimensionality reduction process.</li> <li>The reduced representations are analysed using two-sample hypothesis testing, producing a <i>p</i>-value and test statistic.</li> <li>The resulting <i>p</i>-value and test statistic are used to determine whether a shift exists. We conclude that a shift exists when there is a statistically significant difference in distribution between the source and target data.</li> <li>CBSD provides explanations by identifying and ranking the degree to which each human-understandable concept is affected by the shift.</li> </ol>
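Steps B–D above can be sketched in a few lines of Python. This is an illustrative stand-in, not the exact procedure from the paper: it uses a per-dimension Kolmogorov–Smirnov test with a Bonferroni correction on some reduced representation, and ranks dimensions by <i>p</i>-value as a proxy for concept ranking (in CBSD, the reduced representation comes from a concept bottleneck model, so each dimension corresponds to a human-understandable concept).

```python
import numpy as np
from scipy import stats

def detect_shift(source, target, alpha=0.05):
    """Two-sample shift test on reduced representations (steps B-C).

    Runs a Kolmogorov-Smirnov test per dimension and applies a
    Bonferroni correction; a shift is flagged when any corrected
    p-value falls below the significance level.
    """
    d = source.shape[1]
    p_values = np.array([
        stats.ks_2samp(source[:, i], target[:, i]).pvalue
        for i in range(d)
    ])
    shift_detected = bool((p_values < alpha / d).any())
    # Step D (explanation): rank dimensions (concepts) by how strongly
    # they appear affected, i.e. smallest p-value first.
    ranking = np.argsort(p_values)
    return shift_detected, ranking

# Toy example: shift only the second of three "concept" dimensions.
rng = np.random.default_rng(0)
source = rng.normal(0, 1, size=(500, 3))
target = source.copy()
target[:, 1] += 2.0

shifted, ranking = detect_shift(source, target)
```

Here `shifted` is `True` and `ranking[0]` is `1`, correctly singling out the dimension that was perturbed.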

Requirements:

Folder Structure:

Setup:

```shell
git clone https://github.com/maleakhiw/explaining-dataset-shifts.git
cd explaining-dataset-shifts
pip install -r requirements.txt
```

Getting Started

The source code to apply shifts, build dimensionality reductors, conduct statistical tests, and support experimentation is located inside scripts. Using this source code, we ran our experiments with the notebooks (data-collection) located inside experiments. To replicate our experiments, install the requirements and run the notebooks. Alternatively, you can create a new script yourself and import the source code.

All experimentation data have been pickled and stored inside results. All pretrained models, including the concept bottleneck models, end-to-end neural networks, and trained and untrained autoencoders, are stored inside models. The easiest way to visualise the experimentation results is to run the notebooks (results), where we load the pickled results and display various tables and plots.

Authors:

Citing:

```bibtex
@article{DBLP:journals/corr/abs-2104-08952,
  author    = {Maleakhi A. Wijaya and
               Dmitry Kazhdan and
               Botty Dimanov and
               Mateja Jamnik},
  title     = {Failing Conceptually: Concept-Based Explanations of Dataset Shift},
  journal   = {CoRR},
  volume    = {abs/2104.08952},
  year      = {2021},
  url       = {https://arxiv.org/abs/2104.08952},
  archivePrefix = {arXiv},
  eprint    = {2104.08952},
  timestamp = {Mon, 26 Apr 2021 17:25:10 +0200},
  biburl    = {https://dblp.org/rec/journals/corr/abs-2104-08952.bib},
  bibsource = {dblp computer science bibliography, https://dblp.org}
}
```