Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed

Failure Detection Framework

This repository contains the code to reproduce all the experiments in the paper "Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed". In particular, it allows benchmarking 9 different confidence scoring methods on 6 different medical imaging datasets. We provide all the training, evaluation and plotting code necessary to reproduce the experiments in the paper.

If you like this repository, please consider citing the associated paper:

@article{bernhardt2022failure,
  title={Failure Detection in Medical Image Classification: A Reality Check and Benchmarking Testbed},
  author={M{\'e}lanie Bernhardt and Fabio De Sousa Ribeiro and Ben Glocker},
  journal={Transactions on Machine Learning Research},
  year={2022},
  url={https://openreview.net/forum?id=VBHuLfnOMf}
}

Overview

The repository is divided into 5 main folders:

All the outputs will be placed in {REPO_ROOT}/outputs by default.

Prerequisites

  1. Start by creating our conda environment, as specified by the environment.yml file at the root of the repository.
  2. Make sure you update the paths for the RSNA, EyePACS and BUSI datasets in default_paths.py so that they point to the root folder of each dataset, as downloaded from the Kaggle RSNA Pneumonia Detection Challenge for RSNA, from the Kaggle EyePACS Challenge for EyePACS, and from this page for BUSI. For the EyePACS dataset you will also need to download the test labels and place them at the root of the dataset folder.
  3. Make sure the root directory of the repository is in your PYTHONPATH environment variable (see the shell sketch after this list). Ready to go!
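
A minimal setup sketch covering these prerequisites, assuming a Unix shell and a working conda installation (`<env-name>` is a placeholder for whatever environment name is declared in environment.yml):

```bash
# Create and activate the conda environment shipped with the repository.
conda env create -f environment.yml
conda activate <env-name>   # placeholder: use the name declared in environment.yml

# After editing default_paths.py to point at your local RSNA, EyePACS and BUSI copies,
# make the repository root importable from anywhere:
export PYTHONPATH="${PYTHONPATH}:$(pwd)"
```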

Step-by-step example

In this section, we walk you through all the steps necessary to reproduce the experiments for OrganAMNIST and obtain the boxplots from the paper. The procedure is identical for all other experiments; you only need to change which yml config file you use (see the next section for a description of the configs).

Assuming your current working directory is the root of the repository:

  1. Train the 5 classification models: python classification/train.py --config medmnist/organamnist_resnet_dropout_all_layers.yml. Note that if you only want to train one model, you can specify a single seed in the config instead of 5 seeds.
  2. If you want the SWAG results, additionally train 5 SWAG models: python classification/swag/train_swag.py --config medmnist/organamnist_resnet_dropout_all_layers.yml
  3. If you want the ConfidNet results, also run python failure_detection/uncertainty_scorer/confidNet/train.py --config medmnist/organamnist_resnet_dropout_all_layers.yml
  4. If you want the DUQ results, also run python classification/duq/train.py --config medmnist/organamnist_resnet_dropout_all_layers.yml
  5. You are ready to run the evaluation benchmark with python failure_detection/run_evaluation.py --config medmnist/organamnist_resnet_dropout_all_layers.yml (a combined command sketch for steps 1-5 follows this list).
  6. The outputs can be found in the outputs/{DATASET_NAME}/{MODEL_NAME}/{RUN_NAME} folder. Individual results for each seed are placed in separate subfolders named after the corresponding training seed. More importantly, in the failure_detection subfolder of this output folder you will find:
    • aggregated.csv: the aggregated tables with the average and standard deviation over seeds.
    • ensemble_metrics.csv: the metrics for the ensemble model.
  7. If you then want to reproduce the boxplot figures from the paper, you can run the paper_plotting_notebooks/aggregated_curves.ipynb notebook. Note that this will save the plots inside the outputs/figures folder.
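
For reference, steps 1-5 can be chained into a single script. The commands below are copied verbatim from the steps above; the CONFIG variable is only a local shorthand, and steps 2-4 are optional depending on which methods you want to evaluate:

```bash
# End-to-end sketch for OrganAMNIST, run from the repository root.
CONFIG=medmnist/organamnist_resnet_dropout_all_layers.yml

python classification/train.py --config "${CONFIG}"                                  # step 1: classifiers
python classification/swag/train_swag.py --config "${CONFIG}"                        # step 2: SWAG (optional)
python failure_detection/uncertainty_scorer/confidNet/train.py --config "${CONFIG}"  # step 3: ConfidNet (optional)
python classification/duq/train.py --config "${CONFIG}"                              # step 4: DUQ (optional)
python failure_detection/run_evaluation.py --config "${CONFIG}"                      # step 5: evaluation benchmark
```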

Configurations to reproduce all plots in the paper

In order to reproduce all the plots in Figure 2 of the paper, you need to repeat steps 1-7 for each of the following provided configs:

Just replace the config name with one of the config names above and follow the steps from the previous section (note that the config name argument is expected to be a path relative to the configs folder); a looping sketch is shown below.
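
If you want to script this repetition, a simple loop like the following works. Only the OrganAMNIST config name is taken from this README; the placeholder entry should be replaced by the other provided config names, and the optional SWAG/ConfidNet/DUQ training steps can be added inside the loop as needed:

```bash
# Sketch: repeat the training and evaluation pipeline for several configs.
for CONFIG in \
    medmnist/organamnist_resnet_dropout_all_layers.yml \
    <another_provided_config>.yml   # placeholder for the other provided configs
do
  python classification/train.py --config "${CONFIG}"
  python failure_detection/run_evaluation.py --config "${CONFIG}"
done
```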

References

For this work, the following repositories were useful: