# Medical Masked Autoencoders
## Paper
This repository provides the official implementation for training Vision Transformers (ViTs) on (2D) medical imaging tasks, as well as the pre-trained ViTs used in the following paper:
<b>Delving into Masked Autoencoders for Multi-Label Thorax Disease Classification</b> <br/> Junfei Xiao, Yutong Bai, Alan Yuille, Zongwei Zhou <br/> Johns Hopkins University <br/> IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), 2023 <br/> paper | code
## To Do
- Instructions for preparing datasets.
- Instructions for pretraining and fine-tuning.
## Image reconstruction demo

<p align="center"><img src="figures/fig_reconstruction.png" width="100%"></p>

## Installing Requirements
Our codebase follows the official MAE implementation and uses some additional packages. You can build the environment with Conda or Pip using one of the following sets of commands.
Conda:

```bash
conda env create -n medical_mae -f medical_mae.yml
```
Pip:

```bash
conda create -n medical_mae python=3.8
conda activate medical_mae
pip install -r requirements.txt
```
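After installation, a quick sanity check that the core dependencies resolve (a minimal sketch; the exact versions pinned in requirements.txt may differ):

```python
# Verify that the key dependencies are importable and CUDA is visible.
import torch
import timm

print(f"torch {torch.__version__}, CUDA available: {torch.cuda.is_available()}")
print(f"timm {timm.__version__}")
```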
## Preparing Datasets
The MIMIC-CXR, CheXpert, and ChestX-ray14 datasets are publicly available on their official sites, where you can download them or request access under the respective agreements. You may also download them through the following links, for research purposes only and subject to the official agreements:
- MIMIC-CXR (JPG): https://physionet.org/content/mimic-cxr-jpg/2.0.0/
- CheXpert (v1.0-small): https://www.kaggle.com/datasets/ashery/chexpert
- ChestX-ray14: https://www.kaggle.com/datasets/nih-chest-xrays/data
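For a rough picture of how these datasets are consumed downstream, here is a minimal multi-label dataset sketch. The CSV layout (an image-path column followed by 14 binary disease labels) and the class name are illustrative assumptions, not the repo's actual loaders:

```python
import pandas as pd
import torch
from PIL import Image
from torch.utils.data import Dataset

class ChestXrayDataset(Dataset):
    """Minimal multi-label chest X-ray dataset (hypothetical CSV layout:
    first column is a relative image path, remaining columns are 0/1 labels)."""

    def __init__(self, csv_path, image_root, transform=None):
        self.df = pd.read_csv(csv_path)
        self.image_root = image_root
        self.transform = transform

    def __len__(self):
        return len(self.df)

    def __getitem__(self, idx):
        row = self.df.iloc[idx]
        image = Image.open(f"{self.image_root}/{row.iloc[0]}").convert("RGB")
        if self.transform is not None:
            image = self.transform(image)
        labels = torch.tensor(row.iloc[1:].to_numpy(dtype="float32"))
        return image, labels
```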
## Pre-training on ImageNet or Chest X-rays
The pre-training instruction is in PRETRAIN.md.
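Since the code builds on facebookresearch/mae, pre-training presumably follows the standard MAE recipe. Below is a minimal sketch of one training step, assuming this codebase keeps the `models_mae` interface from facebookresearch/mae unchanged:

```python
# One MAE pre-training step (sketch). Assumes the models_mae module from
# facebookresearch/mae, whose forward returns (loss, prediction, mask).
import torch
import models_mae

model = models_mae.mae_vit_base_patch16_dec512d8b()
optimizer = torch.optim.AdamW(model.parameters(), lr=1.5e-4, weight_decay=0.05)

images = torch.randn(8, 3, 224, 224)  # stand-in batch of chest X-rays
loss, pred, mask = model(images, mask_ratio=0.75)  # reconstruct the masked 75%

loss.backward()
optimizer.step()
optimizer.zero_grad()
```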
## Fine-tuning with pre-trained checkpoints
The fine-tuning instruction is in FINETUNE.md.
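As a sketch of how a pre-trained encoder is typically loaded before fine-tuning (the checkpoint filename here is hypothetical, and storing weights under a "model" key follows the MAE checkpoint convention, which we assume this repo keeps):

```python
import timm
import torch

# ViT-Base/16 classifier with 14 outputs for the ChestX-ray14 labels.
model = timm.create_model("vit_base_patch16_224", num_classes=14)

# Hypothetical checkpoint filename; MAE-style checkpoints usually store
# the encoder weights under the "model" key.
checkpoint = torch.load("vit-b_CXR_0.5M_mae.pth", map_location="cpu")
state_dict = checkpoint.get("model", checkpoint)

# The MAE decoder is not part of the classifier and the new head is
# randomly initialized, so load non-strictly and inspect what was skipped.
msg = model.load_state_dict(state_dict, strict=False)
print("missing:", msg.missing_keys)
print("unexpected:", msg.unexpected_keys)
```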
The following table provides the pre-trained checkpoints used in Table 1. You can download all of the weights in the table with this link (google drive).
Model | Pretrained Dataset | Method | Pretrained | Finetuned (NIH Chest X-ray) | mAUC |
---|---|---|---|---|---|
DenseNet-121 | ImageNet | Categorization | torchvision official | google drive | 82.2 |
ResNet-50 | ImageNet | MoCo v2 | google drive | google drive | 80.9 |
ResNet-50 | ImageNet | BYOL | google drive | google drive | 81.0 |
ResNet-50 | ImageNet | SwAV | google drive | google drive | 81.5 |
DenseNet-121 | X-rays (0.3M) | MoCo v2 | google drive | google drive | 80.6 |
DenseNet-121 | X-rays (0.3M) | MAE | google drive | google drive | 81.2 |
ViT-Small/16 | ImageNet | Categorization | DeiT Official | google drive | 79.6 |
ViT-Small/16 | ImageNet | MAE | google drive | google drive | 78.6 |
ViT-Small/16 | X-rays (0.3M) | MAE | google drive | google drive | 82.3 |
ViT-Base/16 | X-rays (0.5M) | MAE | google drive | google drive | 83.0 |

Model | Pretrained Dataset | Finetuned (NIH Chest X-ray) | mAUC | Finetuned (CheXpert) | mAUC | Finetuned (COVIDx) | Accuracy |
---|---|---|---|---|---|---|---|
ViT-Small/16 | X-rays (0.3M) | google drive | 82.3 | google drive | 89.2 | google drive | 95.2 |
ViT-Base/16 | X-rays (0.5M) | google drive | 83.0 | google drive | 89.3 | google drive | 95.3 |
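For reference, the mAUC reported above is the mean of the per-class ROC AUCs over the disease labels. A minimal sketch of the usual computation (not necessarily the repo's exact evaluation code):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def mean_auc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Mean per-class ROC AUC.

    y_true:  (N, C) binary ground-truth labels
    y_score: (N, C) predicted probabilities
    """
    return float(np.mean([
        roc_auc_score(y_true[:, c], y_score[:, c])
        for c in range(y_true.shape[1])
    ]))
```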
## Citation
If you use this code or our pre-trained weights in your research, please cite our paper:
```bibtex
@inproceedings{xiao2023delving,
  title={Delving into masked autoencoders for multi-label thorax disease classification},
  author={Xiao, Junfei and Bai, Yutong and Yuille, Alan and Zhou, Zongwei},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={3588--3600},
  year={2023}
}
```
## License

This repo is under the Apache 2.0 license.
## Acknowledgement
This work was supported by the Lustgarten Foundation for Pancreatic Cancer Research.
Our code is built upon facebookresearch/mae.