# BenchX: A Unified Evaluation Framework for Medical Vision-Language Models on Chest X-Rays

## Downstream Evaluation

### 0. Download Datasets
- **MIMIC-CXR**: We downloaded the MIMIC-CXR-JPG dataset, with paired medical reports and images, for pre-training.
- **CheXpert**: We downloaded the CheXpert-v1.0-small dataset from Kaggle.
- **RSNA Pneumonia**: We used the stage 2 data of the RSNA Pneumonia dataset from Kaggle.
- **SIIM**: We downloaded the stage 1 data of the SIIM-ACR Pneumothorax Segmentation dataset from Kaggle.
- **NIH Chest X-rays**: We downloaded Version 3 of the NIH Chest X-rays dataset from Kaggle. All images from the `img_0XX` folders are moved into a combined sub-folder `all_images/` (see the sketch after this list).
- **Rad-ReStruct**: We downloaded the Rad-ReStruct medical VQA benchmark dataset through the official download link in its repo.
- **VQA-RAD**: We downloaded the VQA-RAD dataset through its official channel.
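For the NIH folder-flattening step above, here is a minimal sketch of the move; the `/data/nih_chest_xray` root and the `img_0*` glob pattern are assumptions, so adjust them to your download:

```python
# Minimal sketch: flatten the per-archive NIH image folders into all_images/.
# The root path and folder pattern are assumptions; adjust them to your layout.
import shutil
from pathlib import Path

root = Path("/data/nih_chest_xray")            # dataset root (assumption)
dest = root / "all_images"
dest.mkdir(exist_ok=True)

for folder in sorted(root.glob("img_0*")):     # per-archive image folders
    for img in sorted(folder.rglob("*.png")):  # NIH images are PNGs
        shutil.move(str(img), str(dest / img.name))
```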
Change the dataset paths in `utils/constants.py` accordingly.
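For reference, the path constants in `utils/constants.py` might look like the sketch below. The constant names here are illustrative, not the file's actual contents, so edit the real file rather than copying this verbatim:

```python
# utils/constants.py -- illustrative layout only; the actual constant names
# in the repository may differ. Point each entry at your local dataset root.
CHEXPERT_DATA_DIR = "/data/CheXpert-v1.0-small"
MIMIC_CXR_DATA_DIR = "/data/mimic_512"
NIH_CHEST_XRAY_DATA_DIR = "/data/nih_chest_xray"
RSNA_PNEUMONIA_DATA_DIR = "/data/rsna_pneumonia"
SIIM_PNEUMOTHORAX_DATA_DIR = "/data/siim-acr-pneumothorax"
```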
### 1. Dataset Preparation

Please organize the datasets in the following structure:
```
root:[data]
+--CheXpert-v1.0-small
|  +--train
|  +--valid
|  +--train.csv
|  +--valid.csv
+--mimic_512
|  +--files
|  +--mimic-cxr-2.0.0-chexpert.csv
|  +--mimic-cxr-2.0.0-metadata.csv
|  +--mimic-cxr-2.0.0-negbio.csv
|  +--mimic-cxr-2.0.0-split.csv
+--nih_chest_xray
|  +--all_images
|  +--test_list.txt
|  +--train_val_list.txt
+--rsna_pneumonia
|  +--stage_2_test_images
|  +--stage_2_train_images
|  +--stage_2_detailed_class_info.csv
|  +--stage_2_sample_submission.csv
|  +--stage_2_train_labels.csv
+--siim-acr-pneumothorax
|  +--dicom-images-test
|  +--dicom-images-train
|  +--train-rle.csv
```
Note that we conduct our VQA experiments using the Rad-ReStruct benchmark repo, so for the Rad-ReStruct and VQA-RAD datasets we follow its data preparation steps instead.
### 2. Pre-processing
Run the following commands to pre-process the dataset(s) specified below:
```bash
python -m preprocessing.chexpert
python -m preprocessing.mimic_cxr         # use mimic_cxr_from_csv if preprocessing a CSV file containing reports
python -m preprocessing.rsna_pneumonia
python -m preprocessing.siim_pneumothorax
```
No preprocessing is required for the NIH Chest X-ray dataset.
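Before running the preprocessing modules, it can help to confirm that the layout from step 1 is in place. A small hypothetical check (the `/data` root is an assumption; the file names come from the structure above):

```python
# Hypothetical sanity check for the step-1 dataset layout; not part of the
# repository. Adjust ROOT to your dataset root.
from pathlib import Path

ROOT = Path("/data")
EXPECTED = [
    "CheXpert-v1.0-small/train.csv",
    "mimic_512/mimic-cxr-2.0.0-split.csv",
    "nih_chest_xray/train_val_list.txt",
    "rsna_pneumonia/stage_2_train_labels.csv",
    "siim-acr-pneumothorax/train-rle.csv",
]
for rel in EXPECTED:
    status = "ok" if (ROOT / rel).exists() else "MISSING"
    print(f"{status:8} {rel}")
```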
### 3. Zero-shot Evaluation & Fine-tuning

We evaluate our pre-trained models by specifying the `--pretrain_path` argument before running each downstream task. Arguments can be modified through `configs/`, and additional command-line arguments can be specified to override the configuration settings.
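For example, a fine-tuned classification run that points at a specific checkpoint might look like the following; the checkpoint path is a placeholder:

```bash
# The checkpoint path below is a placeholder; substitute your own.
python -m evaluation.classification.finetuned_classifier \
    --config configs/finetuned_classification_config.yaml \
    --pretrain_path /path/to/pretrained_checkpoint.pth
```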
To view all available models for evaluation, you may run the following script:
```python
from evaluation import available_models

available_models()
```
Supported Tasks:

- Uni-modal Tasks
  - Multi-label Classification on CheXpert (Fine-tuned)
  - Binary Classification on RSNA Pneumonia (Fine-tuned)
  - Semantic Segmentation on SIIM-ACR Pneumothorax (Fine-tuned)
- Cross-modal Tasks
  - Cross-modal Retrieval on CheXpert-5x200/MIMIC-5x200 (Zero-shot)
  - Cross-modal Classification on CheXpert-5x200 (Zero-shot)
  - Cross-modal Classification on RSNA Pneumonia (Zero-shot)
- Multi-modal Tasks
  - Visual Question Answering on Rad-ReStruct
  - Visual Question Answering on VQA-RAD
#### Zero-shot Classification

```bash
python -m evaluation.classification.zeroshot_classifier --config configs/zeroshot_retrieval_config.yaml
```

#### Fine-tuned Classification

```bash
python -m evaluation.classification.finetuned_classifier --config configs/finetuned_classification_config.yaml
```

#### Zero-shot Retrieval

```bash
python -m evaluation.retrieval.zeroshot_retrieval --config configs/zeroshot_retrieval_config.yaml
```

#### Fine-tuned Segmentation

```bash
python -m evaluation.segmentation.finetuned_segmentation --config configs/finetuned_segmentation_config.yaml
```
#### VQA

We conduct all experiments for medical VQA using the Rad-ReStruct benchmark repo and provide the necessary files for reproducing the experiments here.