BenchX: A Unified Evaluation Framework for Medical Vision-Language Models on Chest X-Rays

Downstream Evaluation

0. Download Datasets

After downloading the datasets, update the dataset paths in utils/constants.py to match your local directories.
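The variable names used in utils/constants.py are not listed here. As a hypothetical sketch only, a constants module that derives every dataset path from a single root directory (DATA_ROOT and the other names below are assumptions, not the repository's actual identifiers) reduces the edit to one line:

```python
from pathlib import Path

# Hypothetical names: check utils/constants.py for the variables it actually defines.
# Point this at the directory that holds all downloaded datasets.
DATA_ROOT = Path("/path/to/data")

# Per-dataset directories, following the layout in the Dataset Preparation step.
CHEXPERT_DIR = DATA_ROOT / "CheXpert-v1.0-small"
MIMIC_CXR_DIR = DATA_ROOT / "mimic_512"
NIH_CXR_DIR = DATA_ROOT / "nih_chest_xray"
RSNA_PNEUMONIA_DIR = DATA_ROOT / "rsna_pneumonia"
SIIM_PNEUMOTHORAX_DIR = DATA_ROOT / "siim-acr-pneumothorax"

# Example of a file path built on top of a dataset directory.
CHEXPERT_TRAIN_CSV = CHEXPERT_DIR / "train.csv"
```

Deriving everything from one root means moving the data only requires changing DATA_ROOT.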

1. Dataset Preparation

Organize the datasets according to the following directory structure:

root:[data]
+--CheXpert-v1.0-small
| +--train
| +--valid
| +--train.csv
| +--valid.csv
+--mimic_512
| +--files
| +--mimic-cxr-2.0.0-chexpert.csv
| +--mimic-cxr-2.0.0-metadata.csv
| +--mimic-cxr-2.0.0-negbio.csv
| +--mimic-cxr-2.0.0-split.csv
+--nih_chest_xray
| +--all_images
| +--test_list.txt
| +--train_val_list.txt
+--rsna_pneumonia
| +--stage_2_test_images
| +--stage_2_train_images
| +--stage_2_detailed_class_info.csv
| +--stage_2_sample_submission.csv
| +--stage_2_train_labels.csv
+--siim-acr-pneumothorax
| +--dicom-images-test
| +--dicom-images-train
| +--train-rle.csv
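Before pre-processing, it can help to confirm that the layout matches the tree above. The following is a minimal sketch, not a script shipped with the repository; EXPECTED_LAYOUT and missing_entries are illustrative names, and the expected entries are copied directly from the tree:

```python
from pathlib import Path

# Expected entries relative to the data root, copied from the tree above.
EXPECTED_LAYOUT = {
    "CheXpert-v1.0-small": ["train", "valid", "train.csv", "valid.csv"],
    "mimic_512": [
        "files",
        "mimic-cxr-2.0.0-chexpert.csv",
        "mimic-cxr-2.0.0-metadata.csv",
        "mimic-cxr-2.0.0-negbio.csv",
        "mimic-cxr-2.0.0-split.csv",
    ],
    "nih_chest_xray": ["all_images", "test_list.txt", "train_val_list.txt"],
    "rsna_pneumonia": [
        "stage_2_test_images",
        "stage_2_train_images",
        "stage_2_detailed_class_info.csv",
        "stage_2_sample_submission.csv",
        "stage_2_train_labels.csv",
    ],
    "siim-acr-pneumothorax": ["dicom-images-test", "dicom-images-train", "train-rle.csv"],
}


def missing_entries(root):
    """Return every expected path under `root` that does not exist on disk."""
    root = Path(root)
    missing = []
    for dataset, entries in EXPECTED_LAYOUT.items():
        for entry in entries:
            path = root / dataset / entry
            if not path.exists():
                missing.append(path)
    return missing


if __name__ == "__main__":
    import sys

    # Usage: python check_layout.py /path/to/data
    data_root = sys.argv[1] if len(sys.argv) > 1 else "data"
    for path in missing_entries(data_root):
        print(f"missing: {path}")
```

An empty result means every file and folder listed in the tree is in place.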

Note that we conduct our VQA experiments using the Rad-ReStruct benchmark repo. For the Rad-ReStruct and VQA-RAD datasets, we follow its data preparation steps instead.

2. Pre-processing

Run the following commands to pre-process each dataset:

python -m preprocessing.chexpert
python -m preprocessing.mimic_cxr  # use preprocessing.mimic_cxr_from_csv instead to pre-process a CSV file containing reports
python -m preprocessing.rsna_pneumonia
python -m preprocessing.siim_pneumothorax

No preprocessing is required for the NIH Chest X-ray dataset.
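To run all pre-processing steps in sequence, a small driver can invoke the same modules through the current Python interpreter. This is a sketch rather than part of the repository: PREPROCESSING_MODULES, build_commands, and run_all are illustrative names, and the CSV variant assumes the module path preprocessing.mimic_cxr_from_csv, inferred from the comment on the MIMIC-CXR command above.

```python
import subprocess
import sys

# Modules from the pre-processing commands above; NIH Chest X-ray needs none.
PREPROCESSING_MODULES = [
    "preprocessing.chexpert",
    "preprocessing.mimic_cxr",
    "preprocessing.rsna_pneumonia",
    "preprocessing.siim_pneumothorax",
]


def build_commands(use_csv_reports=False):
    """Build one `python -m <module>` command per dataset."""
    commands = []
    for module in PREPROCESSING_MODULES:
        if use_csv_reports and module == "preprocessing.mimic_cxr":
            # Assumed module path for the CSV-report variant of MIMIC-CXR pre-processing.
            module = "preprocessing.mimic_cxr_from_csv"
        commands.append([sys.executable, "-m", module])
    return commands


def run_all(use_csv_reports=False):
    """Run each command in order; check=True stops at the first failing step."""
    for command in build_commands(use_csv_reports):
        print("running:", " ".join(command))
        subprocess.run(command, check=True)
```

Call run_all() from the repository root so that the preprocessing package is importable. Using sys.executable ensures the steps run in the same environment as the driver.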

3. Zero-shot Evaluation & Fine-tuning

To evaluate a pre-trained model, set the --pretrain_path argument before running each downstream task. Default arguments are defined in the YAML files under configs/, and additional command-line arguments override the corresponding configuration settings.
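The exact argument handling lives in each evaluation script. As a rough sketch of the override pattern only, assuming the YAML file has already been parsed into a dict (for example with PyYAML's yaml.safe_load), a command-line value replaces the config value only when it is actually supplied. The --batch_size flag and the helper names below are hypothetical; --config and --pretrain_path come from the commands in this section.

```python
import argparse


def parse_overrides(argv):
    """Parse --config plus a few override flags (an illustrative subset)."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--config", required=True)
    parser.add_argument("--pretrain_path", default=None)
    # Hypothetical extra override; the real scripts define their own flags.
    parser.add_argument("--batch_size", type=int, default=None)
    return parser.parse_args(argv)


def merge_config(config, args):
    """Return a copy of `config` where every non-None CLI value replaces the config value."""
    merged = dict(config)
    for key, value in vars(args).items():
        if key != "config" and value is not None:
            merged[key] = value
    return merged


# Example: the YAML values, written inline as a dict, with one CLI override.
example_config = {"pretrain_path": "default.pth", "batch_size": 64}
example_args = parse_overrides(
    ["--config", "configs/finetuned_classification_config.yaml", "--pretrain_path", "my_model.pth"]
)
print(merge_config(example_config, example_args))
# {'pretrain_path': 'my_model.pth', 'batch_size': 64}
```

Defaulting every override flag to None is what lets the merge distinguish "not given on the command line" from an explicit value.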

To list all models available for evaluation, run:

from evaluation import available_models
available_models()

Supported Tasks:

Zero-shot Classification

python -m evaluation.classification.zeroshot_classifier --config configs/zeroshot_retrieval_config.yaml

Finetuned Classification

python -m evaluation.classification.finetuned_classifier --config configs/finetuned_classification_config.yaml

Zero-shot Retrieval

python -m evaluation.retrieval.zeroshot_retrieval --config configs/zeroshot_retrieval_config.yaml

Finetuned Segmentation

python -m evaluation.segmentation.finetuned_segmentation --config configs/finetuned_segmentation_config.yaml
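The four commands above differ only in their module and config file. A hypothetical helper (TASKS and build_eval_command are illustrative names, not part of the repository) can assemble them and append --pretrain_path as a command-line override, assuming the scripts accept it on the command line as described above. The module and config pairs are copied verbatim from the commands.

```python
import shlex

# Module and config pairs copied from the commands above.
TASKS = {
    "zeroshot_classification": (
        "evaluation.classification.zeroshot_classifier",
        "configs/zeroshot_retrieval_config.yaml",
    ),
    "finetuned_classification": (
        "evaluation.classification.finetuned_classifier",
        "configs/finetuned_classification_config.yaml",
    ),
    "zeroshot_retrieval": (
        "evaluation.retrieval.zeroshot_retrieval",
        "configs/zeroshot_retrieval_config.yaml",
    ),
    "finetuned_segmentation": (
        "evaluation.segmentation.finetuned_segmentation",
        "configs/finetuned_segmentation_config.yaml",
    ),
}


def build_eval_command(task, pretrain_path=None):
    """Return the shell command for `task`, optionally overriding --pretrain_path."""
    module, config = TASKS[task]
    args = ["python", "-m", module, "--config", config]
    if pretrain_path is not None:
        args += ["--pretrain_path", pretrain_path]
    # shlex.join (Python 3.8+) quotes any argument that needs it, e.g. paths with spaces.
    return shlex.join(args)
```

For example, build_eval_command("zeroshot_retrieval", "checkpoints/model.pth") yields the zero-shot retrieval command from above with --pretrain_path checkpoints/model.pth appended.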

VQA

We conduct all medical VQA experiments using the Rad-ReStruct benchmark repo and provide the files needed to reproduce them here.