RadFM

The official code for the paper "Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data"

ArXiv

Website

Model checkpoint

In this project, we collect a large-scale medical multi-modal dataset, MedMD, with 16M 2D or 3D images. We train a new medical multi-modal generative model, RadFM, on it. The model supports both 2D and 3D scans, multi-image input, and interleaved visual-language cases.

<img src="https://github.com/chaoyi-wu/RadFM/blob/main/Images/GIF.gif"/>

Latest News:

All datasets are released! We have updated the links in our dataset table. The text portion of all our data is available at https://huggingface.co/datasets/chaoyi-wu/RadFM_data_csv.

To decompress the split archive files, the following commands work in most cases on Linux:

cat zip.z* > myzip.zip
unzip myzip.zip
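
For environments without `cat` and `unzip`, the two commands above can be approximated in Python. This is a minimal sketch, not part of the repository: the function names `join_parts` and `extract` are hypothetical, and it assumes the parts sort into the correct order by filename (as the shell glob `zip.z*` does). Python's `zipfile` may be stricter than `unzip` about true multi-part archives; if extraction fails, run `unzip` on the joined file instead.

```python
import glob
import shutil
import zipfile
from pathlib import Path


def join_parts(pattern, joined_path):
    """Concatenate all files matching `pattern`, in sorted order, into `joined_path`.

    Mirrors `cat zip.z* > myzip.zip`. Returns the list of parts that were joined.
    """
    joined = Path(joined_path).resolve()
    # Skip the output file itself in case it happens to match the pattern.
    parts = sorted(p for p in glob.glob(pattern) if Path(p).resolve() != joined)
    if not parts:
        raise FileNotFoundError(f"no files match {pattern!r}")
    with open(joined, "wb") as out:
        for part in parts:
            with open(part, "rb") as src:
                shutil.copyfileobj(src, out)
    return parts


def extract(joined_path, out_dir):
    """Extract the joined archive into `out_dir` and return the member names."""
    with zipfile.ZipFile(joined_path) as zf:
        zf.extractall(out_dir)
        return zf.namelist()
```

A typical call would be `join_parts("zip.z*", "myzip.zip")` followed by `extract("myzip.zip", "data")`, where `data` is a placeholder output directory.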

Quick Start:

For a quick start, see the Quick_demo directory.
It demonstrates a simple diagnosis case to show how to run inference with our model.
Feel free to modify it as you want.

By the way, do not try to run this on a CPU; a GPU is all you need :).

Pre-train:

To retrain a model on our dataset, or to run large-scale testing of our pre-trained model, see the src directory.

In short, use train.py for training and test.py for testing.

Case Study:

Some cases produced by our final model:

<img src="https://github.com/chaoyi-wu/RadFM/blob/main/Images/result_vqa.jpg"/> <img src="https://github.com/chaoyi-wu/RadFM/blob/main/Images/result_report.jpg"/> <img src="https://github.com/chaoyi-wu/RadFM/blob/main/Images/result_rationale.jpg"/>

Dataset-Links:

Dataset download URLs:

| Dataset Name | Link | Access |
|---|---|---|
| Rad3D-series | - | Closed |
| MPx-series | - | Closed |
| PMC-Figures | https://pan.baidu.com/s/1Src_rhXsaOFp8zJ_3zMFsQ?pwd=p3ne | Open Access |
| PMC-Inline | https://huggingface.co/datasets/chaoyi-wu/PMC-Inline | Open Access |
| PMC-CaseReport | Original version, Filtered version | Open Access |
| VinDr-Mammo | https://www.physionet.org/content/vindr-mammo/1.0.0/ | Credentialed Access |
| VinDr-SpineXR | https://www.physionet.org/content/vindr-spinexr/1.0.0/ | Credentialed Access |
| VinDr-PCXR | https://physionet.org/content/vindr-pcxr/1.0.0/ | Credentialed Access |
| PMC-OA | https://huggingface.co/datasets/axiong/pmc_oa_beta | Open Access |
| PMC-VQA | https://huggingface.co/datasets/xmcmic/PMC-VQA | Open Access |
| VQA-RAD | https://osf.io/89kps/ | Open Access |
| SLAKE | https://www.med-vqa.com/slake/ | Open Access |
| MIMIC-CXR | https://physionet.org/content/mimic-cxr/2.0.0 | Credentialed Access |
| VinDr-CXR | https://physionet.org/content/vindr-cxr/1.0.0/ | Credentialed Access |
| NIH ChestXray14 | https://nihcc.app.box.com/v/ChestXray-NIHCC/folder/36938765345 | Open Access |
| CheXpert | https://aimi.stanford.edu/chexpert-chest-x-rays | Open Access |
| Covid-CXR2 | https://www.kaggle.com/datasets/andyczhao/covidx-cxr2 | Open Access |
| NLM-TB | Montgomery, ChinaSet | Open Access |
| Object-CXR | https://web.archive.org/web/20201127235812/https://jfhealthcare.github.io/object-CXR/ | Open Access |
| OpenI | https://www.kaggle.com/datasets/raddar/chest-xrays-indiana-university | Open Access |
| RSNA | https://www.rsna.org/education/ai-resources-and-training/ai-image-challenge/rsna-pneumonia-detection-challenge-2018 | Open Access |
| SIIM-ACR | https://www.kaggle.com/datasets/jesperdramsch/siim-acr-pneumothorax-segmentation-data | Open Access |

The split of each dataset can be found in https://huggingface.co/datasets/chaoyi-wu/RadFM_data_csv. You only need to download the image part from each dataset.
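
After downloading a split file and the corresponding images, it can be useful to confirm that every image referenced by the split is present locally. The sketch below is an illustration only: the function name `find_missing_images` and the default column name `image_path` are assumptions, not taken from the released CSV files, so check the actual header of each file and pass the right column name.

```python
import csv
from pathlib import Path


def find_missing_images(csv_path, image_root, column="image_path"):
    """Return the entries of `column` whose files do not exist under `image_root`.

    `column` is a hypothetical name; replace it with the real header
    used by the split file you downloaded.
    """
    root = Path(image_root)
    missing = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {csv_path}")
        for row in reader:
            rel = row[column]
            # An absolute path in the CSV overrides `root` when joined.
            if not (root / rel).is_file():
                missing.append(rel)
    return missing
```

An empty return value means every referenced image was found under the given root directory.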

Dataset Codes and Files Linking:

Check the following table to see how to process each dataset and how each file in https://huggingface.co/datasets/chaoyi-wu/RadFM_data_csv is linked to each dataset:

| Dataset Name | Process Dataset Code | Related Filename |
|---|---|---|
| Rad3D-series | jpg2nii Process Code, nii2npy Process Code, Final Dataset to Read npy and Related Texts | radiology_article_npy_train/test.json |
| MPx-series | MedPix Dataset | MedPix_muli_train/test.csv, MedPix_single_train/test.csv |
| PMC-Inline | Paper-inline Dataset | paper_train.csv (This dataset is not used for evaluation) |
| PMC-CaseReport | Case-report Dataset | filtered_case_report_train/test.csv |
| VinDr-Mammo | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | mammo_balance_train/test.csv |
| VinDr-SpineXR | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | spinexr_balance_train/test.csv |
| VinDr-PCXR | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | pcxr_balance_train/test.csv |
| PMC-OA | Pmcoa Dataset | pmcoa_image_caption_train/test.csv |
| PMC-VQA | vqa Dataset | pmcvaq_train/test.csv |
| VQA-RAD | vqa Dataset | vqarad_train/test.csv |
| SLAKE | vqa Dataset | slakevqa_train/test.csv |
| MIMIC-CXR | CXR Open Captioning Dataset | mimic_caption_train/test.csv |
| VinDr-CXR | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
| NIH ChestXray14 | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
| CheXpert | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
| Covid-CXR2 | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
| NLM-TB | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
| Object-CXR | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
| OpenI | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
| RSNA | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
| SIIM-ACR | Diagnosis Open Format Dataset, Diagnosis Close (yes/no) Format Dataset | chestxray_balance_train_new.csv, chestxray_balance_test.csv |
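
Several entries in the Related Filename column use the shorthand `name_train/test.ext`. Assuming this stands for a pair of files, one ending in `_train` and one in `_test` (an interpretation of the table, not something the repository states explicitly), the hypothetical helper below expands the shorthand into both names so they can be looked up in the downloaded folder.

```python
def expand_split_names(shorthand):
    """Expand 'name_train/test.ext' into ['name_train.ext', 'name_test.ext'].

    Names without a '/' are returned unchanged in a one-element list.
    """
    if "/" not in shorthand:
        return [shorthand]
    head, tail = shorthand.split("/", 1)   # e.g. 'mammo_balance_train', 'test.csv'
    if not head.endswith("train"):
        raise ValueError(f"unexpected shorthand: {shorthand!r}")
    stem, ext = tail.rsplit(".", 1)        # e.g. 'test', 'csv'
    prefix = head[: -len("train")]         # e.g. 'mammo_balance_'
    return [f"{head}.{ext}", f"{prefix}{stem}.{ext}"]
```

For instance, `expand_split_names("mammo_balance_train/test.csv")` yields the two names `mammo_balance_train.csv` and `mammo_balance_test.csv`; verify them against the actual file listing before relying on them.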

Acknowledgment:

We sincerely thank all the contributors who uploaded the data included in our dataset. We appreciate their willingness to make these valuable cases publicly available.

Contact

If you have any questions, please feel free to contact wtzxxxwcy02@sjtu.edu.cn.