
Awesome Vision-Language Models (VLMs) for Medical Report Generation (RG) and Visual Question Answering (VQA)

Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review is a comprehensive review that includes the following:

List of Medical VLMs

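| Medical VLM | VQA | RG | Paper | Code | Year |
|---|---|---|---|---|---|
| MedViLL | + | + | Moon et al. | GitHub | 2021 |
| PubMedCLIP | + | - | Eslami et al. | GitHub | 2021 |
| RepsNet | + | + | Tanwani et al. | On request at Site | 2022 |
| BiomedCLIP | + | - | Zhang et al. | Hugging Face | 2023 |
| UniXGen | - | + | Lee et al. | GitHub | 2023 |
| RAMM | + | - | Yuan et al. | GitHub | 2023 |
| X-REM | - | + | Jeong et al. | GitHub | 2023 |
| Visual Med-Alpaca | + | - | - | GitHub | 2023 |
| CXR-RePaiR-Gen | - | + | Ranjit et al. | - | 2023 |
| LLaVA-Med | + | - | Li et al. | GitHub | 2023 |
| XrayGPT | + | + | Thawkar et al. | GitHub | 2023 |
| CAT-ViL DeiT | + | - | Bai et al. | GitHub | 2023 |
| MUMC | + | - | Li et al. | GitHub | 2023 |
| Med-Flamingo | + | - | Moor et al. | GitHub | 2023 |
| RaDialog | + | + | Pellegrini et al. | GitHub | 2023 |
| PathChat | + | - | Lu et al. | GitHub | 2024 |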

List of Medical Vision-Language Datasets

| Medical Dataset | Image-Text Pairs | QA Pairs | Paper | Link |
|---|---|---|---|---|
| ROCO | + | - | Pelka et al. | GitHub |
| MIMIC-CXR | + | - | Johnson et al. | PhysioNet |
| MIMIC-CXR-JPG | + | - | Johnson et al. | PhysioNet |
| MIMIC-NLE | + | - | Kayser et al. | GitHub |
| CXR-PRO | + (unpaired) | - | Ramesh et al. | PhysioNet |
| MS-CXR | + | - | Boecking et al. | PhysioNet |
| IU-Xray or Open-I | + | - | Demner-Fushman et al. | Openi |
| MedICaT | + | - | Subramanian et al. | GitHub |
| PMC-OA | + | - | Lin et al. | Hugging Face |
| SLAKE | - | + | Liu et al. | MedVQA |
| VQA-RAD | - | + | Lau et al. | OSF |
| PathVQA | - | + | He et al. | GitHub |
| VQA-Med 2019 | - | + | Abacha et al. | GitHub |
| VQA-Med 2020 | - | + | Abacha et al. | GitHub |
| VQA-Med 2021 | - | + | Ionescu et al. | GitHub |
| EndoVis 2017 | - | + | Allan et al. | GitHub |
| EndoVis 2018 | - | + | Allan et al. | Image frames on the Challenge site; the rest on GitHub |
| PathQABench-Public | - | + | Lu et al. | GitHub |

Citation

@misc{hartsock2024visionlanguage,
      title={Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review}, 
      author={Iryna Hartsock and Ghulam Rasool},
      year={2024},
      eprint={2403.02469},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}