Awesome Vision-Language Models (VLMs) for Medical Report Generation (RG) and Visual Question Answering (VQA)

The paper "Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review" is a comprehensive review that includes:

The list of medical VLMs

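| Medical VLM | VQA | RG | Paper | Code | Year |
|---|---|---|---|---|---|
| MedViLL | + | + | Moon et al. | GitHub | 2021 |
| PubMedCLIP | + | - | Eslami et al. | GitHub | 2021 |
| RepsNet | + | + | Tanwani et al. | on request at Site ? | 2022 |
| BiomedCLIP | + | - | Zhang et al. | Hugging Face | 2023 |
| UniXGen | - | + | Lee et al. | GitHub | 2023 |
| RAMM | + | - | Yuan et al. | GitHub | 2023 |
| X-REM | - | + | Jeong et al. | GitHub | 2023 |
| Visual Med-Alpaca | + | - | - | GitHub | 2023 |
| CXR-RePaiR-Gen | - | + | Ranjit et al. | - | 2023 |
| LLaVa-Med | + | - | Li et al. | GitHub | 2023 |
| XrayGPT | + | + | Thawkar et al. | GitHub | 2023 |
| CAT-ViL DeiT | + | - | Bai et al. | GitHub | 2023 |
| MUMC | + | - | Li et al. | GitHub | 2023 |
| Med-Flamingo | + | - | Moor et al. | GitHub | 2023 |
| RaDialog | + | + | Pellegrini et al. | GitHub | 2023 |
| PathChat | + | - | Lu et al. | GitHub | 2024 |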

The list of Medical Vision-Language Datasets

| Medical Dataset | Image-Text pairs | QA pairs | Paper | Link |
|---|---|---|---|---|
| ROCO | + | - | Pelka et al. | GitHub |
| MIMIC-CXR | + | - | Johnson et al. | PhysioNet |
| MIMIC-CXR-JPG | + | - | Johnson et al. | PhysioNet |
| MIMIC-NLE | + | - | Kayser et al. | GitHub |
| CXR-PRO | + (unpaired) | - | Ramesh et al. | PhysioNet |
| MS-CXR | + | - | Boecking et al. | PhysioNet |
| IU-Xray or Open-I | + | - | Demner-Fushman et al. | Openi |
| MedICaT | + | - | Subramanian et al. | GitHub |
| PMC-OA | + | - | Lin et al. | Hugging Face |
| SLAKE | - | + | Liu et al. | MedVQA |
| VQA-RAD | - | + | Lau et al. | Osf |
| PathVQA | - | + | He et al. | GitHub |
| VQA-Med 2019 | - | + | Abacha et al. | GitHub |
| VQA-Med 2020 | - | + | Abacha et al. | GitHub |
| VQA-Med 2021 | - | + | Ionescu et al. | GitHub |
| EndoVis 2017 | - | + | Allan et al. | GitHub |
| EndoVis 2018 | - | + | Allan et al. | image frames in Challenge and the rest on GitHub |
| PathQABench-Public | - | + | Lu et al. | GitHub |

Citation

@article{Hartsock2024,
  title={Vision-language models for medical report generation and visual question answering: a review},
  author={Hartsock, Iryna and Rasool, Ghulam},
  journal={Frontiers in Artificial Intelligence},
  volume={7},
  pages={1430984},
  year={2024},
  publisher={Frontiers Media SA},
  doi={10.3389/frai.2024.1430984},
  url={https://www.frontiersin.org/articles/10.3389/frai.2024.1430984/full}
}