Awesome Vision-Language Models (VLMs) for Medical Report Generation (RG) and Visual Question Answering (VQA)

Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review is a comprehensive review that includes:

the latest publicly available VLMs designed specifically for medical RG and VQA;
essential background on computer vision, natural language processing, and VLMs, so that the review is accessible to readers without a machine learning background;
descriptions of publicly available vision-language datasets that contain medical image-text pairs or question-answer pairs about medical images;
a detailed description of the metrics used to evaluate VLMs on RG and VQA tasks;
a discussion of current challenges in the field and potential research directions that could shape the future of medical VLMs.
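
As a rough illustration of the kind of metric used for VQA evaluation, the sketch below computes exact-match accuracy, which is commonly reported for closed-ended questions. It is a generic example, not necessarily the formulation used in the review; the function names `normalize` and `exact_match_accuracy` and the toy answers are hypothetical.

```python
def normalize(answer: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences are ignored."""
    return " ".join(answer.lower().split())


def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions equal to their reference answer after normalization."""
    if len(predictions) != len(references):
        raise ValueError("predictions and references must have the same length")
    if not references:
        return 0.0
    correct = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return correct / len(references)


# Hypothetical toy answers, not drawn from any dataset listed here.
predictions = ["Yes", "left lung", "no"]
references = ["yes", "right lung", "No"]
print(exact_match_accuracy(predictions, references))
```

Here two of the three predictions match their references after normalization. Open-ended answers and generated reports are usually scored with overlap-based or clinically informed metrics instead, which this sketch does not cover.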

The list of medical VLMs

Medical VLM	VQA	RG	Paper	Code	Year
MedViLL	+	+	Moon et al.	GitHub	2021
PubMedCLIP	+	-	Eslami et al.	GitHub	2021
RepsNet	+	+	Tanwani et al.	on request at Site ?	2022
BiomedCLIP	+	-	Zhang et al.	Hugging Face	2023
UniXGen	-	+	Lee et al.	GitHub	2023
RAMM	+	-	Yuan et al.	GitHub	2023
X-REM	-	+	Jeong et al.	GitHub	2023
Visual Med-Alpaca	+	-	-	GitHub	2023
CXR-RePaiR-Gen	-	+	Ranjit et al.	-	2023
LLaVA-Med	+	-	Li et al.	GitHub	2023
XrayGPT	+	+	Thawkar et al.	GitHub	2023
CAT-ViL DeiT	+	-	Bai et al.	GitHub	2023
MUMC	+	-	Li et al.	GitHub	2023
Med-Flamingo	+	-	Moor et al.	GitHub	2023
RaDialog	+	+	Pellegrini et al.	GitHub	2023
PathChat	+	-	Lu et al.	GitHub	2024

The list of Medical Vision-Language Datasets

Medical Dataset	Image-Text pairs	QA pairs	Paper	Link
ROCO	+	-	Pelka et al.	GitHub
MIMIC-CXR	+	-	Johnson et al.	PhysioNet
MIMIC-CXR-JPG	+	-	Johnson et al.	PhysioNet
MIMIC-NLE	+	-	Kayser et al.	GitHub
CXR-PRO	+ (unpaired)	-	Ramesh et al.	PhysioNet
MS-CXR	+	-	Boecking et al.	PhysioNet
IU-Xray or Open-i	+	-	Demner-Fushman et al.	Open-i
MedICaT	+	-	Subramanian et al.	GitHub
PMC-OA	+	-	Lin et al.	Hugging Face
SLAKE	-	+	Liu et al.	MedVQA
VQA-RAD	-	+	Lau et al.	OSF
PathVQA	-	+	He et al.	GitHub
VQA-Med 2019	-	+	Abacha et al.	GitHub
VQA-Med 2020	-	+	Abacha et al.	GitHub
VQA-Med 2021	-	+	Ionescu et al.	GitHub
EndoVis 2017	-	+	Allan et al.	GitHub
EndoVis 2018	-	+	Allan et al.	image frames in Challenge and the rest on GitHub
PathQABench-Public	-	+	Lu et al.	GitHub

Citation

@article{Hartsock2024,
  title={Vision-language models for medical report generation and visual question answering: a review},
  author={Hartsock, Iryna and Rasool, Ghulam},
  journal={Frontiers in Artificial Intelligence},
  volume={7},
  pages={1430984},
  year={2024},
  publisher={Frontiers Media SA},
  doi={10.3389/frai.2024.1430984},
  url={https://www.frontiersin.org/articles/10.3389/frai.2024.1430984/full}
}