# Awesome-Multimodal-Applications-In-Medical-Imaging
This repository collects resources on applications of multi-modal learning in medical imaging, including papers related to **large language models (LLMs)**. Papers involving LLMs are shown in bold.
## Contributing
Please feel free to send pull requests or email me to add links or to discuss this area. Markdown format:
- [**Name of Conference or Journal + Year**] Paper Name. [[pdf]](link) [[code]](link)
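
For example, a complete entry (the paper name and links below are placeholders for illustration, not a real reference) would look like:

```markdown
- [**CVPR 2024**] An Example Paper Title. [[pdf]](https://arxiv.org/abs/2400.00000) [[code]](https://github.com/username/repo)
```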
## News
- [2024-10] :fire::fire: We release a new paper on a versatile multimodal RAG system for Med-VLMs: "MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models".
- [2024-09] :tada::tada: CARES was accepted at NeurIPS'24, and RULE was accepted at the EMNLP'24 main conference!
- [2024-07] :fire::fire: We release a new paper on enhancing the factuality of Med-VLMs with RAG: "RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models".
- [2024-06] :fire::fire: We release a new paper on evaluating Med-VLMs: "CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models".
- [2022-07] We create this repository to maintain a paper list on multimodal applications in medical imaging.
## Citation
```bibtex
@article{xia2024cares,
  title={CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models},
  author={Xia, Peng and Chen, Ze and Tian, Juanxi and Gong, Yangrui and Hou, Ruibo and Xu, Yue and Wu, Zhenbang and Fan, Zhiyuan and Zhou, Yiyang and Zhu, Kangyu and others},
  journal={arXiv preprint arXiv:2406.06007},
  year={2024}
}

@inproceedings{xia2024rule,
  title={RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models},
  author={Xia, Peng and Zhu, Kangyu and Li, Haoran and Zhu, Hongtu and Li, Yun and Li, Gang and Zhang, Linjun and Yao, Huaxiu},
  booktitle={Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing},
  pages={1081--1093},
  year={2024}
}

@article{xia2024mmed,
  title={MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models},
  author={Xia, Peng and Zhu, Kangyu and Li, Haoran and Wang, Tianze and Shi, Weijia and Wang, Sheng and Zhang, Linjun and Zou, James and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2410.13085},
  year={2024}
}
```
## Overview
- Data Source
- Survey
- Medical Report Generation
- Medical Visual Question Answering
- Medical Vision-Language Model
## Data Source

### Image-Caption Datasets
| dataset | domain | images | texts | source | language |
|---|---|---|---|---|---|
| ROCO | multiple | 87K | 87K | research papers | En |
| MedICaT | multiple | 217K | 217K | research papers | En |
| PMC-OA | multiple | 1.6M | 1.6M | research papers | En |
| ChiMed-VL | multiple | 580K | 580K | research papers | En/zh |
| FFA-IR | fundus | 1M | 10K | medical reports | En/zh |
| PadChest | CXR | 160K | 109K | medical reports | Sp |
| MIMIC-CXR | CXR | 377K | 227K | medical reports | En |
| OpenPath | histology | 208K | 208K | social media | En |
| Quilt-1M | histology | 1M | 1M | research papers<br>social media | En |
| Harvard-FairVLMed | fundus | 10K | 10K | medical reports | En |
| MedTrinity-25M | multiple | 25M | 25M | research papers<br>social media | En |
### Visual Question Answering Datasets
| dataset | domain | images | QA items | language |
|---|---|---|---|---|
| VQA-RAD | radiology | 315 | 3K | En |
| SLAKE | radiology | 642 | 14K | En/zh |
| Path-VQA | histology | 5K | 32K | En |
| VQA-Med | radiology | 4.5K | 5.5K | En |
| PMC-VQA | multiple | 149K | 227K | En |
| OmniMedVQA | multiple | 118K | 128K | En |
| ProbMed | radiology | 6K | 57K | En |
| PubMedVision | multiple | 914K | 1.3M | En |
## Survey
- [arXiv 2022] Visual Attention Methods in Deep Learning: An In-Depth Survey [pdf]
- [arXiv 2022] Vision+X: A Survey on Multimodal Learning in the Light of Data [pdf]
- [arXiv 2023] Vision Language Models for Vision Tasks: A Survey [pdf] [code]
- [arXiv 2023] A Systematic Review of Deep Learning-based Research on Radiology Report Generation [pdf] [code]
- [Artif Intell Med 2023] Medical Visual Question Answering: A Survey [pdf]
- [arXiv 2023] Medical Vision Language Pretraining: A survey [pdf]
- [arXiv 2023] CLIP in Medical Imaging: A Comprehensive Survey [pdf] [code]
- [arXiv 2024] Vision-Language Models for Medical Report Generation and Visual Question Answering: A Review [pdf] [code]
## Medical Report Generation

### 2018
- [EMNLP 2018] Automated Generation of Accurate & Fluent Medical X-ray Reports [pdf] [code]
- [ACL 2018] On the Automatic Generation of Medical Imaging Reports [pdf] [code]
- [NeurIPS 2018] Hybrid Retrieval-Generation Reinforced Agent for Medical Image Report Generation [pdf]
### 2019
- [AAAI 2019] Knowledge-Driven Encode, Retrieve, Paraphrase for Medical Image Report Generation [pdf]
- [ICDM 2019] Automatic Generation of Medical Imaging Diagnostic Report with Hierarchical Recurrent Neural Network [pdf]
- [MICCAI 2019] Automatic Radiology Report Generation based on Multi-view Image Fusion and Medical Concept Enrichment [pdf]
### 2020
- [AAAI 2020] When Radiology Report Generation Meets Knowledge Graph [pdf]
- [EMNLP 2020] Generating Radiology Reports via Memory-driven Transformer [pdf] [code]
- [ACCV 2020] Hierarchical X-Ray Report Generation via Pathology tags and Multi Head Attention [pdf] [code]
### 2021
- [NeurIPS 2021] FFA-IR: Towards an Explainable and Reliable Medical Report Generation Benchmark [pdf] [code]
- [ACL 2021] Competence-based Multimodal Curriculum Learning for Medical Report Generation [pdf]
- [CVPR 2021] Exploring and Distilling Posterior and Prior Knowledge for Radiology Report Generation [pdf]
- [MICCAI 2021] AlignTransformer: Hierarchical Alignment of Visual Regions and Disease Tags for Medical Report Generation [pdf]
- [NAACL 2021] Improving Factual Completeness and Consistency of Image-to-Text Radiology Report Generation [pdf] [code]
- [MICCAI 2021] RATCHET: Medical Transformer for Chest X-ray Diagnosis and Reporting [pdf] [code]
- [MICCAI 2021] Trust It or Not: Confidence-Guided Automatic Radiology Report Generation [pdf]
- [MICCAI 2021] Surgical Instruction Generation with Transformers [pdf]
- [MICCAI 2021] Class-Incremental Domain Adaptation with Smoothing and Calibration for Surgical Report Generation [pdf] [code]
- [ACL 2021] Cross-modal Memory Networks for Radiology Report Generation [pdf] [code]
### 2022
- [CVPR 2022] Cross-modal Clinical Graph Transformer for Ophthalmic Report Generation [pdf]
- [Nature Machine Intelligence 2022] Generalized Radiograph Representation Learning via Cross-supervision between Images and Free-text Radiology Reports [pdf] [code]
- [MICCAI 2022] A Self-Guided Framework for Radiology Report Generation [pdf]
- [MICCAI 2022] A Medical Semantic-Assisted Transformer for Radiographic Report Generation [pdf]
- [MIDL 2022] Representative Image Feature Extraction via Contrastive Learning Pretraining for Chest X-ray Report Generation [pdf]
- [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
- [ICML 2022] Improving Radiology Report Generation Systems by Removing Hallucinated References to Non-existent Priors [pdf]
- [TNNLS 2022] Hybrid Reinforced Medical Report Generation with M-Linear Attention and Repetition Penalty [pdf]
- [MedIA 2022] CAMANet: Class Activation Map Guided Attention Network for Radiology Report Generation [pdf]
- [MedIA 2022] Knowledge matters: Chest radiology report generation with general and specific knowledge [pdf] [code]
- [MICCAI 2022] Lesion Guided Explainable Few Weak-shot Medical Report Generation [pdf] [code]
- [BMVC 2022] On the Importance of Image Encoding in Automated Chest X-Ray Report Generation [pdf] [code]
- [arXiv 2022] RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [pdf]
- [COLING 2022] DeltaNet: Conditional Medical Report Generation for COVID-19 Diagnosis [pdf] [code]
- [ECCV 2022] Cross-modal Prototype Driven Network for Radiology Report Generation [pdf] [code]
### 2023
- [ICIP 2023] Self adaptive global-local feature enhancement for radiology report generation [pdf]
- [TMI 2023] Attributed Abnormality Graph Embedding for Clinically Accurate X-Ray Report Generation [pdf]
- [arXiv 2023] Unified Chest X-ray and Radiology Report Generation Model with Multi-view Chest X-rays [pdf] [code]
- [WWW 2023] Auxiliary signal-guided knowledge encoder-decoder for medical report generation [pdf]
- [CVPR 2023] Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation [pdf] [code]
- [CVPR 2023] KiUT: Knowledge-Injected U-Transformer for Radiology Report Generation [pdf]
- [CVPR 2023] Interactive and Explainable Region-guided Radiology Report Generation [pdf] [code]
- [MIDL 2023] Multimodal Image-Text Matching Improves Retrieval-based Chest X-Ray Report Generation [pdf] [code]
- [arXiv 2023] Visual-Linguistic Causal Intervention for Radiology Report Generation [pdf] [code]
- [MIDL 2023] Vision-Language Modelling For Radiological Imaging and Reports In The Low Data Regime [pdf]
- [arXiv 2023] Cross-Modal Causal Intervention for Medical Report Generation [pdf] [code]
- [ICASSP 2023] MvCo-DoT: Multi-View Contrastive Domain Transfer Network for Medical Report Generation [pdf]
- [CHIL 2023] Token Imbalance Adaptation for Radiology Report Generation [pdf] [code]
- [AAAI 2023] "Nothing Abnormal": Disambiguating Medical Reports via Contrastive Knowledge Infusion [pdf] [code]
- [arXiv 2023] S4M: Generating Radiology Reports by A Single Model for Multiple Body Parts [pdf] [code]
- [ACL 2023] Replace and Report: NLP Assisted Radiology Report Generation [pdf]
- [ICCV 2023] PRIOR: Prototype Representation Joint Learning from Medical Images and Reports [pdf] [code]
- [ICMLW 2023] Rethinking Medical Report Generation: Disease Revealing Enhancement with Knowledge Graph [pdf] [code]
- [MICCAI 2023] Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting [pdf] [code]
- [MLMIW 2023] Finding-Aware Anatomical Tokens for Chest X-Ray Automated Reporting [pdf]
- [MedIA 2023] C^2M-DoT: Cross-modal consistent multi-view medical report generation with domain transfer network [pdf]
- [EMNLP 2023 Findings] Controllable Chest X-Ray Report Generation from Longitudinal Representations [pdf]
- [BIBM 2023] Enhanced Knowledge Injection for Radiology Report Generation [pdf]
- [EMNLP 2023 Findings] Style-Aware Radiology Report Generation with RadGraph and Few-Shot Prompting [pdf]
- [ACL 2023] ORGAN: Observation-Guided Radiology Report Generation via Tree-Reasoning [pdf] [code]
- [EMNLP 2023 Findings] RECAP: Towards Precise Radiology Report Generation via Dynamic Disease Progression Reasoning [pdf] [code]
- [NeurIPSW 2023] Effectively Fine-tune to Improve Large Multimodal Models for Radiology Report Generation [pdf]
- [arXiv 2023] Radiology-Aware Model-Based Evaluation Metric for Report Generation [pdf]
- [EMNLP 2023] PhenotypeCLIP: Phenotype-based Contrastive Learning for Medical Imaging Report Generation [pdf]
- [arXiv 2023] Fine-Grained Image-Text Alignment in Medical Imaging Enables Cyclic Image-Report Generation [pdf]
- [arXiv 2023] Improving Medical Report Generation with Adapter Tuning and Knowledge Enhancement in Vision-Language Foundation Models [pdf]
- [NLPCC 2023] Medical Report Generation based on Segment-Enhanced Contrastive Representation Learning [pdf]
- [MICCAI 2023] SGT: Scene Graph-Guided Transformer for Surgical Report Generation [pdf] [code]
### 2024
- [ICASSP 2024] Sam-Guided Enhanced Fine-Grained Encoding with Mixed Semantic Learning for Medical Image Captioning [pdf] [code]
- [AAAI 2024] PromptMRG: Diagnosis-Driven Prompts for Medical Report Generation [pdf] [code]
- [WACV 2024] Complex Organ Mask Guided Radiology Report Generation [pdf] [code]
- [TMM 2024] From Observation to Concept: A Flexible Multi-view Paradigm for Medical Report Generation [pdf]
- [TMI 2024] SGT++: Improved Scene Graph-guided Transformer for Surgical Report Generation [pdf]
- [arXiv 2024] Unmasking and Quantifying Racial Bias of Large Language Models in Medical Report Generation [pdf]
- [arXiv 2024] Dual-modal Dynamic Traceback Learning for Medical Report Generation [pdf]
- [arXiv 2024] MedCycle: Unpaired Medical Report Generation via Cycle-Consistency [pdf]
- [arXiv 2024] Scene Graph Aided Radiology Report Generation [pdf]
- [ACL 2024 Findings] Extracting and Encoding: Leveraging Large Language Models and Medical Knowledge to Enhance Radiological Text Representation [pdf] [code]
- [arXiv 2024] TRRG: Towards Truthful Radiology Report Generation With Cross-modal Disease Clue Enhanced Large Language Models [pdf]
## Medical Visual Question Answering

### 2021
- [arXiv 2021] MuVAM: A Multi-View Attention-based Model for Medical Visual Question Answering [pdf]
- [Scientific Reports 2021] MedFuseNet: An attention-based multimodal deep learning model for visual question answering in the medical domain [pdf]
### 2022
- [MICCAI 2022] Consistency-preserving Visual Question Answering in Medical Imaging [pdf] [code]
- [MICCAI 2022] Surgical-VQA: Visual Question Answering in Surgical Scenes using Transformer [pdf] [code]
- [ECCV 2022] Distilled Dual-Encoder Model for Vision-Language Understanding [pdf] [code]
- [arXiv 2022] UnICLAM: Contrastive Representation Learning with Adversarial Masking for Unified and Interpretable Medical Vision Question Answering [pdf]
### 2023
- [TMI 2023] A Dual-Attention Learning Network with Word and Sentence Embedding for Medical Visual Question Answering [pdf] [code]
- [ISBI 2023] MF2-MVQA: A Multi-stage Feature Fusion method for Medical Visual Question Answering [pdf]
- [ISBI 2023] Self-supervised vision-language pretraining for Medical visual question answering [pdf] [code]
- [arXiv 2023] Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning [pdf]
- [MM 2023] RAMM: Retrieval-augmented Biomedical Visual Question Answering with Multi-modal Pre-training [pdf] [code]
- [IPMI 2023] Q2ATransformer: Improving Medical VQA via an Answer Querying Decoder [pdf]
- [MICCAI 2023] Open-Ended Medical Visual Question Answering Through Prefix Tuning of Language Models [pdf] [code]
- [arXiv 2023] PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering [pdf] [code]
- [MICCAI 2023] Masked Vision and Language Pre-training with Unimodal and Multimodal Contrastive Losses for Medical Visual Question Answering [pdf] [code]
- [MICCAI 2023] Localized Questions in Medical Visual Question Answering [pdf] [code]
- [arXiv 2023] Multimodal Prompt Retrieval for Generative Visual Question Answering [pdf] [code]
- [KDD 2023] Expert Knowledge-Aware Image Difference Graph Representation Learning for Difference-Aware Medical Visual Question Answering [pdf] [code]
- [NeurIPS 2023 D&B] EHRXQA: A Multi-Modal Question Answering Dataset for Electronic Health Records with Chest X-ray Images [pdf] [code]
- [MICCAI 2023] Rad-ReStruct: A Novel VQA Benchmark and Method for Structured Radiology Reporting [pdf] [code]
- [arXiv 2023] BESTMVQA: A Benchmark Evaluation System for Medical Visual Question Answering [pdf] [demo]
- [NeurIPS 2023] Quilt-1M: One million image-text pairs for histopathology [pdf] [code-demo]
### 2024
- [arXiv 2024] MedPromptX: Grounded Multimodal Prompting for Chest X-ray Diagnosis [pdf] [code]
- [arXiv 2024] PeFoMed: Parameter Efficient Fine-tuning on Multimodal Large Language Models for Medical Visual Question Answering [pdf] [code]
- [ICASSP 2024] Prompt-based Personalized Federated Learning for Medical Visual Question Answering [pdf]
- [arXiv 2024] RJUA-MedDQA: A Multimodal Benchmark for Medical Document Question Answering and Clinical Reasoning [pdf]
- [arXiv 2024] Design as Desired: Utilizing Visual Question Answering for Multimodal Pre-training [pdf]
- [arXiv 2024] Worse than Random? An Embarrassingly Simple Probing Evaluation of Large Multimodal Models in Medical VQA [pdf] [code]
- [IF 2024] Surgical-VQLA++: Adversarial Contrastive Learning for Calibrated Robust Visual Question-Localized Answering in Robotic Surgery [pdf] [code]
## Medical Vision-Language Model

### 2022
- [EMNLP 2022] MedCLIP: Contrastive learning from unpaired medical images and text [pdf] [code]
- [NeurIPSW 2022] Adapting Pretrained Vision-Language Foundational Models to Medical Imaging Domains [pdf]
- [ACL 2022] ViLMedic: a framework for research at the intersection of vision and language in medical AI [pdf] [code]
- [MICCAI 2022] Multi-modal Masked Autoencoders for Medical Vision-and-Language Pre-training [pdf] [code]
- [JBHI 2022] Multi-Modal Understanding and Generation for Medical Images and Text via Vision-Language Pre-Training [pdf] [code]
- [AAAI 2022] Clinical-BERT: Vision-Language Pre-training for Radiograph Diagnosis and Reports Generation [pdf]
- [JBHI 2022] Vision-language transformer for interpretable pathology visual question answering [link]
- [arXiv 2022] RoentGen: Vision-Language Foundation Model for Chest X-ray Generation [pdf]
- [ECCV 2022] Making the most of text semantics to improve biomedical vision–language processing [pdf]
- [MICCAI 2022] RepsNet: Combining Vision with Language for Automated Medical Reports [pdf] [code]
- [NeurIPS 2022] Multi-Granularity Cross-modal Alignment for Generalized Medical Visual Representation Learning [pdf] [code]
- [MICCAI 2022] BERTHop: An effective vision-and-language model for chest x-ray disease diagnosis [pdf]
### 2023
- [TMI 2023] LViT: Language meets Vision Transformer in Medical Image Segmentation [pdf] [code]
- [ICCV 2023] Towards Unifying Medical Vision-and-Language Pre-training via Soft Prompts [pdf] [code]
- [ICCV 2023] CLIP-Driven Universal Model for Organ Segmentation and Tumor Detection [pdf] [code]
- [arXiv 2023] Towards General Purpose Medical AI: Continual Learning Medical Foundation Model [pdf]
- [arXiv 2023] Large-Scale Domain-Specific Pretraining for Biomedical Vision-Language Processing [pdf] [code]
- [ICLR 2023] Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study [pdf] [code]
- [ICLR 2023] Advancing Radiograph Representation Learning with Masked Record Modeling [pdf] [code]
- [MICCAI 2023] PMC-CLIP: Contrastive Language-Image Pre-training using Biomedical Documents [pdf]
- [arXiv 2023] ChatCAD: Interactive Computer-Aided Diagnosis on Medical Image using Large Language Models [pdf] [code]
- [ICCV 2023] MedKLIP: Medical Knowledge Enhanced Language-Image Pre-Training [pdf] [project]
- [CVPR 2023] Learning to Exploit Temporal Structure for Biomedical Vision-Language Processing [pdf]
- [CVPRW 2023] One-shot and Partially-Supervised Cell Image Segmentation Using Small Visual Prompt [pdf]
- [MICCAI 2023] CLIP-Lung: Textual Knowledge-Guided Lung Nodule Malignancy Prediction [pdf]
- [MICCAI 2023] UniSeg: A Prompt-driven Universal Segmentation Model as well as A Strong Representation Learner [pdf] [code]
- [ICCV 2023] UniverSeg: Universal Medical Image Segmentation [pdf] [project website]
- [ICCV 2023] LIMITR: Leveraging Local Information for Medical Image-Text Representation [pdf] [code]
- [arXiv 2023] XrayGPT: Chest Radiographs Summarization using Medical Vision-Language Models [pdf] [code]
- [arXiv 2023] BiomedGPT: A Unified and Generalist Biomedical Generative Pre-trained Transformer for Vision, Language, and Multimodal Tasks [pdf] [code]
- [CHIL 2023] Multi-modal Pre-training for Medical Vision-language Understanding and Generation: An Empirical Study with A New Benchmark [pdf] [code]
- [NeurIPS 2023] Med-UniC: Unifying Cross-Lingual Medical Vision-Language Pre-Training by Diminishing Bias [pdf]
- [arXiv 2023] OphGLM: Training an Ophthalmology Large Language-and-Vision Assistant based on Instructions and Dialogue [pdf] [code]
- [ICMLW 2023] A ChatGPT Aided Explainable Framework for Zero-Shot Medical Image Diagnosis [pdf]
- [MICCAI 2023] M-FLAG: Medical Vision-Language Pre-training with Frozen Language Models and Latent Space Geometry Optimization [pdf] [code]
- [arXiv 2023] Towards Generalist Biomedical AI [pdf] [Med-PaLM]
- [MICCAI 2023] Knowledge Boosting: Rethinking Medical Contrastive Vision-Language Pre-Training [pdf] [code]
- [MICCAI 2023] Unified Medical Image-Text-Label Contrastive Learning With Continuous Prompt [pdf]
- [arXiv 2023] Few-shot medical image classification with simple shape and texture text descriptors using vision-language models [pdf] [code]
- [ICMLW 2023] Med-Flamingo: a Multimodal Medical Few-shot Learner [pdf] [code]
- [MICCAI 2023] Ariadne's Thread: Using Text Prompts to Improve Segmentation of Infected Areas from Chest X-ray images [pdf] [code]
- [arXiv 2023] A Foundation LAnguage-Image model of the Retina (FLAIR): Encoding expert knowledge in text supervision [pdf] [code]
- [ICCV 2023] ViLLA: Fine-Grained Vision-Language Representation Learning from Real-World Data [pdf] [code]
- [arXiv 2023] IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training [pdf]
- [arXiv 2023] Utilizing Synthetic Data for Medical Vision-Language Pre-training: Bypassing the Need for Real Images [pdf]
- [arXiv 2023] RaDialog: A Large Vision-Language Model for Radiology Report Generation and Conversational Assistance [pdf] [code]
- [MICCAI 2023] CXR-CLIP: Toward Large Scale Chest X-ray Language-Image Pre-training [pdf] [code]
- [MICCAI 2023] Medical Phrase Grounding with Region-Phrase Context Contrastive Alignment [pdf] [code]
- [arXiv 2023] BiomedJourney: Counterfactual Biomedical Image Generation by Instruction-Learning from Multimodal Patient Journeys [pdf] [project]
- [arXiv 2023] Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare [pdf] [code]
- [NeurIPS 2023] LLaVA-Med: Training a Large Language-and-Vision Assistant for Biomedicine in One Day [pdf] [code]
- [arXiv 2023] Towards Generalist Foundation Model for Radiology by Leveraging Web-scale 2D&3D Medical Data [pdf] [code]
- [arXiv 2023] RO-LLaMA: Generalist LLM for Radiation Oncology via Noise Augmentation and Consistency Regularization [pdf]
- [arXiv 2023] MedXChat: Bridging CXR Modalities with a Unified Multimodal Large Model [pdf]
- [arXiv 2023] G2D: From Global to Dense Radiography Representation Learning via Vision-Language Pre-training [pdf]
- [npj digital medicine 2023] A medical multimodal large language model for future pandemics [pdf]
- [arXiv 2023] A Foundational Multimodal Vision Language AI Assistant for Human Pathology [pdf]
- [arXiv 2023] ECAMP: Entity-centered Context-aware Medical Vision Language Pre-training [pdf] [code]
- [Nature Medicine 2023] A visual–language foundation model for pathology image analysis using medical Twitter [pdf] [code]
### 2024
- [CVPR 2024] Quilt-LLaVA: Visual Instruction Tuning by Extracting Localized Narratives from Open-Source Histopathology Videos [pdf] [code-demo]
- [ICASSP 2024] Freeze the backbones: A Parameter-Efficient Contrastive Approach to Robust Medical Vision-Language Pre-training [pdf]
- [arXiv 2024] Vulnerabilities Unveiled: Adversarially Attacking a Multimodal Vision Language Model for Pathology Imaging [pdf]
- [arXiv 2024] Masked Contrastive Reconstruction for Cross-modal Medical Image-Report Retrieval [pdf]
- [arXiv 2024] CheXagent: Towards a Foundation Model for Chest X-Ray Interpretation [pdf] [code]
- [TMM 2024] UniDCP: Unifying Multiple Medical Vision-language Tasks via Dynamic Cross-modal Learnable Prompts [pdf]
- [CVPR 2024] OmniMedVQA: A New Large-Scale Comprehensive Evaluation Benchmark for Medical LVLM [pdf]
- [CVPR 2024] Adapting Visual-Language Models for Generalizable Anomaly Detection in Medical Images [pdf] [code]
- [ICLR 2024] LLM-CXR: Instruction-Finetuned LLM for CXR Image Understanding and Generation [pdf] [code]
- [arXiv 2024] Enhancing Human-Computer Interaction in Chest X-ray Analysis using Vision and Language Model with Eye Gaze Patterns [pdf]
- [arXiv 2024] DeViDe: Faceted medical knowledge for improved medical vision-language pre-training [pdf]
- [arXiv 2024] M3D: Advancing 3D Medical Image Analysis with Multi-Modal Large Language Models [pdf] [code]
- [arXiv 2024] Dia-LLaMA: Towards Large Language Model-driven CT Report Generation [pdf]
- [arXiv 2024] WoLF: Wide-scope Large Language Model Framework for CXR Understanding [pdf]
- [CVPR 2024] Decomposing Disease Descriptions for Enhanced Pathology Detection: A Multi-Aspect Vision-Language Pre-training Framework [pdf] [code]
- [arXiv 2024] Large Model driven Radiology Report Generation with Clinical Quality Reinforcement Learning [pdf]
- [arXiv 2024] MedRG: Medical Report Grounding with Multi-modal Large Language Model [pdf]
- [CVPR 2024] Bootstrapping Chest CT Image Understanding by Distilling Knowledge from X-ray Expert Models [pdf] [code]
- [CVPR 2024] Continual Self-supervised Learning: Towards Universal Multi-modal Medical Data Representation Learning [pdf] [code]
- [CVPR 2024] PairAug: What Can Augmented Image-Text Pairs Do for Radiology? [pdf] [code]
- [CVPR 2024] MLIP: Enhancing Medical Visual Representation with Divergence Encoder and Knowledge-guided Contrastive Learning [pdf]
- [Nature Medicine 2024] A visual-language foundation model for computational pathology [pdf] [code]
- [Nature Medicine 2024] Visionālanguage foundation model for echocardiogram interpretation [pdf] [code]
- [TMI 2024] ChatCAD+: Towards a Universal and Reliable Interactive CAD using LLMs [pdf] [code]
- [arXiv 2024] MedDr: Diagnosis-Guided Bootstrapping for Large-Scale Medical Vision-Language Learning [pdf] [code]
- [NeurIPS 2024] CARES: A Comprehensive Benchmark of Trustworthiness in Medical Vision Language Models [pdf] [code]
- [MIDL 2024] Exploring Transfer Learning in Medical Image Segmentation using Vision-Language Models [pdf] [code]
- [arXiv 2024] Surgical-LVLM: Learning to Adapt Large Vision-Language Model for Grounded Visual Question Answering in Robotic Surgery [pdf] [code]
- [arXiv 2024] Dr-LLaVA: Visual Instruction Tuning with Symbolic Clinical Grounding [pdf] [code]
- [arXiv 2024] Merlin: A Vision Language Foundation Model for 3D Computed Tomography [pdf]
- [arXiv 2024] Advancing High Resolution Vision-Language Models in Biomedicine [pdf] [code]
- [EMNLP 2024] HuatuoGPT-Vision, Towards Injecting Medical Visual Knowledge into Multimodal LLMs at Scale [pdf] [code]
- [EMNLP 2024] STLLaVA-Med: Self-Training Large Language and Vision Assistant for Medical Question-Answering [pdf] [code]
- [EMNLP 2024] RULE: Reliable Multimodal RAG for Factuality in Medical Vision Language Models [pdf] [code]
- [MICCAI 2024] CLIP-DR: Textual Knowledge-Guided Diabetic Retinopathy Grading with Ranking-aware Prompting [pdf] [code]
- [arXiv 2024] PA-LLaVA: A Large Language-Vision Assistant for Human Pathology Image Understanding [pdf] [code]
- [arXiv 2024] LLaVA-Surg: Towards Multimodal Surgical Assistant via Structured Surgical Video Learning [pdf]
- [arXiv 2024] GMAI-MMBench: A Comprehensive Multimodal Evaluation Benchmark Towards General Medical AI [pdf] [code]
- [arXiv 2024] MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine [pdf] [code]
- [arXiv 2024] VisionUnite: A Vision-Language Foundation Model for Ophthalmology Enhanced with Clinical Knowledge [pdf] [code]
- [arXiv 2024] GP-VLS: A general-purpose vision language model for surgery [pdf] [code]
- [arXiv 2024] Specialist vision-language models for clinical ophthalmology [pdf]
- [arXiv 2024] MiniGPT-Med: Large Language Model as a General Interface for Radiology Diagnosis [pdf] [code]
- [arXiv 2024] MedVH: Towards Systematic Evaluation of Hallucination for Large Vision Language Models in the Medical Context [pdf] [code]
- [arXiv 2024] Med-PMC: Medical Personalized Multi-modal Consultation with a Proactive Ask-First-Observe-Next Paradigm [pdf]
- [arXiv 2024] LOGRA-MED: Long Context Multi-Graph Alignment For Medical Vision-Language Model [pdf]
- [arXiv 2024] WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation [pdf] [code]
- [arXiv 2024] VividMed: Vision Language Model with Versatile Visual Grounding for Medicine [pdf] [code]
- [arXiv 2024] Preference Fine-Tuning for Factuality in Chest X-Ray Interpretation Models Without Human Feedback [pdf]
- [arXiv 2024] MedViLaM: A multimodal large language model with advanced generalizability and explainability for medical data understanding and generation [pdf]
- [arXiv 2024] MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models [pdf] [code]
- [arXiv 2024] Interpretable Bilingual Multimodal Large Language Model for Diverse Biomedical Tasks [pdf] [code]
- [arXiv 2024] E3D-GPT: Enhanced 3D Visual Foundation for Medical Vision-Language Model [pdf]
- [NeurIPS 2024] BenchX: A Unified Benchmark Framework for Medical Vision-Language Pretraining on Chest X-Rays [pdf] [code]
- [EMNLP 2024] Medical Adaptation of Large Language and Vision-Language Models: Are We Making Progress? [pdf] [code]
- [arXiv 2024] Mini-InternVL: A Flexible-Transfer Pocket Multimodal Model with 5% Parameters and 90% Performance [pdf] [code]
- [arXiv 2024] SemiHVision: Enhancing Medical Multimodal Models with a Semi-Human Annotated Dataset and Fine-Tuned Instruction Generation [pdf] [code]
## Contribution

Join us in improving this repository! If you know of any important works we've missed, please contribute. Your efforts are highly valued!