MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models

We introduce MMed-RAG, a powerful multimodal RAG system that boosts the factuality of Medical Large Vision-Language Models (Med-LVLMs) by up to 43.8%! 🩺 [Paper] [X(Twitter)]

🚀 News

💡 Overview

MMed-RAG enhances alignment across medical domains such as radiology, pathology, and ophthalmology with a domain-aware retrieval mechanism. It also tackles three key challenges in aligning multimodal RAG:

1️⃣ Directly Copying Homework from Others ❌ Thinking for Itself ✅ MMed-RAG helps Med-LVLMs avoid blindly copying external information by encouraging the model to rely on its own visual reasoning when solving complex problems.

2️⃣ Cannot Solve Problems on Its Own ❌ Learning How to Copy ✅ When Med-LVLMs are unsure, MMed-RAG teaches the model to use retrieved knowledge intelligently, pulling in the right information at the right time to boost accuracy and reduce errors.

3️⃣ Copied Homework Is Wrong ❌ Avoiding Interference from Incorrect Homework ✅ MMed-RAG prevents models from being misled by incorrect retrievals, reducing the risk of generating inaccurate medical diagnoses.

<div align=left> <img src=asset/logo.png width=90% /> </div>
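
As a rough illustration of what "domain-aware retrieval" means in general, the toy sketch below routes a query to a per-domain store and ranks entries by cosine similarity. It is not the MMed-RAG implementation: `DOMAIN_INDEX`, `cosine`, `retrieve`, the 2-D embeddings, and the report snippets are all hypothetical, and the domain label is assumed to come from an upstream classifier that is not shown.

```python
import math

# Hypothetical per-domain stores: each entry is (embedding, report text).
# The 2-D vectors and texts are made up so the sketch can run on its own.
DOMAIN_INDEX = {
    "radiology": [
        ([1.0, 0.0], "No acute cardiopulmonary abnormality."),
        ([0.8, 0.6], "Mild cardiomegaly."),
    ],
    "pathology": [
        ([0.0, 1.0], "Benign glandular tissue."),
    ],
    "ophthalmology": [
        ([0.6, 0.8], "No signs of diabetic retinopathy."),
    ],
}


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_embedding, domain, k=1):
    """Return the top-k texts from the store that belongs to `domain` only."""
    if domain not in DOMAIN_INDEX:
        raise ValueError(f"unknown domain: {domain!r}")
    ranked = sorted(
        DOMAIN_INDEX[domain],
        key=lambda entry: cosine(query_embedding, entry[0]),
        reverse=True,
    )
    return [text for _, text in ranked[:k]]
```

The point of the routing step is that a radiology query is never compared against pathology or ophthalmology entries, so cross-domain near-matches cannot be retrieved.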

📦 Requirements

  1. Clone this repository and navigate to the MMed-RAG folder:
git clone https://github.com/richard-peng-xia/MMed-RAG.git
cd MMed-RAG
  2. Create a conda environment and install the package:
conda create -n MMed-RAG python=3.10 -y
conda activate MMed-RAG
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
pip install trl
  3. Download the required LLaVA-Med-1.5 model checkpoint from Hugging Face.

  4. For each medical dataset, first apply for access, then download the dataset.
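
After the installation steps above, a quick sanity check of the environment can save a failed run later. The snippet below is an optional sketch, not part of the repository: `missing_packages` and `python_matches` are hypothetical helper names, and only `trl` and Python 3.10 are taken from the steps above. Add any other top-level package names you expect to be installed.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level package names in `names` that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def python_matches(major, minor):
    """True if the running interpreter has exactly this major.minor version."""
    return sys.version_info[:2] == (major, minor)


if __name__ == "__main__":
    # The conda environment above is created with python=3.10.
    if not python_matches(3, 10):
        print(f"Warning: expected Python 3.10, found {sys.version.split()[0]}")
    # "trl" is installed explicitly in the steps above.
    missing = missing_packages(["trl"])
    print("Missing packages:", ", ".join(missing) if missing else "none")
```

Run it inside the activated `MMed-RAG` environment; an empty "missing" list only shows that the packages are importable, not that their versions are compatible.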

📖 Data Description

We provide a corresponding JSON or JSONL file for each dataset, including the image path, question, answer, and original report.

Here, TASK is either report or vqa, and MODALITY is one of radiology, pathology, or ophthalmology.
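
The data files described above can be read with the standard library alone. The sketch below handles both formats and reports which expected fields a record lacks. The key names `image`, `question`, `answer`, and `report` are assumptions chosen for illustration, and `load_records` and `missing_fields` are hypothetical helpers, so check the actual keys in the provided files before relying on them.

```python
import json
from pathlib import Path

# Assumed key names; the real files may use different ones.
EXPECTED_FIELDS = ("image", "question", "answer", "report")


def load_records(path):
    """Load a list of records from a .json or .jsonl file."""
    path = Path(path)
    text = path.read_text(encoding="utf-8")
    if path.suffix == ".jsonl":
        # JSONL: one JSON object per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    data = json.loads(text)
    # A plain JSON file may hold a list of records or a single record.
    return data if isinstance(data, list) else [data]


def missing_fields(record, expected=EXPECTED_FIELDS):
    """Return the expected keys that are absent from `record`."""
    return [key for key in expected if key not in record]
```

For example, `load_records("path/to/file.jsonl")` (a placeholder path) returns a list of dictionaries, and `missing_fields(records[0])` shows whether the assumed keys match the file.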

📅 Schedule

📚 Citation

@article{xia2024mmedrag,
  title={MMed-RAG: Versatile Multimodal RAG System for Medical Vision Language Models},
  author={Xia, Peng and Zhu, Kangyu and Li, Haoran and Wang, Tianze and Shi, Weijia and Wang, Sheng and Zhang, Linjun and Zou, James and Yao, Huaxiu},
  journal={arXiv preprint arXiv:2410.13085},
  year={2024}
}

🙏 Acknowledgement

We use code from LLaVA-Med, RULE, and CARES. We thank the authors for releasing their code.

<!-- ## Clip Finetune ``` bash ./scripts/retrieve_clip_VQA.sh ``` ## DPO training ``` bash ./scripts/train_dpo_2stages_VQA.sh ``` ## Inference ``` ``` -->