<div align="center"> MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine </div>

<div align="center"> <a href="https://github.com/UCSC-VLAA/MedTrinity-25M"><img src="https://img.shields.io/static/v1?label=MedTrinity-25M Code&message=Github&color=blue&logo=github-pages"></a>   <a href="https://yunfeixie233.github.io/MedTrinity-25M"><img src="https://img.shields.io/static/v1?label=Project%20Page&message=Github&color=blue&logo=github-pages"></a>   <a href="https://huggingface.co/datasets/UCSC-VLAA/MedTrinity-25M"><img src="https://img.shields.io/static/v1?label=MedTrinity-25M&message=HF&color=yellow"></a>   <a href="https://arxiv.org/abs/2408.02900"><img src="https://img.shields.io/static/v1?label=Paper&message=Arxiv:MedTrinity-25M&color=red&logo=arxiv"></a>   </div>

MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine<br> Yunfei Xie*, Ce Zhou*, Lang Gao*, Juncheng Wu*, Xianhang Li, Hong-Yu Zhou, Sheng Liu, Lei Xing, James Zou, Cihang Xie, Yuyin Zhou


📢 Breaking News

Star 🌟 us if you find it helpful!


🚀 Dataset

Dataset construction pipeline

<p align="center"> <img src="images/pipeline.png" width="500"> </p>
  1. Data processing: extracting essential information from collected data, including metadata integration to generate coarse captions, ROI locating, and medical knowledge collection.
  2. Multigranular textual description generation: using this information to prompt MLLMs to generate fine-grained captions.
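The two stages above can be sketched roughly as follows. This is only an illustration: the `Sample` class, its field names, the ROI box format, and the prompt wording are all hypothetical and are not taken from the MedTrinity-25M code base.

```python
from dataclasses import dataclass, field


@dataclass
class Sample:
    """Hypothetical container for the information gathered in stage 1."""
    modality: str
    organ: str
    disease: str
    roi_box: tuple  # assumed format: (x_min, y_min, x_max, y_max) in pixels
    knowledge: list = field(default_factory=list)


def coarse_caption(s: Sample) -> str:
    # Stage 1: integrate metadata into a short, coarse caption.
    return f"A {s.modality} image of the {s.organ} showing {s.disease}."


def build_prompt(s: Sample) -> str:
    # Stage 2: combine the coarse caption, ROI, and knowledge into one MLLM prompt.
    facts = "\n".join(f"- {k}" for k in s.knowledge) or "- (none)"
    return (
        f"Coarse caption: {coarse_caption(s)}\n"
        f"Region of interest (x_min, y_min, x_max, y_max): {s.roi_box}\n"
        f"Relevant medical knowledge:\n{facts}\n"
        "Describe the image in detail, focusing on the region of interest."
    )


sample = Sample("CT", "lung", "a nodule", (40, 60, 120, 140),
                ["Placeholder knowledge snippet about the finding."])
prompt = build_prompt(sample)
print(prompt)
```

The resulting string would then be sent, together with the image, to whichever MLLM generates the fine-grained caption; that call is deliberately left out here.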

Statistical overview of MedTrinity-25M

<p align="center"> <img src="images/dataset.png" width="500"> </p>

Statistics of MedTrinity-25M

You can view detailed statistics of MedTrinity-25M from this link.

Note: a single image sometimes contains multiple biological structures. The statistics count only the number of samples in which a given biological structure is present, so per-structure counts can sum to more than the total number of samples.
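A small sketch of how such presence-based counts behave, using made-up sample records (the `id` and `structures` fields are hypothetical, not the dataset's actual schema):

```python
from collections import Counter

# Hypothetical samples: each lists the biological structures visible in one image.
samples = [
    {"id": "img_001", "structures": ["lung", "heart"]},
    {"id": "img_002", "structures": ["lung"]},
    {"id": "img_003", "structures": ["liver", "kidney", "lung"]},
]

# Count, for every structure, the number of samples in which it appears.
# set() guards against a structure being listed twice for the same image.
presence = Counter(s for sample in samples for s in set(sample["structures"]))

print(presence["lung"])        # samples containing a lung: 3
print(sum(presence.values()))  # sum of per-structure counts: 6
print(len(samples))            # total number of samples: 3
```

Because one image can contribute to several structures, the per-structure counts here sum to 6 even though there are only 3 samples.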

Dataset Download

| Dataset | 🤗 Hugging Face Hub |
| --- | --- |
| MedTrinity-25M | UCSC-VLAA/MedTrinity-25M |
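One possible way to access the data is through the Hugging Face `datasets` library, sketched below. The repository ID comes from the Dataset Download section; the helper names `list_configs` and `stream_examples`, and the `split="train"` default, are assumptions. Check the dataset card for the actual configuration and split names before relying on them.

```python
REPO_ID = "UCSC-VLAA/MedTrinity-25M"  # dataset ID from the Dataset Download section


def list_configs(repo_id: str = REPO_ID) -> list:
    # Imported lazily so this module loads even without `datasets` installed.
    from datasets import get_dataset_config_names
    return get_dataset_config_names(repo_id)


def stream_examples(config: str, split: str = "train", limit: int = 3):
    # split="train" is an assumption; consult the dataset card for real splits.
    from datasets import load_dataset
    # streaming=True iterates over the data without downloading everything first.
    ds = load_dataset(REPO_ID, config, split=split, streaming=True)
    for i, example in enumerate(ds):
        if i >= limit:
            break
        yield example
```

For example, calling `list_configs()` and passing one of the returned names to `stream_examples(...)` would yield a few records for inspection. Both calls require network access and `pip install datasets`.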

🏆 Results

<p align="center"> <img src="images/result.png" width="900"> </p>

💬 Quick Start

Install

The steps below assume a Linux system.

  1. Clone this repository and navigate to the folder
git clone https://github.com/UCSC-VLAA/MedTrinity-25M.git
cd MedTrinity-25M
  2. Install the package
conda create -n llava-med++ python=3.10 -y
conda activate llava-med++
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn --no-build-isolation
pip install git+https://github.com/bfshi/scaling_on_scales.git
pip install multimedeval
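After installation, a quick sanity check along these lines can confirm that the optional packages are importable. This is a sketch, not part of the repository: `check_environment` is a hypothetical helper, and the import names in `OPTIONAL_MODULES` are assumptions, since pip package names and module names can differ.

```python
import importlib.util
import sys

# Import names are assumptions: pip package names and module names can differ.
OPTIONAL_MODULES = {
    "flash-attn": "flash_attn",
    "scaling_on_scales": "s2wrapper",
    "multimedeval": "multimedeval",
}


def check_environment(modules=OPTIONAL_MODULES):
    """Return a dict mapping each pip package name to True if importable."""
    return {
        pkg: importlib.util.find_spec(mod) is not None
        for pkg, mod in modules.items()
    }


# The conda environment above is created with python=3.10.
if sys.version_info[:2] != (3, 10):
    print(f"warning: expected Python 3.10, found {sys.version.split()[0]}")

for pkg, ok in check_environment().items():
    print(f"{pkg:20s} {'ok' if ok else 'MISSING'}")
```

`find_spec` only locates a module without importing it, so the check stays fast and does not trigger CUDA initialization.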

Upgrade to the latest code base

git pull
pip install -e .

# if you see some import errors when you upgrade,
# please try running the command below (without #)
# pip install flash-attn --no-build-isolation --no-cache-dir

🤖 Model-Zoo

The following table provides an overview of the available models in our zoo. For each model, you can find a link to its Hugging Face page or Google Drive folder.

| Model Name | Link | Summary |
| --- | --- | --- |
| LLaVA-Med++ (VQA-RAD) | Google Drive | Pretrained on LLaVA-Med data and MedTrinity-25M (specifically the VQA-RAD training set subset), then fine-tuned on the VQA-RAD training set. |
| LLaVA-Med++ (SLAKE) | Google Drive | Pretrained on LLaVA-Med data and MedTrinity-25M (specifically the SLAKE training set subset), then fine-tuned on the SLAKE training set. |
| LLaVA-Med++ (PathVQA) | Google Drive | Pretrained on LLaVA-Med data and MedTrinity-25M (specifically the PathVQA training set subset), then fine-tuned on the PathVQA training set. |
| LLaVA-Med-Captioner | Hugging Face | Captioner for generating multigranular annotations, fine-tuned on MedTrinity-Instruct-200K (coming soon). |

Train and Eval LLaVA-Med++

First, download the base model LLaVA-Meta-Llama-3-8B-Instruct-FT-S2 and the stage1 and stage2 datasets from LLaVA-Med.

  1. Pre-train
# stage1 training
cd MedTrinity-25M
bash ./scripts/med/llava3_med_stage1.sh

# stage2 training
bash ./scripts/med/llava3_med_stage2.sh
  2. Finetune
cd MedTrinity-25M
bash ./scripts/med/llava3_med_finetune.sh
  3. Eval
    First, download the corresponding weights from the Model-Zoo and update the path in the evaluation script. Then run:
cd MedTrinity-25M
bash ./scripts/med/llava3_med_eval_batch_vqa_rad.sh
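If you prefer to chain the steps above from one place, a small driver like the following could work. It is a sketch rather than part of the repository: `build_commands`, `run_pipeline`, and the `dry_run` flag are hypothetical helpers, while the script paths are the ones listed in this README.

```python
import subprocess

# Script paths as listed in this README, relative to the repository root.
STAGES = {
    "stage1": "scripts/med/llava3_med_stage1.sh",
    "stage2": "scripts/med/llava3_med_stage2.sh",
    "finetune": "scripts/med/llava3_med_finetune.sh",
    "eval": "scripts/med/llava3_med_eval_batch_vqa_rad.sh",
}


def build_commands(stages):
    """Return one ['bash', './script'] command per requested stage, in order."""
    return [["bash", "./" + STAGES[name]] for name in stages]


def run_pipeline(stages, repo_root="MedTrinity-25M", dry_run=True):
    for cmd in build_commands(stages):
        print("running:", " ".join(cmd), "(cwd=" + repo_root + ")")
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit code,
            # so later stages never run on top of a failed one.
            subprocess.run(cmd, cwd=repo_root, check=True)


# Dry run: only prints the commands that would be executed.
run_pipeline(["stage1", "stage2", "finetune", "eval"])
```

Running with `dry_run=False` executes each script from the repository root, mirroring the `cd MedTrinity-25M` step, and stops at the first failure.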

📜 Citation

If you find MedTrinity-25M useful for your research and applications, please cite using this BibTeX:

@misc{xie2024medtrinity25mlargescalemultimodaldataset,
      title={MedTrinity-25M: A Large-scale Multimodal Dataset with Multigranular Annotations for Medicine}, 
      author={Yunfei Xie and Ce Zhou and Lang Gao and Juncheng Wu and Xianhang Li and Hong-Yu Zhou and Sheng Liu and Lei Xing and James Zou and Cihang Xie and Yuyin Zhou},
      year={2024},
      eprint={2408.02900},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.02900}, 
}

🙏 Acknowledgement


Related Projects