# Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study
<div align="center">
    <a href="https://"><img width="1000px" height="auto" src="figures/framework.PNG"></a>
</div>
Updated on 2023.06.08
## Introduction
This is the repository for our paper accepted at ICLR 2023 -- Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study.
- We propose a prompt-design paradigm that incorporates expressive attributes into the prompts. We show that such prompts help pre-trained Vision-Language Models (VLMs) rapidly adapt to unseen medical-domain datasets (a minimal sketch of the idea follows this list).
- We further propose three different approaches for automatic prompt generation, leveraging either specialized Language Models (LMs) or VQA models to obtain the attribute information.
- Our methods are evaluated on various public medical datasets. For more details, please refer to the paper.
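To make the paradigm concrete, here is a minimal sketch (not code from this repository) comparing a plain class-name prompt with one enriched with expressive attributes. The attribute values and the template wording are illustrative only.

```python
# Minimal sketch of the prompt-design paradigm described above.
# The attribute values and template wording are illustrative, not the exact
# ones produced by this repository.
class_name = "polyp"          # medical terminology
real_cls_name = "bump"        # more general word that VLMs tend to know better
attributes = {                # hypothetical expressive attributes for one dataset
    "color": "pink",
    "shape": "oval",
    "location": "colon",
}

plain_prompt = class_name
expressive_prompt = (
    f"{attributes['color']} {attributes['shape']} {real_cls_name} "
    f"in the {attributes['location']}"
)

print(plain_prompt)        # polyp
print(expressive_prompt)   # pink oval bump in the colon
```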
## Zero-shot Results
<div align="center"> <a href="https://"><img width="1000px" height="auto" src="figures/zeroshot_res.PNG"></a> </div> Our methods show superiority under zero-shot settings. <!-- ## Links <!-- [Code] may link to your project at your institute> <!-- give a introduction of your project --> <!-- ## Details intro text here. <!-- Insert a pipeline of your algorithm here if got one --> <!-- <div align="center"> <a href="https://"><img width="1000px" height="auto" src="https://github.com/openmedlab/sampleProject/blob/main/diagram_sample.png"></a> </div> More intro text here. -->Dataset
Due to licensing restrictions, we cannot share all the datasets used in our work, but we upload the polyp benchmark datasets as a sample. The polyp datasets were prepared by the PraNet project; you can also download the data here. If you want to use your own dataset, please refer to the polyp datasets to organize your data paths and annotation files.
| Netdisk Type | Link | Password (optional) |
| --- | --- | --- |
| BaiduNetDisk | link | s2nf |
| Google Drive | link | N/A |
After downloading the zip file, please unzip it and place the folder under the project root.
## DEMO
### Interface
We also provide an interface space on Hugging Face for quick interaction with our approach. Please check this link for the interactive demo page.
<!-- <iframe src="https://drdoggo-medical-image-understanding-with-vlms.hf.space" frameborder="0" width="850" height="450" ></iframe> -->

### Colab file
You can also check this Colab script for the code and training details.
## Get Started
### Main Requirements
Our project is built on the GLIP project, so please first set up the environment for the GLIP model following this instruction. Please make sure you download the GLIP-T model weights here and put them under the MODEL/ path.
Next, please clone this repository and follow the installation guide in the next section.
### Installation
```bash
git clone https://github.com/MembrAI/MIU-VL.git
cd MIU-VL
pip install -r requirements.txt
```
### Configuration Files
We follow the config file format used in the GLIP project. Please refer to the sample config file we provided to create your own. Note: the DATASETS.CAPTION_PROMPT field is ignored by our code, since our code uses the automatically generated prompts instead of user-provided ones.
### Zero-shot Inference Set-up Guide
#### Generate prompts with the Masked Language Model (MLM) method

In our work, we propose three different methods to automatically generate prompts with expressive attributes. The first is the MLM method. To generate prompts with this approach, we use a pre-trained language model as our knowledge source. In this project, we use the BiomedNLP-PubMedBERT-base model as our specialized language model. Please use the following commands to download the model into this repo:
```bash
git lfs install
git clone https://huggingface.co/microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext
```
After setting up the dependencies needed for automatic prompt generation with the MLM method, you can generate prompts for your dataset. Note that our code currently only supports generating prompts with three expressive attributes: color, shape, and location. We may extend the code in the future to support more kinds of attributes, but we found these three to be the most useful so far.
Now, run the following command to generate prompts with our MLM method:
```bash
bash RUN/autoprompt/make_auto_mlm.sh
```
or
```bash
python make_autopromptsv2.py --dataset 'kvasir' \
      --cls_names 'polyp' \
      --vqa_names 'wound' \
      --mode 'lama' \
      --real_cls_names 'bump'
```
where `--dataset` is the dataset name used to locate the related data paths. `--cls_names` is the class name inserted into the template used to extract attribute information from the language model; for example, in this case we ask the LM to predict the masked word in the template "The typical color of polyp is [MASK] color", and the LM predicts the [MASK] token given the class name. `--vqa_names` is similar to `--cls_names`, except that it is used when querying the VQA model later. The `--mode` argument decides which automated generation approach is used; 'lama' refers to the MLM method. Finally, `--real_cls_names` is the real class name that goes into the final prompt. We found that substituting terminology with more general vocabulary can improve performance; for example, using "bump" instead of "polyp" in our final prompts gives a significant improvement.
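For a concrete picture of what the MLM step is doing, here is a rough sketch using the Hugging Face `fill-mask` pipeline with the PubMedBERT model downloaded above. It only illustrates the idea; it is not the code inside `make_autopromptsv2.py`.

```python
# Rough sketch of the MLM ("lama") idea: ask a biomedical masked language model
# to fill the attribute slot in a template. Not the repository's actual code.
from transformers import pipeline

fill_mask = pipeline(
    "fill-mask",
    model="microsoft/BiomedNLP-PubMedBERT-base-uncased-abstract-fulltext",
)

# Template quoted in the text above; [MASK] is the BERT-style mask token.
template = "The typical color of polyp is [MASK] color."
for prediction in fill_mask(template, top_k=3):
    print(prediction["token_str"], round(prediction["score"], 3))
```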
After running the command above, you will find several .json files saved in the `autoprompt_json/` folder. These JSON files store the generated prompts for each input image. To run the final inference, use the following commands:
```bash
#!/bin/bash
config_file=path/to/config/file.yaml
odinw_configs=path/to/config/file.yaml
output_dir=output/path
model_checkpoint=MODEL/glip_tiny_model_o365_goldg.pth
jsonFile=autoprompt_json/lama_kvasir_path_prompt_top1.json

python test_vqa.py --json ${jsonFile} \
      --config-file ${config_file} --weight ${model_checkpoint} \
      --task_config ${odinw_configs} \
      OUTPUT_DIR ${output_dir} \
      TEST.IMS_PER_BATCH 2 SOLVER.IMS_PER_BATCH 2 \
      TEST.EVAL_TASK detection \
      DATASETS.TRAIN_DATASETNAME_SUFFIX _grounding \
      DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE False \
      DATASETS.USE_OVERRIDE_CATEGORY True \
      DATASETS.USE_CAPTION_PROMPT True
```
#### Generate image-specific prompts with the VQA and hybrid methods

Our approach uses the OFA model for visual question answering, so you need to follow this guide to install the OFA module with Hugging Face transformers. Note: we use the OFA-base model in this project. For convenience, you can simply run the following commands to install the OFA model with Hugging Face transformers, but we recommend referring to the user guide in case of any problem. A minimal usage sketch follows the install commands.
```bash
git clone --single-branch --branch feature/add_transformers https://github.com/OFA-Sys/OFA.git
pip install OFA/transformers/
git clone https://huggingface.co/OFA-Sys/OFA-base
```
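As a quick sanity check that OFA is installed correctly, here is a usage sketch adapted from the OFA repository's transformers guide. The image path and the question are placeholders, this is not this project's code, and the API may differ slightly depending on the branch you installed.

```python
# Usage sketch adapted from the OFA repository's transformers guide.
# The image path and the question are placeholders; not this project's code.
from PIL import Image
from torchvision import transforms
from transformers import OFATokenizer, OFAModel  # provided by the OFA fork installed above

ckpt_dir = "OFA-base"      # the checkpoint cloned from Hugging Face above
resolution = 384           # input resolution commonly used for OFA-base (check the OFA docs)
patch_resize = transforms.Compose([
    lambda im: im.convert("RGB"),
    transforms.Resize((resolution, resolution), interpolation=Image.BICUBIC),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5], std=[0.5, 0.5, 0.5]),
])

tokenizer = OFATokenizer.from_pretrained(ckpt_dir)
model = OFAModel.from_pretrained(ckpt_dir, use_cache=False)

question = " what is the color of the bump in the image?"   # attribute question (illustrative)
inputs = tokenizer([question], return_tensors="pt").input_ids
patch_img = patch_resize(Image.open("path/to/a/sample_image.jpg")).unsqueeze(0)

out = model.generate(inputs, patch_images=patch_img, num_beams=5, no_repeat_ngram_size=3)
print(tokenizer.batch_decode(out, skip_special_tokens=True))
```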
Here, we show how to generate auto-prompt JSON files with the hybrid method:
```bash
python make_autopromptsv2.py --dataset 'kvasir' \
      --cls_names 'polyp' \
      --vqa_names 'bump' \
      --mode 'hybrid' \
      --real_cls_names 'bump'
```
or run the pre-defined bash file
```bash
bash RUN/autoprompt/make_auto_hybrid.sh
```
As mentioned above, the `--mode` argument decides which prompt-generation approach is used: 'hybrid' and 'vqa' activate the hybrid and VQA methods, respectively.
Note: running the hybrid or VQA method can take hours to produce the prompt file. We recommend using a GPU with at least 24 GB of memory for this script.
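To illustrate what the hybrid mode does conceptually, here is a purely hypothetical sketch that mixes attribute answers from the language model with an image-specific answer from the VQA model and fills a single prompt template. Which attribute actually comes from which source, and the template itself, are simplifications and may differ from `make_autopromptsv2.py`.

```python
# Purely illustrative sketch of the hybrid idea: combine attributes from the
# language model (MLM sketch above) with image-specific attributes from the
# VQA model (OFA sketch above). The attribute-to-source assignment and the
# template are hypothetical simplifications.
def build_prompt(color: str, shape: str, location: str, real_cls_name: str = "bump") -> str:
    """Fill one expressive prompt for a single input image."""
    return f"{color} {shape} {real_cls_name} in the {location}"

lm_attrs = {"shape": "oval", "location": "colon"}   # e.g. predicted by PubMedBERT
vqa_attrs = {"color": "pink"}                        # e.g. answered by OFA for this image

print(build_prompt(color=vqa_attrs["color"],
                   shape=lm_attrs["shape"],
                   location=lm_attrs["location"]))
# -> pink oval bump in the colon
```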
Again, you will obtain several JSON files containing the auto-generated prompts for each input image. If you cannot run the script above due to GPU limitations, we also provide some sample files under the `autoprompt_json/` path. You can use these files as references and run the following commands to do inference with the generated prompts:
```bash
config_file=path/to/config/file.yaml
odinw_configs=path/to/config/file.yaml
output_dir=output/path
model_checkpoint=MODEL/glip_tiny_model_o365_goldg.pth
jsonFile=autoprompt_json/hybrid_kvasir_path_prompt_top1.json

python test_vqa.py --json ${jsonFile} \
      --config-file ${config_file} --weight ${model_checkpoint} \
      --task_config ${odinw_configs} \
      OUTPUT_DIR ${output_dir} \
      TEST.IMS_PER_BATCH 2 SOLVER.IMS_PER_BATCH 2 \
      TEST.EVAL_TASK detection \
      DATASETS.TRAIN_DATASETNAME_SUFFIX _grounding \
      DATALOADER.DISTRIBUTE_CHUNK_AMONG_NODE False \
      DATASETS.USE_OVERRIDE_CATEGORY True \
      DATASETS.USE_CAPTION_PROMPT True
```
## Fine-Tuning
We also fine-tuned the GLIP model on the medical data we collected. For convenience, we have uploaded all the checkpoints here for those who want to replicate our results.
### Checkpoints
#### Non-radiology Checkpoints
| Dataset | Weights | Password |
| --- | --- | --- |
| Polyp | Link | 1f8e |
| CPM17 | Link | ywyc |
| BCCD | Link | 4wrb |
| ISIC2016 | Link | j7fc |
| DFUC2020 | Link | pbir |
#### Radiology Dataset and Checkpoints
| Dataset | Weights | Password |
| --- | --- | --- |
| LUNA16 | Link | tg5h |
| ADNI | Link | dptg |
| TN3k | Link | 596i |
| TBX11k | Link | tv9s |
### Fine-Tune Results
<div align="center"> <a href="https://"><img width="1000px" height="auto" src="figures/finetune_res.PNG"></a> </div>šāāļø Feedback and Contact
- Email: placeholder@tmp.com
## License
This project is under the CC-BY-NC 4.0 license. See LICENSE for details.
## Acknowledgement
Our code is adapted from the GLIP project, and we also use OFA and PubMedBERT for auto-prompt generation. Thanks for their excellent work.
## Citation
If you find this repository useful, please consider citing this paper:
```bibtex
@article{Qin2022MedicalIU,
  title={Medical Image Understanding with Pretrained Vision Language Models: A Comprehensive Study},
  author={Ziyuan Qin and Huahui Yi and Qicheng Lao and Kang Li},
  journal={ArXiv},
  year={2022},
  volume={abs/2209.15517}
}
```