MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation

Health-X Lab | IMPACT Lab

Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao

Overview

Abstract: Segmentation of anatomical structures and pathological regions in medical images is essential for modern clinical diagnosis, disease research, and treatment planning. While significant advancements have been made in deep learning-based segmentation techniques, many of these methods still suffer from limitations in data efficiency, generalizability, and interactivity. As a result, developing precise segmentation methods that require fewer labeled datasets remains a critical challenge in medical image analysis. Recently, the introduction of foundation models like CLIP and Segment-Anything-Model (SAM), with robust cross-domain representations, has paved the way for interactive and universal image segmentation. However, further exploration of these models for data-efficient segmentation in medical imaging is still needed and highly relevant. In this paper, we introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans using text prompts, in both zero-shot and weakly supervised settings. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss, and leveraging the Multi-modal Information Bottleneck (M2IB) to create visual prompts for generating segmentation masks from SAM in the zero-shot setting. We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further. Extensive testing across four diverse segmentation tasks and medical imaging modalities (breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT) demonstrates the high accuracy of our proposed framework.

Framework

<p float="left"> <img src="assets/MedCLIP-SAMv2.png" width="100%" /> </p>

Sample Segmentation Results

<p float="left"> <img src="assets/SegExamples.png" width="100%" /> </p>

Datasets

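Our study evaluates on public datasets covering four segmentation tasks: breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT.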

In the main working directory, create a data directory for the datasets you want to work with, organized as follows:

data
├── breast_tumors
│   ├── images           
│   ├── masks             
│   ├── val_images        
│   ├── val_masks         
│   ├── test_images       
│   └── test_masks        
│
├── brain_tumors
│   ├── images            
│   ├── masks            
│   ├── val_images        
│   ├── val_masks         
│   ├── test_images       
│   └── test_masks        
│
└── ...        
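The layout above can also be created programmatically. The following is a minimal sketch, assuming the six subfolder names shown in the tree; the helper name make_dataset_dirs is hypothetical and not part of the repository.

```python
from pathlib import Path

# Subfolder names copied from the directory tree above.
SUBFOLDERS = ["images", "masks", "val_images", "val_masks", "test_images", "test_masks"]


def make_dataset_dirs(dataset_names, root="data"):
    """Create root/<dataset>/<subfolder> for every dataset and return the paths."""
    created = []
    for name in dataset_names:
        for sub in SUBFOLDERS:
            path = Path(root) / name / sub
            # exist_ok=True makes the call safe to repeat on an existing layout.
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created


if __name__ == "__main__":
    for p in make_dataset_dirs(["breast_tumors", "brain_tumors"]):
        print(p)
```

Copying the images and masks into these folders is still a manual step.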

Colab Demo

Interactive Colab demo: Open In Colab

Prerequisites & Installation

Install Anaconda following the Anaconda installation documentation, then create and activate an environment with all required packages:

conda env create -f medclipsamv2_env.yml
conda activate medclipsamv2

Then set up the segment-anything library:

cd segment-anything
pip install -e .
cd ..

Finally, set up the nnUNet framework:

cd weak_segmentation
pip install -e .
cd ..

<a name="Models"></a>SAM Model Checkpoints

Three versions of the SAM model are available, with different backbone sizes. They can be instantiated through the sam_model_registry in the segment_anything package.

Download the checkpoint for the corresponding model type and place it under segment-anything/sam_checkpoints/, for example segment-anything/sam_checkpoints/sam_vit_h_4b8939.pth.
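As a sketch of how a downloaded checkpoint maps to a model type, the snippet below resolves the expected checkpoint path and loads the model through sam_model_registry from the segment_anything package. The helper names checkpoint_path and load_sam are hypothetical. The checkpoint file names are those of the official SAM release, and the directory is the one given above.

```python
from pathlib import Path

# Directory from this README; file names from the official SAM release.
CHECKPOINT_DIR = Path("segment-anything/sam_checkpoints")
CHECKPOINT_NAMES = {
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}


def checkpoint_path(model_type="vit_h"):
    """Return the expected checkpoint location for a SAM backbone size."""
    return CHECKPOINT_DIR / CHECKPOINT_NAMES[model_type]


def load_sam(model_type="vit_h"):
    """Instantiate SAM from a local checkpoint (hypothetical helper)."""
    # Imported lazily so the path helper works without the package installed.
    from segment_anything import sam_model_registry

    path = checkpoint_path(model_type)
    if not path.is_file():
        raise FileNotFoundError(f"Download the {model_type} checkpoint to {path}")
    return sam_model_registry[model_type](checkpoint=str(path))
```

Calling load_sam("vit_h") from the main working directory should return the model once the checkpoint is in place.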

How to run

DHN-NCE Loss

You can fine-tune the pre-trained BiomedCLIP model using our DHN-NCE loss.

Our fine-tuned model can be downloaded here. Place it at saliency_maps/model/pytorch_model.bin.
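For orientation, here is a minimal pure-Python sketch of a loss in the spirit of DHN-NCE, read from its name in the abstract: the positive pair is decoupled from the denominator, and negatives are reweighted by hardness. The function name, the default temperature and beta values, the batch averaging, and the exact weighting formula are assumptions for illustration. The paper and the repository's training code remain the authoritative definition.

```python
import math


def dhn_nce(sim, temperature=0.07, beta1=0.5, beta2=0.5):
    """Illustrative decoupled, hard-negative-weighted contrastive loss.

    sim[i][j] is the similarity between image i and text j; matched pairs
    sit on the diagonal. Requires a batch of at least two pairs.
    """
    n = len(sim)

    def one_direction(s, beta):
        total = 0.0
        for i in range(n):
            negs = [s[i][j] / temperature for j in range(n) if j != i]
            # Hardness weights: softmax over the negatives scaled by beta,
            # multiplied by (n - 1) so the weights average to one.
            shift = max(beta * x for x in negs)
            raw = [math.exp(beta * x - shift) for x in negs]
            norm = sum(raw)
            weights = [(n - 1) * r / norm for r in raw]
            # Decoupled denominator: the positive pair is excluded.
            denom = sum(w * math.exp(x) for w, x in zip(weights, negs))
            total += -s[i][i] / temperature + math.log(denom)
        return total / n

    # Transpose to get the text-to-image direction.
    sim_t = [list(col) for col in zip(*sim)]
    return one_direction(sim, beta1) + one_direction(sim_t, beta2)
```

With beta set to zero every negative gets weight one, so the sketch reduces to a plain decoupled contrastive loss. Larger beta values put more weight on negatives that look similar to the anchor.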

Zero-shot Segmentation

You can run the whole zero-shot framework with the following:

bash zeroshot.sh <path/to/dataset>

You can change the settings by specifying which CLIP model you want to use, the post-processing algorithm, the SAM model and the type of visual prompts to use (boxes/points/both).
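To illustrate what box and point prompts are, the sketch below derives both from a binary mask, for example a thresholded saliency map. This is an illustration only, not the repository's post-processing: the function name mask_to_prompts and the choice of the foreground pixel nearest the centroid as the point prompt are assumptions. SAM's predictor takes boxes as (x_min, y_min, x_max, y_max) and points as (x, y) pixel coordinates.

```python
def mask_to_prompts(mask):
    """Derive a box prompt and a point prompt from a binary mask.

    mask is a list of rows containing 0/1 values. Returns (box, point),
    or (None, None) when the mask has no foreground pixel.
    """
    coords = [(x, y) for y, row in enumerate(mask) for x, v in enumerate(row) if v]
    if not coords:
        return None, None

    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    # Tight bounding box around all foreground pixels, in XYXY order.
    box = (min(xs), min(ys), max(xs), max(ys))

    # The centroid can fall outside a non-convex region, so pick the
    # foreground pixel closest to it instead.
    cx = sum(xs) / len(xs)
    cy = sum(ys) / len(ys)
    point = min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
    return box, point
```

This sketch treats the whole mask as one region. A mask with several separate regions would need one box and point per connected component.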

Weakly Supervised Segmentation

Go to weak_segmentation:

cd weak_segmentation

Dataset Preparation

Please follow this guideline to prepare your datasets. Place all your prepared datasets in data.

Preprocessing

nnUNetv2_plan_and_preprocess -d DATASET_ID --verify_dataset_integrity

Training

nnUNetv2_train DATASET_ID 2d all --npz --num_epochs EPOCHS --num_of_cycles CYCLES

Inference and Uncertainty

nnUNetv2_predict_from_folder --dataset DATASET_ID --fold all --input_folder INPUT_PATH --output_folder OUTPUT_PATH --rule RULE
nnUNetv2_run_uncertainty_on_fold --proba_dir PATH --raw_path PATH --labels PATH --score_type TYPE --output_pred_path PATH

Acknowledgements

Special thanks to open_clip, M2IB, nnUNet, and segment-anything for making their valuable code publicly available.

Citation

If you use MedCLIP-SAMv2 or MedCLIP-SAM, please consider citing:

@article{koleilat2024medclipsamv2,
  title={MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation},
  author={Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming},
  journal={arXiv preprint arXiv:2409.19483},
  year={2024}
}

@inproceedings{koleilat2024medclip,
  title={MedCLIP-SAM: Bridging text and image towards universal medical image segmentation},
  author={Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming},
  booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
  pages={643--653},
  year={2024},
  organization={Springer}
}