MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation
Taha Koleilat, Hojat Asgariandehkordi, Hassan Rivaz, Yiming Xiao
Overview
Abstract: Segmentation of anatomical structures and pathological regions in medical images is essential for modern clinical diagnosis, disease research, and treatment planning. While significant advancements have been made in deep learning-based segmentation techniques, many of these methods still suffer from limitations in data efficiency, generalizability, and interactivity. As a result, developing precise segmentation methods that require fewer labeled datasets remains a critical challenge in medical image analysis. Recently, the introduction of foundation models like CLIP and Segment-Anything-Model (SAM), with robust cross-domain representations, has paved the way for interactive and universal image segmentation. However, further exploration of these models for data-efficient segmentation in medical imaging is still needed and highly relevant. In this paper, we introduce MedCLIP-SAMv2, a novel framework that integrates the CLIP and SAM models to perform segmentation on clinical scans using text prompts, in both zero-shot and weakly supervised settings. Our approach includes fine-tuning the BiomedCLIP model with a new Decoupled Hard Negative Noise Contrastive Estimation (DHN-NCE) loss, and leveraging the Multi-modal Information Bottleneck (M2IB) to create visual prompts for generating segmentation masks from SAM in the zero-shot setting. We also investigate using zero-shot segmentation labels within a weakly supervised paradigm to enhance segmentation quality further. Extensive testing across four diverse segmentation tasks and medical imaging modalities (breast tumor ultrasound, brain tumor MRI, lung X-ray, and lung CT) demonstrates the high accuracy of our proposed framework.
Framework
<p float="left"> <img src="assets/MedCLIP-SAMv2.png" width="100%" /> </p>
Sample Segmentation Results
<p float="left"> <img src="assets/SegExamples.png" width="100%" /> </p>
Datasets
Public datasets used in our study:
- Radiology Objects in COntext (ROCO)
- MedPix
- Breast UltraSound Images (BUSI)
- UDIAT
- COVID-QU-Ex
- Brain Tumors
- Lung CT
Create a data directory in the main working directory and organize each dataset you want to work with as follows:
data
├── breast_tumors
│ ├── images
│ ├── masks
│ ├── val_images
│ ├── val_masks
│ ├── test_images
│ └── test_masks
│
├── brain_tumors
│ ├── images
│ ├── masks
│ ├── val_images
│ ├── val_masks
│ ├── test_images
│ └── test_masks
│
└── ...
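The layout above can be created by hand or with a small script. The following is a minimal sketch, not part of the repository: the helper name `make_data_layout` is hypothetical, and it only assumes that every dataset uses the same six subfolders shown in the tree.

```python
from pathlib import Path

# The six subfolders shown in the tree above, repeated for every dataset.
SPLIT_DIRS = ["images", "masks", "val_images", "val_masks", "test_images", "test_masks"]


def make_data_layout(root, datasets):
    """Create <root>/<dataset>/<split> for each dataset and return the created paths.

    Hypothetical helper for illustration; existing folders are left untouched.
    """
    created = []
    for dataset in datasets:
        for split in SPLIT_DIRS:
            path = Path(root) / dataset / split
            path.mkdir(parents=True, exist_ok=True)
            created.append(path)
    return created
```

Calling `make_data_layout("data", ["breast_tumors", "brain_tumors"])` from the main working directory would produce the two dataset folders listed above; add further dataset names to the list as needed.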
Colab Demo
Prerequisites & Installation
Install Anaconda by following the Anaconda installation documentation. Then create and activate an environment with all required packages using the following commands:
conda env create -f medclipsamv2_env.yml
conda activate medclipsamv2
Then set up the segment-anything library:
cd segment-anything
pip install -e .
cd ..
Finally, set up the nnUNet framework:
cd weak_segmentation
pip install -e .
cd ..
<a name="Models"></a>SAM Model Checkpoints
Three versions of the SAM model are available, each with a different backbone size. These models can be instantiated through the segment-anything library once a checkpoint has been downloaded.
Click the links below to download the checkpoint for the corresponding model type and place it in segment-anything/sam_checkpoints/ (for example, segment-anything/sam_checkpoints/sam_vit_h_4b8939.pth for the ViT-H model):
- default or vit_h: ViT-H SAM model.
- vit_l: ViT-L SAM model.
- vit_b: ViT-B SAM model.
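As a rough sketch of how a downloaded checkpoint might be loaded, the snippet below uses the `sam_model_registry` mapping from the upstream segment-anything package. The helper names `checkpoint_path` and `load_sam` are hypothetical, and only the ViT-H file name appears in this README; the ViT-L and ViT-B file names are assumptions taken from the upstream release, so adjust them if your files are named differently.

```python
from pathlib import Path

# Checkpoint file names as published by the upstream segment-anything project.
# Only the ViT-H name is stated in this README; the other two are assumptions.
CHECKPOINT_FILES = {
    "default": "sam_vit_h_4b8939.pth",
    "vit_h": "sam_vit_h_4b8939.pth",
    "vit_l": "sam_vit_l_0b3195.pth",
    "vit_b": "sam_vit_b_01ec64.pth",
}


def checkpoint_path(model_type, checkpoint_dir="segment-anything/sam_checkpoints"):
    """Return the expected checkpoint location for a SAM model type."""
    if model_type not in CHECKPOINT_FILES:
        raise ValueError(f"unknown SAM model type: {model_type!r}")
    return Path(checkpoint_dir) / CHECKPOINT_FILES[model_type]


def load_sam(model_type="vit_h", checkpoint_dir="segment-anything/sam_checkpoints"):
    """Build a SAM model from a previously downloaded checkpoint."""
    # Imported lazily so checkpoint_path() works without the library installed.
    from segment_anything import sam_model_registry

    path = checkpoint_path(model_type, checkpoint_dir)
    if not path.is_file():
        raise FileNotFoundError(f"download the checkpoint first: {path}")
    return sam_model_registry[model_type](checkpoint=str(path))
```

With the ViT-H checkpoint in place, `sam = load_sam("vit_h")` would return the model; the smaller `vit_b` backbone is a reasonable choice when GPU memory is limited.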
How to run
DHN-NCE Loss
You can fine-tune the BiomedCLIP pre-trained model using our DHN-NCE Loss.
Our fine-tuned model can be downloaded here. Place it at saliency_maps/model/pytorch_model.bin
Zero-shot Segmentation
You can run the whole zero-shot framework with the following:
bash zeroshot.sh <path/to/dataset>
You can change the settings by specifying which CLIP model you want to use, the post-processing algorithm, the SAM model and the type of visual prompts to use (boxes/points/both).
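To make the box and point prompt types concrete, here is an illustrative sketch of how such prompts can be derived from a binary mask. It is not the repository's implementation, which may differ (for example, by handling each connected component separately); the function names are hypothetical and plain Python lists are used to keep the example dependency-free. Coordinates follow the (x, y) and (x_min, y_min, x_max, y_max) conventions that SAM expects for points and boxes.

```python
def foreground_coords(mask):
    """List the (x, y) coordinates of all non-zero pixels in a 2D mask."""
    return [(x, y) for y, row in enumerate(mask) for x, value in enumerate(row) if value]


def box_prompt(mask):
    """Return (x_min, y_min, x_max, y_max) around the foreground, or None if empty."""
    coords = foreground_coords(mask)
    if not coords:
        return None
    xs, ys = zip(*coords)
    return min(xs), min(ys), max(xs), max(ys)


def point_prompt(mask):
    """Return the foreground pixel closest to the foreground centroid, or None if empty."""
    coords = foreground_coords(mask)
    if not coords:
        return None
    cx = sum(x for x, _ in coords) / len(coords)
    cy = sum(y for _, y in coords) / len(coords)
    # Snap to an actual foreground pixel so the point never lands on background.
    return min(coords, key=lambda p: (p[0] - cx) ** 2 + (p[1] - cy) ** 2)
```

Choosing "both" would correspond to passing the box and the point together, which gives SAM a region to focus on plus a location known to lie inside the target.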
Weakly Supervised Segmentation
Go to weak_segmentation:
cd weak_segmentation
Dataset Preparation
Please follow this guideline to prepare your datasets. Place all your prepared datasets in data.
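As a hedged illustration of what a prepared dataset typically contains, the sketch below writes a `dataset.json` file using the field names of the upstream nnU-Net v2 dataset format (`channel_names`, `labels`, `numTraining`, `file_ending`). The helper name, the `labelsTr` folder convention, and the example label names are assumptions; the linked guideline remains the authoritative reference for this repository.

```python
import json
from pathlib import Path


def write_dataset_json(dataset_dir, channel_names, labels, file_ending=".png"):
    """Write dataset.json for one dataset folder and return its contents.

    Hypothetical helper assuming the upstream nnU-Net v2 layout, where
    training labels live in <dataset_dir>/labelsTr.
    """
    dataset_dir = Path(dataset_dir)
    # One label file per training case, so counting them gives numTraining.
    cases = sorted((dataset_dir / "labelsTr").glob(f"*{file_ending}"))
    meta = {
        "channel_names": {str(i): name for i, name in enumerate(channel_names)},
        "labels": labels,
        "numTraining": len(cases),
        "file_ending": file_ending,
    }
    (dataset_dir / "dataset.json").write_text(json.dumps(meta, indent=2))
    return meta
```

For a single-channel ultrasound dataset this might be called as `write_dataset_json("data/Dataset001_BreastTumors", ["ultrasound"], {"background": 0, "tumor": 1})`, where the folder name and label names are placeholders.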
Preprocessing
nnUNetv2_plan_and_preprocess -d DATASET_ID --verify_dataset_integrity
Training
nnUNetv2_train DATASET_ID 2d all --npz --num_epochs EPOCHS --num_of_cycles CYCLES
Inference and Uncertainty
nnUNetv2_predict_from_folder --dataset DATASET_ID --fold all --input_folder INPUT_PATH --output_folder OUTPUT_PATH --rule RULE
nnUNetv2_run_uncertainty_on_fold --proba_dir PATH --raw_path PATH --labels PATH --score_type TYPE --output_pred_path PATH
Acknowledgements
Special thanks to open_clip, M2IB, nnUNet, and segment-anything for making their valuable code publicly available.
Citation
If you use MedCLIP-SAM, please consider citing:
@article{koleilat2024medclipsamv2,
title={MedCLIP-SAMv2: Towards Universal Text-Driven Medical Image Segmentation},
author={Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming},
journal={arXiv preprint arXiv:2409.19483},
year={2024}
}
@inproceedings{koleilat2024medclip,
title={MedCLIP-SAM: Bridging text and image towards universal medical image segmentation},
author={Koleilat, Taha and Asgariandehkordi, Hojat and Rivaz, Hassan and Xiao, Yiming},
booktitle={International Conference on Medical Image Computing and Computer-Assisted Intervention},
pages={643--653},
year={2024},
organization={Springer}
}