SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality
- Authors: Chengyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Ziwei Liu, Qifeng Chen, Zhaoxiang Zhang
SimMAT aims to transfer the ability of large RGB-based models to other modalities (e.g., Depth, Thermal, Polarization), which suffer from limited training data. For example, SimMAT enables the Segment Anything Model to handle modalities beyond RGB images.
<p align="center"> <img src="resources/overview.png" width="95%"> </p>
<a name="GettingStarted"></a>Getting Started
First, clone the project and create the environment.
git clone https://github.com/mt-cly/SimMAT
cd SimMAT
conda create -n simmat python=3.10
conda activate simmat
pip install -r requirements.txt
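# pretrained SAM-B
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
# make sure the target directory exists so mv does not rename the file to "sam"
mkdir -p checkpoint/sam
mv sam_vit_b_01ec64.pth checkpoint/sam/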
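As a quick sanity check after the download, the sketch below confirms that the checkpoint ended up at `checkpoint/sam/sam_vit_b_01ec64.pth`, the location implied by the `mv` command. The helper name `check_sam_checkpoint` is hypothetical and not part of the repository, and it is an assumption that the training code reads the checkpoint from this path.

```python
from pathlib import Path

# File name taken from the wget command above.
CKPT_NAME = "sam_vit_b_01ec64.pth"


def check_sam_checkpoint(repo_root="."):
    """Return (ok, message) describing where the SAM-B checkpoint ended up.

    Hypothetical helper: the expected location is inferred from the
    `mv` command in the setup steps, not from the training code.
    """
    sam_path = Path(repo_root) / "checkpoint" / "sam"
    ckpt = sam_path / CKPT_NAME
    if sam_path.is_file():
        # mv renames the download to "sam" if checkpoint/sam is not a directory.
        return False, f"{sam_path} is a file; expected a directory containing {CKPT_NAME}"
    if not ckpt.is_file():
        return False, f"missing {ckpt}"
    if ckpt.stat().st_size == 0:
        return False, f"{ckpt} is empty; the download may have failed"
    return True, f"found {ckpt}"


if __name__ == "__main__":
    ok, message = check_sam_checkpoint()
    print(message)
```

Run it from the repository root; it only inspects the file system and does not load the weights.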
We provide a segmentation benchmark for studying segmentation performance across various modalities.
Dataset | Supported Modalities | Link |
---|---|---|
IVRG_RGBNIR | NIR, NIR+RGB | download(1.0G) |
RGB-Thermal-Glass | Thermal, Thermal+RGB | download(3.0G) |
NYUDepthv2 | Depth, HHA, Depth+RGB, HHA+RGB | download(1.6G) |
pgsnet | AOLP+DOLP, AOLP+DOLP+RGB | download(15.5G) |
zju-rgbp | AOLP+DOLP, AOLP+DOLP+RGB | download(0.3G) |
You can download one or all of the benchmarks from the links above, then unzip them and move them into the data folder. The file structure should be as follows.
--SimMAT
  |--data
    |--IVRG_RGBNIR
    |--NYUDepthv2
    |--pgsnet
    |--RGB-Thermal-Glass
    |--zju-rgbp
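The layout above can be checked with a short script. This is a minimal sketch, assuming it is run from the SimMAT root; the helper name `available_benchmarks` is hypothetical, and the check only looks at the top-level folder names, not at the files inside each dataset.

```python
from pathlib import Path

# Folder names copied from the file structure above.
BENCHMARKS = ["IVRG_RGBNIR", "NYUDepthv2", "pgsnet", "RGB-Thermal-Glass", "zju-rgbp"]


def available_benchmarks(repo_root="."):
    """List the benchmark folders that exist under <repo_root>/data."""
    data_dir = Path(repo_root) / "data"
    return [name for name in BENCHMARKS if (data_dir / name).is_dir()]


if __name__ == "__main__":
    found = available_benchmarks()
    for name in BENCHMARKS:
        status = "ok" if name in found else "not downloaded"
        print(f"{name}: {status}")
```

Since downloading a single benchmark is enough to train on its modalities, a missing folder is reported rather than treated as an error.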
To train, run python train.py followed by any of the optional arguments below.
-net # tuning method. Options: {sam_full_finetune, sam_linear_probing, sam_mlp_adapter, sam_lora, sam_prompt}
-modality # modality name. Options: {pgsnet_rgbp, pgsnet_p, rgbd, d, rgbhha, hha, nir, rgbnir, rgbt, t, zju-rgbp}
-proj_type # pre-projection applied before the foundation model. Options: {simmat, baseline_a, baseline_b}
-exp_name # experiment name
-val_freq # number of epochs between validations. Default: 5
-b # batch size. Default: 4
-lr # learning rate. Suggested values: 3e-4 for PEFT, 3e-5 for full fine-tuning
-weights # path to the trained weights you want to resume from
To use DDP, add -ddp to the command.
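The arguments above can be assembled programmatically, which helps catch a mistyped option before a long run starts. The sketch below is illustrative only: `build_train_command` is a hypothetical helper that is not part of the repository, the option sets are copied from the list above, and it assumes that `-ddp` is a bare flag that takes no value. It builds the command without running it.

```python
import shlex

# Allowed values copied from the argument list above.
NETS = {"sam_full_finetune", "sam_linear_probing", "sam_mlp_adapter", "sam_lora", "sam_prompt"}
MODALITIES = {"pgsnet_rgbp", "pgsnet_p", "rgbd", "d", "rgbhha", "hha",
              "nir", "rgbnir", "rgbt", "t", "zju-rgbp"}
PROJ_TYPES = {"simmat", "baseline_a", "baseline_b"}


def build_train_command(net=None, modality=None, proj_type=None, exp_name=None,
                        val_freq=None, batch_size=None, lr=None, weights=None,
                        ddp=False):
    """Return the train.py invocation as a list of strings (hypothetical helper)."""
    checks = (("-net", net, NETS),
              ("-modality", modality, MODALITIES),
              ("-proj_type", proj_type, PROJ_TYPES))
    for flag, value, allowed in checks:
        if value is not None and value not in allowed:
            raise ValueError(f"{flag} must be one of {sorted(allowed)}, got {value!r}")

    cmd = ["python", "train.py"]
    options = [("-net", net), ("-modality", modality), ("-proj_type", proj_type),
               ("-exp_name", exp_name), ("-val_freq", val_freq), ("-b", batch_size),
               ("-lr", lr), ("-weights", weights)]
    for flag, value in options:
        if value is not None:  # every argument is optional, so omit unset ones
            cmd += [flag, str(value)]
    if ddp:
        cmd.append("-ddp")  # assumption: -ddp is a switch without a value
    return cmd


if __name__ == "__main__":
    # "nir_lora" is a placeholder experiment name chosen for this example.
    cmd = build_train_command(net="sam_lora", modality="nir", proj_type="simmat",
                              exp_name="nir_lora", lr=3e-4)
    print(shlex.join(cmd))
```

The returned list can be passed to `subprocess.run` from the repository root once the chosen options look right.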
An example command that adapts SAM to the NIR modality is provided in train.sh.
sh train.sh
Citation
@article{lei2024simmat,
title={SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality},
author={Lei, Chengyang and Chen, Liyi and Cen, Jun and Chen, Xiao and Lei, Zhen and Heide, Felix and Liu, Ziwei and Chen, Qifeng and Zhang, Zhaoxiang},
journal={arXiv preprint arXiv:2409.08083},
year={2024}
}
Acknowledgements
The code is based on Medical-SAM-Adapter.