SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality

Authors: Chengyang Lei, Liyi Chen, Jun Cen, Xiao Chen, Zhen Lei, Felix Heide, Ziwei Liu, Qifeng Chen, Zhaoxiang Zhang

SimMAT aims to transfer the ability of large RGB-based models to other modalities (e.g., depth, thermal, polarization) that suffer from limited training data. For example, SimMAT enables the Segment Anything Model (SAM) to handle modalities beyond RGB images.

<a name="GettingStarted"></a>Getting Started

First, clone the project and create the environment.

git clone https://github.com/mt-cly/SimMAT
cd SimMAT
conda create -n simmat python=3.10
conda activate simmat
pip install -r requirements.txt
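# Download the pretrained SAM ViT-B checkpoint
wget https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth
# Make sure the target directory exists before moving the checkpoint into it
mkdir -p checkpoint/sam
mv sam_vit_b_01ec64.pth checkpoint/sam/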

We provide a segmentation benchmark for studying segmentation performance across various modalities.

| Dataset | Supported Modalities | Link |
| --- | --- | --- |
| IVRG_RGBNIR | NIR, NIR+RGB | download (1.0G) |
| RGB-Thermal-Glass | Thermal, Thermal+RGB | download (3.0G) |
| NYUDepthv2 | Depth, HHA, Depth+RGB, HHA+RGB | download (1.6G) |
| pgsnet | AOLP+DOLP, AOLP+DOLP+RGB | download (15.5G) |
| zju-rgbp | AOLP+DOLP, AOLP+DOLP+RGB | download (0.3G) |

Download one or more benchmarks from the links above, unzip them, and move them into the data folder. The file structure should be as follows.

--SimMAT
   |--data
     |--IVRG_RGBNIR
     |--NYUDepthv2
     |--pgsnet
     |--RGB-Thermal-Glass
     |--zju-rgbp
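Before training, it can help to confirm that the unzipped folders ended up where the layout above expects them. The following is a minimal sketch, not part of the repository: the helper name missing_datasets is hypothetical, and it assumes the script is run from the SimMAT root so that data resolves to the folder shown above. Since only the benchmarks you plan to use need to be downloaded, a missing folder is reported rather than treated as an error.

```python
from pathlib import Path

# Folder names copied from the file structure above.
EXPECTED_DATASETS = (
    "IVRG_RGBNIR",
    "NYUDepthv2",
    "pgsnet",
    "RGB-Thermal-Glass",
    "zju-rgbp",
)


def missing_datasets(data_root="data"):
    """Return the expected benchmark folders that are absent under data_root.

    Hypothetical helper for illustration; the order follows EXPECTED_DATASETS.
    """
    root = Path(data_root)
    return [name for name in EXPECTED_DATASETS if not (root / name).is_dir()]


if __name__ == "__main__":
    missing = missing_datasets()
    if missing:
        print("Not found under data/:", ", ".join(missing))
    else:
        print("All five benchmark folders are present.")
```

The check only looks at the top-level folder names; it does not inspect the contents of each benchmark, whose internal layout is not described here.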

To train, run python train.py followed by any of the optional arguments below.

  -net         # tuning method. Options: {sam_full_finetune, sam_linear_probing, sam_mlp_adapter, sam_lora, sam_prompt}
  -modality    # modality name. Options: {pgsnet_rgbp, pgsnet_p, rgbd, d, rgbhha, hha, nir, rgbnir, rgbt, t, zju-rgbp}
  -proj_type   # projection applied before the foundation model. Options: {simmat, baseline_a, baseline_b}
  -exp_name    # experiment name
  -val_freq    # number of epochs between validations. Default: 5
  -b           # batch size. Default: 4
  -lr          # learning rate. Suggested: 3e-4 for PEFT, 3e-5 for full fine-tuning
  -weights     # path to the trained weights to resume from

To train with DDP, add the -ddp flag to the command.
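To make the argument list concrete, here is a small sketch that assembles a train.py command line and rejects values that are not among the listed options. It is an illustration only: build_train_command and suggested_lr are hypothetical helpers, the experiment name is made up, and two assumptions are baked in: every tuning method other than sam_full_finetune is treated as PEFT when choosing the learning rate, and -ddp is a bare flag that takes no value.

```python
import shlex

# Option sets copied from the argument list above.
NETS = {"sam_full_finetune", "sam_linear_probing", "sam_mlp_adapter",
        "sam_lora", "sam_prompt"}
MODALITIES = {"pgsnet_rgbp", "pgsnet_p", "rgbd", "d", "rgbhha", "hha",
              "nir", "rgbnir", "rgbt", "t", "zju-rgbp"}
PROJ_TYPES = {"simmat", "baseline_a", "baseline_b"}


def suggested_lr(net):
    # Assumption: everything except full fine-tuning counts as PEFT.
    return "3e-5" if net == "sam_full_finetune" else "3e-4"


def build_train_command(net, modality, proj_type, exp_name,
                        batch_size=4, val_freq=5, ddp=False):
    """Return the argv list for train.py (hypothetical helper)."""
    checks = ((net, NETS, "-net"),
              (modality, MODALITIES, "-modality"),
              (proj_type, PROJ_TYPES, "-proj_type"))
    for value, allowed, flag in checks:
        if value not in allowed:
            raise ValueError(f"{value!r} is not a listed option for {flag}")
    cmd = ["python", "train.py",
           "-net", net,
           "-modality", modality,
           "-proj_type", proj_type,
           "-exp_name", exp_name,
           "-b", str(batch_size),
           "-val_freq", str(val_freq),
           "-lr", suggested_lr(net)]
    if ddp:
        # Assumption: -ddp is a bare flag with no value.
        cmd.append("-ddp")
    return cmd


if __name__ == "__main__":
    # Made-up experiment name; not necessarily the command in train.sh.
    print(shlex.join(build_train_command("sam_lora", "nir", "simmat",
                                         "nir_lora_demo")))
```

Returning a list rather than a single string means the result can be passed directly to subprocess.run without shell quoting concerns, while shlex.join produces a copy-pasteable line for a terminal.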

An example command that adapts SAM to the NIR modality is provided in train.sh.

sh train.sh

Citation

@article{lei2024simmat,
  title={SimMAT: Exploring Transferability from Vision Foundation Models to Any Image Modality},
  author={Lei, Chengyang and Chen, Liyi and Cen, Jun and Chen, Xiao and Lei, Zhen and Heide, Felix and Liu, Ziwei and Chen, Qifeng and Zhang, Zhaoxiang},
  journal={arXiv preprint arXiv:2409.08083},
  year={2024}
}

Acknowledgements

The code is based on Medical-SAM-Adapter.