Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes (ECCV2024 Workshop)

This repo is the official implementation of Generalized SAM, accepted by the ECCV 2024 Workshop on Computational Aspects of Deep Learning (CADL).

Highlights

<div align="center"> <img src="figs/img1.png" width="60%"> <img src="figs/img2.png" width="30%"> </div>

Installation

Following Segment Anything, GSAM uses python=3.8.16, pytorch=1.8.0, and torchvision=0.9.0.

  1. Clone this repository.
    git clone https://github.com/usagisukisuki/G-SAM.git
    cd G-SAM
    
  2. Install PyTorch and TorchVision (you can follow the instructions here).
  3. Install other dependencies. (A quick environment check follows this list.)
    pip install -r requirements.txt
    
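To confirm the environment matches the versions noted above, a quick check like the following can help (a minimal sketch, not part of this repository):

import torch
import torchvision

# Expected versions per the note above: torch 1.8.0, torchvision 0.9.0.
print("torch:", torch.__version__)
print("torchvision:", torchvision.__version__)
print("CUDA available:", torch.cuda.is_available())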

Checkpoints

We use the SAM checkpoint of the vit_b version. Please download it from SAM and place it under "models/Pretrained_model".

models
├── Pretrained_model
    ├── sam_vit_b_01ec64.pth
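
For convenience, the checkpoint can also be fetched programmatically. A minimal sketch (the URL below is the official download link published in the upstream Segment Anything repository; verify it before relying on it):

import os
import urllib.request

# Official SAM vit_b checkpoint from the upstream Segment Anything release.
URL = "https://dl.fbaipublicfiles.com/segment_anything/sam_vit_b_01ec64.pth"
DST = "models/Pretrained_model/sam_vit_b_01ec64.pth"

os.makedirs(os.path.dirname(DST), exist_ok=True)
if not os.path.exists(DST):
    urllib.request.urlretrieve(URL, DST)  # large download (several hundred MB)
print("checkpoint ready:", os.path.exists(DST))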

Dataset

Step 1: Please download the datasets from [CamVid], [M-Building], [ISBI2012], [Kvasir-SEG], [Synapse], [Cityscapes], and [Trans10k].

Step 2: Please extract them under "Dataset" so that the layout looks like this:

Dataset
├── CamVid
      ├─ train
      ├─ trainannot
      ├─ ...
├── M-building
      ├─ png
          ├─ train
          ├─ train_labels
          ├─ ...
      ├─ tiff
├── ISBI2012
      ├─ Image
      ├─ Label
├── Kvasir
      ├─ datamodel
            ├─ ...
├── Synapse
      ├─ datamodel
            ├─ ...
├── Cityscapes
      ├─ gtFine
      ├─ leftImg8bit
├── Trans10k
      ├─ train
      ├─ test
      ├─ val
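
Before training, a small script can catch layout mistakes early (an illustrative sketch only; the folder names mirror the tree above, so adjust them if your layout differs):

import os

# Top-level dataset folders expected under "Dataset" (mirrors the tree above).
expected = ["CamVid", "M-building", "ISBI2012", "Kvasir",
            "Synapse", "Cityscapes", "Trans10k"]
missing = [name for name in expected
           if not os.path.isdir(os.path.join("Dataset", name))]
print("missing datasets:", missing if missing else "none")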

Fine-tuning on SAM

Binary segmentation

Once a binary segmentation dataset (e.g., ISBI2012) is prepared, you can run the following command to train the model on a single GPU.

python3 train.py --gpu 0 --dataset 'ISBI2012' --out result_sam --modelname 'SAM' --batchsize 8

To utilize multiple GPUs, you can run the following command.

CUDA_VISIBLE_DEVICES=0,1 python3 train.py --dataset 'ISBI2012' --out result_sam --modelname 'SAM' --batchsize 8 --multi

Multi-class segmentation

Once a multi-class segmentation dataset (e.g., Cityscapes) is prepared, you can run the following command to train the model on a single GPU. Cityscapes uses 19 training classes, hence --num_classes=19.

python3 train.py --gpu 0 --dataset 'Cityscapes' --out result_sam --modelname 'SAM' --batchsize 8 --num_classes=19 --multimask_output=True

Fine-tuning on Generalized SAM

You can also try our GSAM. Please run the following command to train the improved SAM.

python3 train.py --gpu 0 --dataset 'ISBI2012' --modelname 'GSAM'

Fine-tuning on SAM with Anything

You can also try various adaptation methods. Please run one of the following commands to train the corresponding improved SAM.

python3 train.py --gpu 0 --dataset 'ISBI2012' --modelname 'SAM_LoRA'
python3 train.py --gpu 0 --dataset 'ISBI2012' --modelname 'SAM_ConvLoRA'
python3 train.py --gpu 0 --dataset 'ISBI2012' --modelname 'SAM_AdaptFormer'
python3 train.py --gpu 0 --dataset 'ISBI2012' --modelname 'SAMUS'

Results

We assessed image data from various domains with varying input image sizes (in-vehicle, satellite, microscopic, medical, and transparent-object images).

| Method | CamVid | M-Building | ISBI | Kvasir-SEG | Synapse | Cityscapes | Trans10k |
|---|---|---|---|---|---|---|---|
| SAM | 58.27 | 67.59 | 72.15 | 75.94 | 40.61 | 57.15 | 83.37 |
| LoRA | 65.20 | 76.76 | 79.18 | 82.20 | 39.08 | 59.09 | 85.71 |
| ConvLoRA | 66.96 | 77.32 | 79.87 | 85.20 | 43.41 | 62.43 | 86.47 |
| AdaptFormer | 74.80 | 80.46 | 80.46 | 88.53 | 61.28 | 75.49 | 89.91 |
| SAMUS | 48.42 | 49.87 | 78.64 | 88.28 | 20.66 | 48.61 | 87.18 |
| GSAM | 67.21 | 80.69 | 80.53 | 87.83 | 72.78 | 74.10 | 87.08 |

We also compared the MACs and the segmentation accuracy on ISBI2012.

| Method | MACs (G) | mIoU |
|---|---|---|
| SAM | 371.98 | 72.15 |
| LoRA | 371.98 | 79.18 |
| ConvLoRA | 511.45 | 79.87 |
| AdaptFormer | 386.48 | 80.46 |
| SAMUS | 145.87 | 78.64 |
| GSAM (random crop = 256×256) | 270.33 | 80.63 |
| GSAM (random crop = 128×128) | 74.07 | 80.53 |
| GSAM (random crop = 64×64) | 18.53 | 78.53 |
| GSAM (random crop = 32×32) | 7.42 | 71.45 |
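
For reference, MACs for models like these are commonly measured with a profiler such as thop (an assumption on our part; this README does not specify the measurement tool). A minimal sketch:

import torch
import torchvision
from thop import profile

# Stand-in model for illustration; substitute the fine-tuned SAM/GSAM encoder.
model = torchvision.models.resnet18()
dummy = torch.randn(1, 3, 256, 256)  # e.g., matching a 256×256 random crop

macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e9:.2f} G, params: {params / 1e6:.2f} M")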

Citation

@article{kato2024generalized,
  title={Generalized SAM: Efficient Fine-Tuning of SAM for Variable Input Image Sizes},
  author={Kato, Sota and Mitsuoka, Hinako and Hotta, Kazuhiro},
  journal={arXiv preprint arXiv:2408.12406},
  year={2024}
}