SAM-RSP: A New Few-Shot Segmentation Method Based on Segment Anything Model and Rough Segmentation Prompts

Abstract: Few-shot segmentation (FSS) aims to segment novel classes given only a few labeled images. The backbones used in existing methods are pre-trained on ImageNet classification. Although these backbones perceive the semantic categories of images effectively, they cannot accurately perceive region boundaries within an image, which limits model performance. Recently, the Segment Anything Model (SAM) has achieved precise image segmentation from point or box prompts, thanks to its excellent perception of region boundaries within an image. However, it does not effectively provide semantic information about images. This paper proposes a new few-shot segmentation method that perceives both semantic categories and region boundaries. The method first uses the SAM encoder to perceive regions and obtain the query embedding. The support and query images are then fed into a backbone pre-trained on ImageNet to perceive semantics and generate a rough segmentation prompt (RSP). The query embedding is combined with the prompt to generate a pixel-level query prototype, which better matches the query embedding. Finally, the query embedding, prompt, and prototype are combined and fed into the designed multi-layer prompt transformer decoder, which is more efficient and lightweight and provides a more accurate segmentation result. In addition, other methods can easily be combined with our framework to improve their performance. Extensive experiments on PASCAL-5<sup>i</sup> and COCO-20<sup>i</sup> under 1-shot and 5-shot settings demonstrate the effectiveness of our method, which achieves new state-of-the-art results.

<p align="middle"> <img src="figure/main.png"> </p>
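
The abstract says the query embedding is combined with the rough segmentation prompt to form a query prototype that is then matched against the query embedding. The snippet below is only a rough illustration of that idea, not the repository's implementation: it swaps in masked average pooling and cosine similarity, two operations commonly used in few-shot segmentation, and all function names (`masked_average_prototype`, `cosine_similarity`, `similarity_map`) are hypothetical. It uses plain Python lists so that it runs without dependencies; a real model would operate on tensors.

```python
import math


def masked_average_prototype(features, weights):
    """Weighted mean of per-pixel feature vectors.

    features: list of N feature vectors (each of length C), one per pixel.
    weights:  list of N values in [0, 1], e.g. a rough foreground prompt.
    Assumption: masked average pooling stands in for the paper's prototype step.
    """
    total = sum(weights)
    if total == 0:
        raise ValueError("the prompt selects no pixels")
    dim = len(features[0])
    return [
        sum(w * f[d] for f, w in zip(features, weights)) / total
        for d in range(dim)
    ]


def cosine_similarity(a, b):
    """Cosine similarity of two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_map(features, prototype):
    """Score every pixel by how closely its feature matches the prototype."""
    return [cosine_similarity(f, prototype) for f in features]


# Toy example: three pixels with 2-D features; the prompt marks pixels 0 and 2.
query_embedding = [[1.0, 0.0], [0.0, 1.0], [1.0, 0.0]]
rough_prompt = [1.0, 0.0, 1.0]

prototype = masked_average_prototype(query_embedding, rough_prompt)
scores = similarity_map(query_embedding, prototype)
```

In this toy case the prototype equals the feature shared by the two prompted pixels, so those pixels score highest and the unprompted pixel scores lowest.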

Dependencies

Datasets

Models

Scripts

Performance

Performance comparison with state-of-the-art approaches in terms of average mIoU across all folds.

  1. PASCAL-5<sup>i</sup>

     | Backbone | Method | 1-shot | 5-shot |
     | --- | --- | --- | --- |
     | VGG16 | MIANet | 67.10 | 71.99 |
     | VGG16 | SAM-RSP (ours) | 69.29 <sub>(+2.19)</sub> | 73.86 <sub>(+1.87)</sub> |
     | ResNet50 | HDMNet | 69.40 | 71.80 |
     | ResNet50 | SAM-RSP (ours) | 70.76 <sub>(+1.36)</sub> | 74.15 <sub>(+2.35)</sub> |

  2. COCO-20<sup>i</sup>

     | Backbone | Method | 1-shot | 5-shot |
     | --- | --- | --- | --- |
     | VGG16 | HDMNet | 45.90 | 52.40 |
     | VGG16 | SAM-RSP (ours) | 48.79 <sub>(+2.89)</sub> | 54.15 <sub>(+1.75)</sub> |
     | ResNet50 | MIANet | 47.66 | 51.65 |
     | ResNet50 | SAM-RSP (ours) | 49.84 <sub>(+2.18)</sub> | 55.38 <sub>(+3.73)</sub> |
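
For readers unfamiliar with the metric, here is a minimal sketch of how a fold-averaged mIoU could be computed. It assumes one common few-shot segmentation convention: intersections and unions are accumulated per class over all test episodes of a fold, the per-class IoUs are averaged, and the fold scores are then averaged. The function names `fold_miou` and `average_over_folds` are hypothetical and this is not the repository's evaluation code.

```python
from collections import defaultdict


def fold_miou(episodes):
    """Mean IoU (in percent) over the classes of one fold.

    episodes: iterable of (class_id, pred_mask, gt_mask), where each mask is a
    nested list of 0/1 values with the same shape.
    """
    inter = defaultdict(int)
    union = defaultdict(int)
    for class_id, pred, gt in episodes:
        for pred_row, gt_row in zip(pred, gt):
            for p, g in zip(pred_row, gt_row):
                inter[class_id] += int(p == 1 and g == 1)
                union[class_id] += int(p == 1 or g == 1)
    # Skip classes whose prediction and ground truth are both empty.
    ious = [inter[c] / union[c] for c in union if union[c] > 0]
    if not ious:
        raise ValueError("no class with a non-empty union")
    return 100.0 * sum(ious) / len(ious)


def average_over_folds(fold_scores):
    """Average the per-fold mIoU values."""
    return sum(fold_scores) / len(fold_scores)


# Toy fold with two classes: class 0 has IoU 1/2, class 1 has IoU 4/4.
toy_episodes = [
    (0, [[1, 1], [0, 0]], [[1, 0], [0, 0]]),
    (1, [[1, 1], [1, 1]], [[1, 1], [1, 1]]),
]
toy_score = fold_miou(toy_episodes)  # 75.0
```

The bracketed gains in the tables are the differences between each SAM-RSP score and the baseline score listed directly above it.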

References

This repo is built mainly on SAM and BAM. Thanks for their great work!

This paper has been accepted by the Image and Vision Computing journal.