
A Comprehensive Survey on Segment Anything Model for Vision and Beyond

The First Comprehensive SAM Survey: A Comprehensive Survey on Segment Anything Model for Vision and Beyond. Chunhui Zhang, Li Liu, Yawen Cui, Guanjie Huang, Weilin Lin, Yiqian Yang, Yuehong Hu. [paper] [homepage] [Chinese explanation]

Abstract: Artificial intelligence (AI) is evolving towards artificial general intelligence, which refers to the ability of an AI system to perform a wide range of tasks and exhibit a level of intelligence similar to that of a human being. This is in contrast to narrow or specialized AI, which is designed to perform specific tasks with a high degree of efficiency. It is therefore urgent to design a general class of models, which we term foundation models, trained on broad data and adaptable to various downstream tasks. The recently proposed segment anything model (SAM) has made significant progress in breaking the boundaries of segmentation, greatly promoting the development of foundation models for computer vision. To fully comprehend SAM, we conduct a survey study. As the first to comprehensively review the progress of the segment anything task for vision and beyond based on the SAM foundation model, this work focuses on its applications to various tasks and data types by discussing its historical development, recent progress, and profound impact on broad applications. We first introduce the background and terminology for foundation models, including SAM, as well as state-of-the-art methods contemporaneous with SAM that are significant for the segment anything task. Then, we analyze and summarize the advantages and limitations of SAM across various image processing applications, including software scenes, real-world scenes, and complex scenes. Importantly, many insights are drawn to guide future research toward more versatile foundation models and improvements to the architecture of SAM. We also summarize many other amazing applications of SAM in vision and beyond. Finally, we maintain a continuously updated paper list and an open-source project summary for the foundation model SAM here.

Awesome Segment Anything Models: A curated list of awesome segment anything models in computer vision and beyond. This repository supplements our survey paper, and we intend to keep it continuously updated.

If you like our project, please give us a star ⭐ on GitHub for the latest updates.

We strongly encourage authors of relevant works to make a pull request and add their paper's information [here].

:boom: SAM 2: Segment Anything in Images and Videos has been released; a minimal usage sketch is given below.

:boom: The first survey on SAM for videos, Segment Anything for Videos: A Systematic Survey, is now online.
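
For readers who want to try SAM 2 directly, below is a minimal single-image prompting sketch adapted from the usage shown in the official `sam2` repository. The checkpoint path, config name, image file, and click coordinates are illustrative placeholders, not values prescribed by this list.

```python
# Minimal SAM 2 image-prompting sketch; paths and coordinates are placeholders.
import numpy as np
import torch
from PIL import Image
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Build the model from a config and checkpoint downloaded from the official repo.
predictor = SAM2ImagePredictor(
    build_sam2("sam2_hiera_l.yaml", "./checkpoints/sam2_hiera_large.pt")
)

image = np.array(Image.open("example.jpg").convert("RGB"))  # HxWx3 uint8

with torch.inference_mode():
    predictor.set_image(image)
    # One foreground click (label 1); masks come back ranked by predicted IoU.
    masks, scores, _ = predictor.predict(
        point_coords=np.array([[500, 375]]),
        point_labels=np.array([1]),
        multimask_output=True,
    )
best_mask = masks[np.argmax(scores)]  # boolean HxW mask
```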


:fire: Highlights

Last Updated

- 2024.07.31: The first survey on SAM for videos went online.
- 2024.07.30: SAM 2 was released.
- 2023.07.14: "Segment Anything" was accepted by ICCV 2023.
- 2023.05.16: An initial version of recent papers and projects was released.
- 2023.04.05: The paper "Segment Anything" went online.

Contents

Citation

If you find our work useful in your research, please consider citing:

@article{chunhui2023samsurvey,
  title={A Comprehensive Survey on Segment Anything Model for Vision and Beyond},
  author={Zhang, Chunhui and Liu, Li and Cui, Yawen and Huang, Guanjie and Lin, Weilin and Yang, Yiqian and Hu, Yuehong},
  journal={arXiv preprint arXiv:2305.08196},
  year={2023}
}

@article{chunhui2024samforvideos,
  title={Segment Anything for Videos: A Systematic Survey},
  author={Zhang, Chunhui and Cui, Yawen and Lin, Weilin and Huang, Guanjie and Rong, Yan and Liu, Li and Shan, Shiguang},
  journal={arXiv preprint},
  year={2024}
}

Survey

Paper List

Seminal Papers

Follow-up Papers

:boom: UnSegMedGAT: A. Mudit Adityaja, Saurabh J. Shigwan, Nitin Kumar.<br /> "UnSegMedGAT: Unsupervised Medical Image Segmentation using Graph Attention Networks Clustering." ArXiv (2024). [paper] [code] [2024.11]

:boom: SpineFM: Samuel J. Simons, Bartłomiej W. Papież.<br /> "SpineFM: Leveraging Foundation Models for Automatic Spine X-ray Segmentation." ArXiv (2024). [paper] [2024.11]

:boom: MV-Adapter: Lianjun Liu et al.<br /> "MV-Adapter: Enhancing Underwater Instance Segmentation via Adaptive Channel Attention." ArXiv (2024). [paper] [2024.11]

:boom: ZIM: Beomyoung Kim, Chanyong Shin, Joonhyun Jeong, Hyungsik Jung, Se-Yun Lee, Sewhan Chun, Dong-Hyun Hwang, Joonsang Yu.<br /> "ZIM: Zero-Shot Image Matting for Anything." ArXiv (2024). [paper] [code] [2024.11]

:boom: Sourav Modak, Anthony Stein.<br /> "Generative AI-based Pipeline Architecture for Increasing Training Efficiency in Intelligent Weed Control Systems." ArXiv (2024). [paper] [2024.11]

:boom: SFCD-Net: Da Zhang, Feiyu Wang, Lichen Ning, Zhiyuan Zhao, Junyu Gao, Xuelong Li.<br /> "Integrating SAM with Feature Interaction for Remote Sensing Change Detection." TGRS (2024). [paper] [2024.10]

:boom: Swin-Unet-SAM: Yanfen Lin, Tinghao Fan, Congfu Fang.<br /> "A Classification and Segmentation Model for Diamond Abrasive Grains Based on Improved Swin-Unet-SAM." Electronics (2024). [paper] [2024.10]

:boom: RSPS-SAM: Zhuoran Liu, Zizhen Li, Ying Liang, Claudio Persello, Bo Sun, Guangjun He, Lei Ma.<br /> "RSPS-SAM: A Remote Sensing Image Panoptic Segmentation Method Based on SAM." Remote Sensing (2024). [paper] [2024.10]

:boom: SF-SAM-Adapter: Ting Lei, Jing Chen, Jixiang Chen.<br /> "SF-SAM-Adapter: SAM-based segmentation model integrates prior knowledge for gaze image reflection noise removal." Alexandria Engineering Journal (2024). [paper] [code] [2024.10]

:boom: MWVOS: Zhenghao Zhang, Shengfan Zhang, Zuozhuo Dai, Zilong Dong, Siyu Zhu.<br /> "MWVOS: Mask-Free Weakly Supervised Video Object Segmentation via promptable foundation model." Pattern Recognition (2024). [paper] [2024.10]

:boom: Chuan Yang, Yueqin Zhu, Jiantong Zhang, Xiaoqiang Wei, Haomeng Zhu, Zhehui Zhu.<br /> "A feature fusion method on landslide identification in remote sensing with Segment Anything Model." Landslides (2024). [paper] [2024.10]

Open Source Projects

| No. | Project | Title | Project page | Code base | Affiliation | Description |
|-----|---------|-------|--------------|-----------|-------------|-------------|
| 001 | SAM | Segment Anything | Project page | Code | Meta | A foundation model for general segmentation (see the usage sketch after this table). |
| 002 | SAM-Track | Segment and Track Anything | Colab | Code | Zhejiang University | A project dedicated to tracking and segmenting any objects in videos, either automatically or interactively. |
| 003 | Grounded-SAM | Grounded-Segment-Anything | Colab | Code | IDEA-Research | A project combining Grounding DINO and SAM to detect and segment anything with text inputs. |
| 004 | MMDet-SAM | - | - | Code | OpenMMLab | A new way of instance segmentation by combining SAM with closed-set, open-vocabulary, and grounding object detection. |
| 005 | MMRotate-SAM | Zero-shot Oriented Object Detection with SAM | - | Code | OpenMMLab | A project joining SAM with weakly supervised horizontal box detection to achieve rotated box detection. |
| 006 | MMOCR-SAM | - | - | Code | OpenMMLab | A text detection/recognition solution with SAM that segments every text character, with striking text removal and text inpainting demos driven by diffusion models and Gradio. |
| 007 | MMEditing-SAM | - | - | Code | OpenMMLab | A project joining SAM and image generation to create awesome images and edit any part of them. |
| 008 | Label-Studio-SAM | OpenMMLab PlayGround: Semi-Automated Annotation with Label-Studio and SAM | - | Code | OpenMMLab | A project combining Label-Studio and SAM to achieve semi-automated annotation. |
| 009 | PaddleSeg | Segment Anything with PaddleSeg | - | Code | PaddlePaddle | Pretrained model parameters in PaddlePaddle format. |
| 010 | SegGPT | Segmenting Everything In Context | Hugging Face | Code | BAAI-Vision | SAM in context, based on Painter. |
| 011 | SEEM | Segment Everything Everywhere All at Once | Hugging Face | Code | Microsoft | A project that can segment everything everywhere with multi-modal prompts all at once. |
| 012 | CLIP Surgery | CLIP Surgery for Better Explainability with Enhancement in Open Vocabulary Tasks | Project page | Code | HKUST | A work building on CLIP's explainability to achieve text-to-mask with SAM, without manual points. |
| 013 | SAMCOD | Can SAM Segment Anything? When SAM Meets Camouflaged Object Detection | - | Code | - | SAM + camouflaged object detection (COD). |
| 014 | Inpaint Anything | Segment Anything Meets Image Inpainting | Hugging Face | Code | USTC and EIT | SAM combined with inpainting, able to remove objects smoothly. |
| 015 | PerSAM | Personalize Segment Anything Model with One Shot | Hugging Face | Code | - | SAM with specific concepts. |
| 016 | MedSAM | Segment Anything in Medical Images | - | Code | - | A step-by-step tutorial with a small dataset to help you quickly utilize SAM. |
| 017 | Segment-Any-Anomaly | GroundedSAM Anomaly Detection | Colab | Code | HUST | Grounding DINO + SAM to segment any anomaly. |
| 018 | SSA | Semantic Segment Anything | - | Code | Fudan University | A dense category annotation engine. |
| 019 | Magic Copy | - | - | Code | - | A Chrome extension that uses SAM to extract a foreground object from an image and copy it to the clipboard. |
| 020 | Segment Anything with Clip | Segment Anything with Clip | Hugging Face | Code | - | SAM combined with CLIP. |
| 021 | MetaSeg | Segment Anything Video | Hugging Face | Code | - | A packaged version of SAM. |
| 022 | SAM in Napari | Segment Anything Model (SAM) in Napari | Project page | Code | Applied Computer Vision Lab and German Cancer Research Center | Extends SAM's click-based foreground separation to full click-based semantic and instance segmentation. |
| 023 | SAM Medical Imaging | SAM Medical Imaging | - | Code | - | SAM for medical imaging. |
| 024 | 3D-Box | 3D-Box via Segment Anything | - | Code | - | SAM extended to 3D perception by combining it with VoxelNeXt. |
| 025 | Anything-3D | - | - | Code | - | Anything-3D Novel View, Anything-NeRF, Any-3DFace. |
| 026 | L2SET | Learning to Segment EveryThing | - | Code | UC Berkeley, FAIR | A new partially supervised training paradigm for instance segmentation. |
| 027 | Edit Anything | Edit Anything by Segment-Anything | - | Code | - | Edit anything in images powered by SAM, ControlNet, Stable Diffusion, etc. |
| 028 | Image Edit Anything | IEA: Image Editing Anything | - | Code | - | Using Stable Diffusion and SAM for image editing. |
| 029 | SAM for Stable Diffusion WebUI | Segment Anything for Stable Diffusion WebUI | - | Code | - | An extension connecting AUTOMATIC1111 Stable Diffusion WebUI and the Mikubill ControlNet extension with SAM and GroundingDINO to enhance Stable Diffusion/ControlNet inpainting. |
| 030 | Earth Observation Tools | Segment Anything EO tools | Colab | Code | - | Earth observation tools for SAM. |
| 031 | Moving Object Detection | Towards Segmenting Anything That Moves | - | Code | - | A project about SAM + moving object detection. |
| 032 | OCR-SAM | Optical Character Recognition with Segment Anything | Project page | Code | - | Combining MMOCR with SAM and Stable Diffusion. |
| 033 | SALT | Segment Anything Labelling Tool | - | Code | - | A project that uses SAM with a barebones interface to label images and save the masks in COCO format. |
| 034 | Prompt Segment Anything | Prompt Segment Anything | - | Code | - | An implementation of zero-shot instance segmentation using SAM. |
| 035 | SAM-RBox | - | - | Code | - | A project that uses SAM to generate rotated bounding boxes with MMRotate, serving as a comparison method for H2RBox-v2. |
| 036 | VISAM | MOTRv2: Bootstrapping End-to-End Multi-Object Tracking by Pretrained Object Detectors | - | Code | - | Combining SAM with MOT to usher in the era of "MOTS". |
| 037 | SegEO | Segment Anything EO tools | - | Code | - | Tools developed to ease the processing of spatial data (GeoTIFF and TMS) with SAM, using a sliding-window algorithm for large files. |
| 038 | Napari Segment Anything | Napari Segment Anything | Project page | Code | - | A native Qt UI for SAM. |
| 039 | Segment-Anything-U-Specify | Segment-Anything-U-Specify | - | Code | - | Using CLIP and SAM to segment any instance you specify with a text prompt of instance names. |
| 040 | SegDrawer | Simple static web-based mask drawer | Colab | Code | - | A simple static web-based mask drawer, supporting semantic segmentation with SAM. |
| 041 | Track Anything | Segment Anything Meets Videos | Hugging Face | Code | SUSTech | A flexible and interactive tool for video object tracking and segmentation. |
| 042 | Count Anything | - | - | Code | - | A method that uses SAM and CLIP to ground and count any object matching a custom text prompt, without requiring any point or box annotation. |
| 043 | RAM | Relate Anything Model | Hugging Face | Code | MMLab, NTU and VisCom Lab, KCL/TongJi | Takes an image as input and utilizes SAM to identify the corresponding masks within the image. |
| 044 | Segment Any RGBD | Segment Any RGBD | Project page | Code | - | A toolbox to segment rendered depth images based on SAM. |
| 045 | Show Anything | Show Anything | Hugging Face | Code | Showlab, NUS | Applications compatible with both SAM and generation models. |
| 046 | Transfer Any Style | Any-to-Any Style Transfer: Making Picasso and Da Vinci Collaborate | - | Code | LV-lab, NUS | An interactive style-transfer demo based on Segment-Anything that enables different content regions to apply different styles. |
| 047 | Caption Anything | - | Colab | Code | VIP lab, SUSTech | A versatile image processing tool that combines the capabilities of SAM, visual captioning, and ChatGPT. |
| 048 | Image2Paragraph | Transform Image Into Unique Paragraph | Project page | Code | - | Transform an image into a unique paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, and ControlNet. |
| 049 | LIME SAM | Local Interpretable Model-agnostic Explanations Segment Anything | Colab | Code | - | Aims to create an Explainable Artificial Intelligence (XAI) framework for image classification using LIME as the base algorithm, with the superpixel method replaced by SAM. |
| 050 | Paint Anything | - | - | Code | - | An interactive demo based on SAM for stroke-based painting that enables human-like painting. |
| 051 | SAMed | Customized Segment Anything Model for Medical Image Segmentation | Colab | Code | USTC | Built upon the large-scale image segmentation model SAM to explore the new research paradigm of customizing large-scale models for medical image segmentation. |
| 052 | Personalize SAM | Personalize Segment Anything with 1 Shot in 10 Seconds | Hugging Face | Code | MMLab, CUHK | A training-free personalization approach for SAM, termed PerSAM. Given only a single image with a reference mask, PerSAM can segment specific visual concepts. |
| 053 | Open-vocabulary-Segment-Anything | Open-vocabulary-Segment-Anything | - | Code | - | Combining OwlViT with Segment Anything: open-vocabulary detection and segmentation (text-conditioned and image-conditioned). |
| 054 | Label-Anything-Pipeline | Label-Anything-Pipeline | - | Code | ZJU | Annotate anything in visual tasks, all in one pipeline, with GPT-4 and SAM. |
| 055 | Grounded-Segment-Any-Parts | Grounded Segment Anything: From Objects to Parts | Project page | Code | HKU | Expands SAM to support text prompt input; the prompt can be object-level (e.g., dog) or part-level (e.g., dog head). |
| 056 | AnyLabeling | AnyLabeling | YouTube page | Code | - | Effortless AI-assisted data labeling with Segment Anything and YOLO. |
| 057 | SSA | Semantic-Segment-Anything | Project page | Code | - | An automated dense category annotation engine that serves as the initial semantic labeling for the Segment Anything dataset (SA-1B). |
| 058 | RefSAM | Label Data with Segment Anything in Roboflow | Project page | Code | - | Referring image segmentation benchmarking with the Segment Anything Model (SAM). |
| 059 | Roboflow Annotate | Launch: Label Data with Segment Anything in Roboflow | Project page | APP | Roboflow | SAM-assisted labeling for training computer vision models. |
| 060 | ImageBind SAM | - | - | Code | IDEA-Research | An experimental demo that combines ImageBind and SAM to generate masks from different modalities. |
| 061 | X-AnyLabeling | X-AnyLabeling | WeChat | Code | CVHub | A new interactive automatic labeling tool based on AnyLabeling. |
| 062 | Segment Anything + NNCF | - | WeChat | Code | - | OpenVINO™ NNCF quantization acceleration for the SAM encoder. |
| 063 | YOLOv8 + SAM | - | WeChat | - | - | Use SAM in YOLOv8. |
| 064 | SearchAnything | SearchAnything | Zhihu blog, Twitter | Code | CAS and MSRA | A semantic local search engine powered by various AI models. |
| 065 | SAM Meets Stable Diffusion | - | WeChat | Code | PaddlePaddle | Segment and generate anything. |
| 066 | Language Segment-Anything | - | - | Code | - | SAM with text prompts to generate masks for specific objects in images. |
| 067 | Expedit-SAM | - | - | Code | - | Expediting SAM without fine-tuning. |
| 068 | Segment-Anything-Fast | Accelerating Generative AI with PyTorch: Segment Anything, Fast | Project page | Code | Team PyTorch | A batched, offline-inference-oriented version of segment-anything. |
| 069 | YOLOv9+SAM | YOLOv9+SAM | Project page | Code | - | Dynamic detection and segmentation with YOLOv9+SAM. |
| 070 | LiteMedSAM | LiteMedSAM | Project page | Code | - | A lightweight version of MedSAM for fast training and inference. |
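
Most of the projects above build on the same promptable interface from entry 001 (SAM). As a concrete reference, here is a minimal sketch of both usage modes with the official `segment-anything` package; the checkpoint filename, image path, and click coordinates are illustrative placeholders.

```python
# Minimal segment-anything sketch: promptable mode and "segment everything" mode.
# Checkpoint filename, image path, and click coordinates are illustrative.
import numpy as np
from PIL import Image
from segment_anything import SamAutomaticMaskGenerator, SamPredictor, sam_model_registry

sam = sam_model_registry["vit_h"](checkpoint="sam_vit_h_4b8939.pth")

image = np.array(Image.open("example.jpg").convert("RGB"))  # HxWx3 uint8

# Promptable mode: a single foreground point (label 1) steers the mask.
predictor = SamPredictor(sam)
predictor.set_image(image)
masks, scores, _ = predictor.predict(
    point_coords=np.array([[500, 375]]),
    point_labels=np.array([1]),
    multimask_output=True,
)
best_mask = masks[np.argmax(scores)]  # boolean HxW array

# "Segment everything" mode: no prompts, mask proposals for the whole image.
mask_generator = SamAutomaticMaskGenerator(sam)
all_masks = mask_generator.generate(image)  # list of dicts with "segmentation", "area", ...
```

With `multimask_output=True`, SAM returns several candidate masks per prompt to resolve ambiguity (e.g., part vs. whole object); picking the highest-scoring one, as above, is the common default.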

Awesome Repositories for SAM

License

This project is released under the MIT license. Please see the LICENSE file for more information.