Segment Anything for Videos: A Systematic Survey

The first survey on SAM for videos: Segment Anything for Videos: A Systematic Survey. Chunhui Zhang, Yawen Cui, Weilin Lin, Guanjie Huang, Yan Rong, Li Liu, Shiguang Shan. [ArXiv][ChinaXiv][ResearchGate][Project][Chinese explainer]

<p align="justify"> Abstract: The recent wave of foundation models has witnessed tremendous success in computer vision (CV) and beyond, with the segment anything model (SAM) having sparked a passion for exploring task-agnostic visual foundation models. Empowered by its remarkable zero-shot generalization, SAM is currently challenging numerous traditional paradigms in CV, delivering extraordinary performance not only in various image segmentation and multi-modal segmentation (e.g., text-to-mask) tasks, but also in the video domain. Additionally, the recently released SAM 2 is once again sparking research enthusiasm in the realm of promptable visual segmentation for both images and videos. However, existing surveys mainly focus on SAM in various image processing tasks; a comprehensive and in-depth review in the video domain is notably absent. To address this gap, this work conducts a systematic review of SAM for videos in the era of foundation models. As the first to review the progress of SAM for videos, this work focuses on its applications to various tasks by discussing recent advances and the innovation opportunities for developing foundation models for broad applications. We begin with a brief introduction to the background of SAM and video-related research domains. Subsequently, we present a systematic taxonomy that categorizes existing methods into three key areas: video understanding, video generation, and video editing, analyzing and summarizing their advantages and limitations. Furthermore, comparative results of SAM-based and current state-of-the-art methods on representative benchmarks, as well as insightful analysis, are offered. Finally, we discuss the challenges faced by current research and envision several future research directions in the field of SAM for video and beyond. </p>

This project will be continuously updated. We expect to include more state-of-the-art methods on SAM for videos.

We welcome authors of related works to submit pull requests and become contributors to this project.

The first comprehensive SAM survey, A Comprehensive Survey on Segment Anything Model for Vision and Beyond, is available [here].

:fire: Highlights

- 2024.11.04: The latest update of this project.
- 2024.07.31: The first survey on SAM for videos went online.
- 2024.07.30: SAM 2 was released.

Citation

If you find our work useful in your research, please consider citing:

@article{chunhui2024samforvideos,
  title={Segment Anything for Videos: A Systematic Survey},
  author={Chunhui Zhang and Yawen Cui and Weilin Lin and Guanjie Huang and Yan Rong and Li Liu and Shiguang Shan},
  journal={arXiv},
  year={2024}
}

Contents

Video Segmentation

Video Object Segmentation

Video Semantic Segmentation

Video Instance Segmentation

Video Panoptic Segmentation

3D Video Segmentation

Audio-Visual Segmentation

Referring Video Object Segmentation

Universal Segmentation

Video Detection and Recognition

Video Object Detection

Video Shadow Detection

Video Camouflaged Object Detection

Deepfake Detection

Video Anomaly Detection

Video Salient Object Detection

Event Detection

Action Recognition

Video Activity and Scene Classification

Video Object Tracking

General Object Tracking

Open-Vocabulary Tracking

Point Tracking

Instruction Tracking

Interactive Tracking and Localization

Domain Specific Tracking

Video Editing and Generation

Video Editing

Text Guided Video Editing

Other Modality-guided Video Editing

Domain Specific Editing

Video Frame Interpolation

3D Video Reconstruction

Video Dataset Annotation Generation

Video Super-Resolution

Text-to-Video Generation

Video Generation with Other Modalities

Others

Video Understanding and Analysis

Video Captioning

Video Dialog

Video Grounding

Optical Flow Estimation

Pose Estimation

Video ReID

Others

Medical Video Processing

Other Video Tasks

Video Compression

Robotics

Video Game

Video Style Transfer Attack

Semantic Communication

Tool Software

License

This project is released under the MIT license. Please see the LICENSE file for more information.