VNext:

<p align="center"><img src="assets/VNext.png" width="300"/></p>

To date, VNext contains the official implementation of the following algorithms:

InstMove: Instance Motion for Object-centric Video Segmentation (CVPR 2023)

IDOL: In Defense of Online Models for Video Instance Segmentation (ECCV 2022 Oral)

SeqFormer: Sequential Transformer for Video Instance Segmentation (ECCV 2022 Oral)

Getting started

  1. For installation and data preparation, please refer to INSTALL.md.
  2. For InstMove training, evaluation, plugin, and model zoo, please refer to InstMove.md.
  3. For IDOL training, evaluation, and model zoo, please refer to IDOL.md.
  4. For SeqFormer training, evaluation, and model zoo, please refer to SeqFormer.md.

IDOL

In Defense of Online Models for Video Instance Segmentation

Junfeng Wu, Qihao Liu, Yi Jiang, Song Bai, Alan Yuille, Xiang Bai

Introduction

<p align="center"><img src="assets/IDOL/arch.png" width="1000"/></p>

Visualization results on OVIS valid set

<img src="assets/IDOL/vid_2.gif" width="400"/><img src="assets/IDOL/vid_61.gif" width="400"/> <img src="assets/IDOL/vid_96.gif" width="400"/><img src="assets/IDOL/vid_116.gif" width="400"/>

Quantitative results

YouTube-VIS 2019

<p align="center"><img src="assets/IDOL/ytvis2019_results.png" width="1000"/></p>

OVIS 2021

<p align="center"><img src="assets/IDOL/ovis_results.png" width="1000"/></p>

SeqFormer

<p align="center"><img src="assets/SeqFormer/SeqFormer_sota.png" width="500"/></p>

SeqFormer: Sequential Transformer for Video Instance Segmentation

Junfeng Wu, Yi Jiang, Song Bai, Wenqing Zhang, Xiang Bai

Introduction

<p align="center"><img src="assets/SeqFormer/SeqFormer_arch.png" width="1000"/></p>

Visualization results on YouTube-VIS 2019 valid set

<img src="assets/SeqFormer/vid_15.gif" width="400"/><img src="assets/SeqFormer/vid_78.gif" width="400"/> <img src="assets/SeqFormer/vid_133.gif" width="400"/><img src="assets/SeqFormer/vid_210.gif" width="400"/>

Quantitative results

YouTube-VIS 2019

<p align="center"><img src="assets/SeqFormer/ytvis2019_results.png" width="1000"/></p>

YouTube-VIS 2021

<p align="center"><img src="assets/SeqFormer/ytvis2021_results.png" width="1000"/></p>

Citation

@inproceedings{seqformer,
  title={SeqFormer: Sequential Transformer for Video Instance Segmentation},
  author={Wu, Junfeng and Jiang, Yi and Bai, Song and Zhang, Wenqing and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

@inproceedings{IDOL,
  title={In Defense of Online Models for Video Instance Segmentation},
  author={Wu, Junfeng and Liu, Qihao and Jiang, Yi and Bai, Song and Yuille, Alan and Bai, Xiang},
  booktitle={ECCV},
  year={2022},
}

Acknowledgement

This repo is based on detectron2, Deformable DETR, VisTR, and IFC. We thank the authors for their wonderful work.