<h1 align="center">MPVSS: Mask Propagation for Efficient Video Semantic Segmentation</h1>

[NeurIPS 2023] This is the official repository for our paper: MPVSS: Mask Propagation for Efficient Video Semantic Segmentation by Yuetian Weng, Mingfei Han, Haoyu He, Mingjie Li, Lina Yao, Xiaojun Chang and Bohan Zhuang.

Introduction

MPVSS is a simple yet effective mask propagation framework for efficient video semantic segmentation (VSS). A strong query-based image segmentor processes the key frames and generates accurate binary masks and class predictions. For each segment-level mask prediction of a key frame, MPVSS then estimates a dedicated flow map. Finally, the key-frame mask predictions are warped to the non-key frames via these query-based flow maps.
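The warping step can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the function name `warp_mask`, the backward-flow convention (each non-key-frame pixel stores the offset to the key-frame location it samples from), and the use of bilinear sampling with zero padding are all assumptions made for illustration. In the framework described above, this kind of warp would be applied once per segment, each with its own flow map.

```python
import math

def warp_mask(mask, flow):
    """Backward-warp one key-frame mask to a non-key frame (illustrative only).

    mask: H x W nested list of floats, the soft mask of one segment on the key frame.
    flow: H x W nested list of (dx, dy) pairs. Assumed convention: pixel (x, y) of
          the non-key frame samples the key-frame mask at (x + dx, y + dy).
    """
    h, w = len(mask), len(mask[0])

    def sample(x, y):
        # Bilinear interpolation; neighbours outside the image contribute zero.
        x0, y0 = math.floor(x), math.floor(y)
        value = 0.0
        for yi, wy in ((y0, 1 - (y - y0)), (y0 + 1, y - y0)):
            for xi, wx in ((x0, 1 - (x - x0)), (x0 + 1, x - x0)):
                if 0 <= xi < w and 0 <= yi < h:
                    value += wx * wy * mask[yi][xi]
        return value

    return [[sample(x + flow[y][x][0], y + flow[y][x][1]) for x in range(w)]
            for y in range(h)]

# Toy key-frame mask: a vertical stripe in the middle column.
key_mask = [[0.0, 1.0, 0.0],
            [0.0, 1.0, 0.0]]
# Every pixel samples from one pixel to its left, so the stripe moves right.
flow = [[(-1.0, 0.0)] * 3 for _ in range(2)]
warped = warp_mask(key_mask, flow)
print(warped)  # -> [[0.0, 0.0, 1.0], [0.0, 0.0, 1.0]]
```

A tensor library would normally do this in one batched call over all segments; the loop form is used here only to keep the sampling rule visible.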

Installation

See the installation instructions for Mask2Former.


Data preparation

  1. Download the VSPW dataset from https://www.vspwdataset.com/
  2. Create a symbolic link to the dataset:
ln -s /path/to/your/dataset datasets/vspw
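If the link needs to be created from a setup script rather than by hand, the same step can be sketched in Python. The helper name `link_dataset` is hypothetical and not part of this repository; only the `datasets/vspw` target comes from the command above.

```python
import os

def link_dataset(source_dir, link_path="datasets/vspw"):
    """Create link_path as a symlink to source_dir and return the resolved target."""
    if not os.path.isdir(source_dir):
        raise FileNotFoundError(f"dataset directory not found: {source_dir}")
    # Make sure the parent folder (e.g. datasets/) exists.
    os.makedirs(os.path.dirname(link_path) or ".", exist_ok=True)
    # Replace a stale link, but never touch a real file or directory.
    if os.path.islink(link_path):
        os.remove(link_path)
    os.symlink(os.path.abspath(source_dir), link_path)
    return os.path.realpath(link_path)
```

Calling `link_dataset("/path/to/your/dataset")` from the repository root is then equivalent to the `ln -s` command, with an early error if the source directory is missing.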

Training and Evaluation

sh run.sh

Experimental Results

VSPW

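| Backbone | mIoU | WIoU | VC_8 | VC_16 | GFLOPs | #Params (M) | FPS |
|----------|------|------|------|-------|--------|-------------|-----|
| R50 | 37.5 | 59.0 | 84.1 | 77.2 | 38.9 | 84.1 | 33.93 |
| R101 | 38.8 | 59.0 | 84.8 | 79.6 | 45.1 | 103.1 | 32.38 |
| Swin-T | 39.9 | 62.0 | 85.9 | 80.4 | 39.7 | 114.0 | 32.86 |
| Swin-S | 40.4 | 62.0 | 86.0 | 80.7 | 47.3 | 108.0 | 30.61 |
| Swin-B | 52.6 | 68.4 | 89.5 | 85.9 | 61.5 | 147.0 | 27.38 |
| Swin-L | 53.9 | 69.1 | 89.6 | 85.8 | 97.3 | 255.4 | 23.22 |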
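For readers unfamiliar with the mIoU and WIoU columns, the sketch below shows one common way to compute mean IoU and a frequency-weighted IoU from predicted and ground-truth labels. It is an illustration under stated assumptions, not the evaluation code behind the table: the function name `iou_metrics` is hypothetical, classes absent from the ground truth are skipped, and WIoU is assumed to weight each class by its ground-truth pixel count.

```python
def iou_metrics(pred, target, num_classes):
    """Return (mIoU, frequency-weighted IoU) for flat lists of class indices."""
    inter = [0] * num_classes          # correctly predicted pixels per class
    pred_count = [0] * num_classes     # predicted pixels per class
    target_count = [0] * num_classes   # ground-truth pixels per class
    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            inter[t] += 1

    ious, weights = [], []
    for c in range(num_classes):
        if target_count[c] == 0:
            continue  # assumption: ignore classes absent from the ground truth
        union = pred_count[c] + target_count[c] - inter[c]
        ious.append(inter[c] / union)
        weights.append(target_count[c])

    miou = sum(ious) / len(ious)
    wiou = sum(i * w for i, w in zip(ious, weights)) / sum(weights)
    return miou, wiou

# Four pixels, two classes: class 0 has IoU 2/3, class 1 has IoU 1/2.
miou, wiou = iou_metrics(pred=[0, 0, 1, 1], target=[0, 0, 0, 1], num_classes=2)
print(round(miou, 4), round(wiou, 4))  # -> 0.5833 0.625
```

Because WIoU weights frequent classes more heavily, it is typically higher than mIoU on datasets with many rare classes, which is consistent with the gap between the two columns above.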

Cityscapes

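| Backbone | mIoU | GFLOPs | #Params (M) | FPS |
|----------|------|--------|-------------|-----|
| R50 | 78.4 | 173.2 | 84.1 | 13.43 |
| R101 | 78.2 | 204.3 | 103.1 | 12.55 |
| Swin-T | 80.7 | 175.9 | 114.0 | 12.33 |
| Swin-S | 81.3 | 213.2 | 108.0 | 10.98 |
| Swin-B | 81.7 | 278.6 | 147.0 | 9.54 |
| Swin-L | 81.6 | 449.5 | 255.4 | 7.24 |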

If you find this repository or our paper useful, please consider citing:

@inproceedings{weng2023mask,
  title={Mask Propagation for Efficient Video Semantic Segmentation},
  author={Weng, Yuetian and Han, Mingfei and He, Haoyu and Li, Mingjie and Yao, Lina and Chang, Xiaojun and Zhuang, Bohan},
  booktitle={NeurIPS},
  year={2023}
}

Acknowledgement

The code is largely based on Mask2Former. We thank the authors for their open-sourced code.