Video Mask Transfiner

Video Mask Transfiner for High-Quality Video Instance Segmentation [ECCV 2022]

[Project Page | Dataset Page | Paper]

Video Mask Transfiner for High-Quality Video Instance Segmentation,
Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
ECCV 2022 (arXiv 2207.14012)

<p align="center"> <img src='figures/vmt_banner_img.png' align="center" height="300px"> </p>

HQ-YTVIS: High-Quality Video Instance Segmentation Dataset

Mask annotation comparison between YouTube-VIS and HQ-YTVIS. HQ-YTVIS serves as a new benchmark to facilitate the future development (training and evaluation) of VIS methods that aim at higher mask quality. <img src="figures/dataset_compare_s.png" width="830"/>

https://user-images.githubusercontent.com/17427852/181796696-bfe9a9dd-2d39-42a2-b218-283c210e5ffd.mp4

Mask annotations in YouTube-VIS (left video) vs. mask annotations in HQ-YTVIS (right video). Please visit our Dataset Page for a detailed description of how to use the HQ-YTVIS benchmark.

Dataset Download: HQ-YTVIS Annotation Link
Dataset Usage: replace the original YTVIS annotation files with our annotation JSON files.
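
The replacement step can be sketched in Python as follows. This is only an illustration: `replace_annotation` is a hypothetical helper, and any paths you pass to it depend on where you stored the YTVIS data and the downloaded HQ-YTVIS files. The sketch keeps a one-time backup of the original annotation file before overwriting it.

```python
import shutil
from pathlib import Path


def replace_annotation(original: Path, hq_annotation: Path) -> Path:
    """Overwrite `original` with `hq_annotation`, keeping a one-time backup.

    Returns the path of the backup copy of the original file.
    Both arguments are user-supplied paths (hypothetical layout).
    """
    if not hq_annotation.is_file():
        raise FileNotFoundError(f"HQ-YTVIS annotation not found: {hq_annotation}")
    backup = original.with_name(original.name + ".orig")
    # Back up only once so repeated runs never clobber the true original.
    if original.is_file() and not backup.exists():
        shutil.copy2(original, backup)
    shutil.copy2(hq_annotation, original)
    return backup
```

For example, `replace_annotation(Path("ytvis/train.json"), Path("hq_ytvis/train.json"))` would swap in the HQ-YTVIS training annotations, where both paths are placeholders to adapt to your own directory layout.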

HQ-YTVIS Evaluation API

Please refer to our Installation Guidance and Tube-Mask AP & Tube-Boundary AP Usage Example.

python eval_hqvis.py --save-path prediction_results.json
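
Before running the evaluation, it can help to sanity-check the prediction file. The sketch below assumes that `prediction_results.json` follows the YouTube-VIS result format, that is, a list of entries with `video_id`, `category_id`, `score`, and a per-frame `segmentations` list. Whether the HQ-YTVIS API expects exactly these fields is an assumption; the usage example linked above is the authoritative reference. `check_predictions` and `check_prediction_file` are hypothetical helpers, not part of the evaluation API.

```python
import json

# Assumed YTVIS-style keys; verify against the official usage example.
REQUIRED_KEYS = {"video_id", "category_id", "score", "segmentations"}


def check_predictions(entries):
    """Return a list of human-readable problems found in `entries`."""
    if not isinstance(entries, list):
        return ["top-level JSON value should be a list of predictions"]
    problems = []
    for i, entry in enumerate(entries):
        if not isinstance(entry, dict):
            problems.append(f"entry {i}: not an object")
            continue
        missing = REQUIRED_KEYS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing keys {sorted(missing)}")
            continue
        if not isinstance(entry["segmentations"], list):
            problems.append(f"entry {i}: 'segmentations' should be a per-frame list")
    return problems


def check_prediction_file(path):
    """Load a JSON prediction file and run the structural checks on it."""
    with open(path, "r", encoding="utf-8") as f:
        return check_predictions(json.load(f))
```

An empty list returned by `check_prediction_file("prediction_results.json")` means no structural problems were found under these assumptions; it does not validate the mask encoding itself.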

VMT Code (under construction)

Install

Please refer to INSTALL.md for installation instructions and dataset preparation.

Usages

Please refer to USAGE.md for dataset preparation and detailed instructions on running the code (testing, visualization, etc.).

https://user-images.githubusercontent.com/17427852/181796768-3e79ee74-2465-4af8-ba89-b5c837098e00.mp4

Model zoo on HQ-YTVIS

Models are trained on the HQ-YTVIS train set and COCO, and evaluated on the HQ-YTVIS test set.

AP<sup>B</sup>: Tube-Boundary AP (proposed in Eq.1 of the paper)

AP<sup>M</sup>: Tube-Mask AP (proposed in YTVIS paper)
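
To make the two metrics more concrete, here is a minimal pure-Python sketch of the spatio-temporal (tube) IoU that underlies Tube-Mask AP: intersections and unions are accumulated over all frames of a video before dividing. For the boundary variant, the sketch first restricts each mask to its contour pixels, in the spirit of Boundary IoU. The 1-pixel, 4-neighbour boundary used here is a simplifying assumption and not necessarily the exact form of Eq. 1 in the paper. The names `boundary` and `tube_iou` are illustrative and not part of the evaluation API.

```python
def boundary(mask):
    """Keep mask pixels that touch background or the image border (4-neighbourhood)."""
    h, w = len(mask), len(mask[0])

    def is_fg(y, x):
        # Pixels outside the image count as background.
        return 0 <= y < h and 0 <= x < w and bool(mask[y][x])

    neighbours = ((-1, 0), (1, 0), (0, -1), (0, 1))
    return [
        [
            int(is_fg(y, x) and not all(is_fg(y + dy, x + dx) for dy, dx in neighbours))
            for x in range(w)
        ]
        for y in range(h)
    ]


def tube_iou(pred_tube, gt_tube, use_boundary=False):
    """IoU between two mask sequences, pooled over all frames.

    Each tube is a list of frames; each frame is a 2D list of 0/1 values,
    or None when the object is absent in that frame.
    """
    inter = union = 0
    for pred, gt in zip(pred_tube, gt_tube):
        if pred is None and gt is None:
            continue
        # Treat a missing mask as all-background with the other mask's shape.
        ref = pred if pred is not None else gt
        empty = [[0] * len(ref[0]) for _ in ref]
        pred = empty if pred is None else pred
        gt = empty if gt is None else gt
        if use_boundary:
            pred, gt = boundary(pred), boundary(gt)
        for prow, grow in zip(pred, gt):
            for p, g in zip(prow, grow):
                inter += 1 if (p and g) else 0
                union += 1 if (p or g) else 0
    return inter / union if union else 0.0
```

Because the sums run over the whole video, a prediction that misses the object in some frames is penalised even if its masks are accurate in the remaining frames.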

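| Model | AP<sup>B</sup> | AP<sup>B</sup><sub>75</sub> | AR<sup>B</sup><sub>1</sub> | AP<sup>M</sup> | AR<sup>M</sup><sub>75</sub> | download |
|---|---|---|---|---|---|---|
| VMT_r50 | 30.7 | 24.2 | 31.5 | 50.5 | 54.5 | weight |
| VMT_r101 | 33.0 | 29.3 | 33.3 | 51.6 | 55.8 | weight |
| VMT_swin_L | 44.8 | 43.4 | 43.0 | 64.8 | 70.1 | weight |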

Citation

@inproceedings{vmt,
    title = {Video Mask Transfiner for High-Quality Video Instance Segmentation},
    author = {Ke, Lei and Ding, Henghui and Danelljan, Martin and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
    booktitle = {European Conference on Computer Vision (ECCV)},
    year = {2022}
}

@inproceedings{transfiner,
    title={Mask Transfiner for High-Quality Instance Segmentation},
    author={Ke, Lei and Danelljan, Martin and Li, Xia and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
    booktitle = {CVPR},
    year = {2022}
} 

Acknowledgement

We thank the authors of Mask Transfiner and SeqFormer for their open-source code.