# Video Mask Transfiner
Video Mask Transfiner for High-Quality Video Instance Segmentation [ECCV 2022]
[Project Page | Dataset Page | Paper]
<p align="center"> <img src='figures/vmt_banner_img.png' align="center" height="300px"> </p>

> **Video Mask Transfiner for High-Quality Video Instance Segmentation**
> Lei Ke, Henghui Ding, Martin Danelljan, Yu-Wing Tai, Chi-Keung Tang, Fisher Yu
> ECCV 2022 ([arXiv 2207.14012](https://arxiv.org/abs/2207.14012))
## HQ-YTVIS: High-Quality Video Instance Segmentation Dataset
Mask annotation comparison between Youtube-VIS and HQ-YTVIS. HQ-YTVIS serves as a new benchmark to facilitate the future development (training & evaluation) of VIS methods aiming at higher mask quality.

<img src="figures/dataset_compare_s.png" width="830"/>
<!-- <img src="figures/data1_new.gif" width="830"/> -->
Mask annotations in Youtube-VIS (left video) vs. mask annotations in HQ-YTVIS (right video). Please visit our Dataset Page for a detailed description of the HQ-YTVIS benchmark and how to use it.
Dataset Download: HQ-YTVIS Annotation Link
Dataset Usage: replace the original YTVIS annotation files with our annotation JSON.
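Since HQ-YTVIS keeps the YTVIS JSON layout, the swap is just a file replacement. Below is a minimal Python sketch; the paths are placeholders for your local setup, and the key names in the sanity check simply reflect the usual YTVIS-style fields.

```python
import json
import shutil
from pathlib import Path

# Placeholder paths -- point them at your local YTVIS annotation file and the
# downloaded HQ-YTVIS JSON.
ytvis_ann = Path("datasets/ytvis/annotations/instances_train_sub.json")
hq_ann = Path("downloads/hq_ytvis_train.json")

# Back up the original YTVIS annotations, then drop the HQ-YTVIS JSON in its place.
shutil.copy(ytvis_ann, ytvis_ann.with_name(ytvis_ann.name + ".bak"))
shutil.copy(hq_ann, ytvis_ann)

# Sanity check: the replacement should expose the usual YTVIS-style keys.
with open(ytvis_ann) as f:
    data = json.load(f)
print(sorted(data.keys()))  # e.g. ['annotations', 'categories', 'videos']
print(len(data["videos"]), "videos,", len(data["annotations"]), "instance tracks")
```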
## HQ-YTVIS Evaluation API
Please refer to our Installation Guidance and the Tube-Mask AP & Tube-Boundary AP Usage Example.

```bash
python eval_hqvis.py --save-path prediction_results.json
```
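The file passed to `--save-path` holds the per-track predictions. As an illustration (not the official API), the sketch below assembles a YTVIS-style results list with per-frame RLE masks; the exact field names and the use of `pycocotools` for RLE encoding are assumptions based on the YTVIS results format.

```python
import json

import numpy as np
from pycocotools import mask as mask_utils  # assumed available alongside the evaluation API

def encode_frame(binary_mask):
    """RLE-encode one frame's binary mask (H, W) so it can be stored in JSON."""
    rle = mask_utils.encode(np.asfortranarray(binary_mask.astype(np.uint8)))
    rle["counts"] = rle["counts"].decode("utf-8")  # bytes -> str for json.dump
    return rle

# One entry per predicted instance track (field names follow the YTVIS results format).
results = [
    {
        "video_id": 1,          # video id from the annotation file
        "category_id": 3,       # predicted class id
        "score": 0.87,          # track confidence
        "segmentations": [      # one RLE (or None) per frame of the video
            encode_frame(np.zeros((720, 1280), dtype=np.uint8)),
            None,               # frames where the instance is not visible
        ],
    },
]

with open("prediction_results.json", "w") as f:
    json.dump(results, f)
```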
## VMT Code (under construction)
### Install
Please refer to INSTALL.md for installation instructions and dataset preparation.
### Usage
Please refer to USAGE.md for dataset preparation and detailed instructions on running the code (testing, visualization, etc.).
## Model Zoo on HQ-YTVIS
Models are trained on the HQ-YTVIS train set and COCO, and evaluated on the HQ-YTVIS test set.

- AP<sup>B</sup>: Tube-Boundary AP (proposed in Eq. 1 of the paper)
- AP<sup>M</sup>: Tube-Mask AP (proposed in the YTVIS paper)

A toy sketch of the tube (spatio-temporal) mask IoU underlying these metrics is given after the table.
Model | AP<sup>B</sup> | AP<sup>B</sup><sub>75</sub> | AR<sup>B</sup><sub>1</sub> | AP<sup>M</sup> | AR<sup>M</sup><sub>75</sub> | download |
---|---|---|---|---|---|---|
VMT_r50 | 30.7 | 24.2 | 31.5 | 50.5 | 54.5 | weight |
VMT_r101 | 33.0 | 29.3 | 33.3 | 51.6 | 55.8 | weight |
VMT_swin_L | 44.8 | 43.4 | 43.0 | 64.8 | 70.1 | weight |
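For intuition, here is a toy re-implementation of the tube (spatio-temporal) mask IoU that Tube-Mask AP builds on: per-frame intersections and unions are accumulated over the whole clip before taking the ratio. Tube-Boundary AP applies the same idea but restricts the comparison to a thin boundary region around each mask (Eq. 1 of the paper). This is an illustrative sketch, not the evaluation API.

```python
import numpy as np

def tube_mask_iou(pred_masks, gt_masks):
    """Tube (spatio-temporal) mask IoU for one track.

    pred_masks, gt_masks: boolean arrays of shape (T, H, W).
    Intersections and unions are summed over all frames before dividing,
    so every frame of the track contributes to a single IoU value.
    """
    inter = np.logical_and(pred_masks, gt_masks).sum()
    union = np.logical_or(pred_masks, gt_masks).sum()
    return inter / union if union > 0 else 0.0

# Toy example with a 2-frame, 4x4 video.
pred = np.zeros((2, 4, 4), dtype=bool); pred[:, 1:3, 1:3] = True
gt = np.zeros((2, 4, 4), dtype=bool);   gt[:, 1:3, 1:4] = True
print(tube_mask_iou(pred, gt))  # 8 / 12 ≈ 0.667
```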
## Citation
```bibtex
@inproceedings{vmt,
  title     = {Video Mask Transfiner for High-Quality Video Instance Segmentation},
  author    = {Ke, Lei and Ding, Henghui and Danelljan, Martin and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
  booktitle = {European Conference on Computer Vision (ECCV)},
  year      = {2022}
}
```

```bibtex
@inproceedings{transfiner,
  title     = {Mask Transfiner for High-Quality Instance Segmentation},
  author    = {Ke, Lei and Danelljan, Martin and Li, Xia and Tai, Yu-Wing and Tang, Chi-Keung and Yu, Fisher},
  booktitle = {CVPR},
  year      = {2022}
}
```
## Acknowledgement
We thank Mask Transfiner and SeqFormer for their open-source code.