# MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training
De-An Huang, Zhiding Yu, Anima Anandkumar
<div align="center"> <img src="https://ai.stanford.edu/~dahuang/images/minvis.png" width="100%" height="100%"/> </div>

## Features
- Video instance segmentation by training only an image instance segmentation model; instances are associated across frames at inference time (see the sketch after this list).
- Supports the major video instance segmentation benchmarks: YouTubeVIS 2019/2021 and Occluded VIS (OVIS).
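The key observation from the MinVIS paper is that the query embeddings of a query-based image instance segmentation model (Mask2Former), trained on individual frames, are discriminative enough to track instances: at inference time, the queries of consecutive frames are associated by bipartite matching, with no video-based training. Below is a minimal NumPy/SciPy sketch of that matching step; `match_queries`, its shapes, and the cosine-similarity cost are illustrative assumptions, not the actual API of this repo.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_queries(prev_embed: np.ndarray, curr_embed: np.ndarray) -> np.ndarray:
    """Bipartite matching of per-frame instance queries (hypothetical helper).

    prev_embed, curr_embed: (num_queries, dim) query embeddings produced
    independently for two consecutive frames.
    Returns order[i] = index of the current-frame query matched to
    previous-frame query i, so curr_embed[order] aligns with prev_embed.
    """
    # Cosine similarity between every (prev, curr) query pair.
    prev = prev_embed / np.linalg.norm(prev_embed, axis=1, keepdims=True)
    curr = curr_embed / np.linalg.norm(curr_embed, axis=1, keepdims=True)
    similarity = prev @ curr.T  # (num_queries, num_queries)

    # Hungarian matching maximizes total similarity (minimize the negative).
    row_ind, col_ind = linear_sum_assignment(-similarity)
    order = np.empty_like(col_ind)
    order[row_ind] = col_ind
    return order

# Toy usage: recover the identity of 3 queries across two frames.
rng = np.random.default_rng(0)
f0 = rng.normal(size=(3, 8))
f1 = f0[[2, 0, 1]] + 0.01 * rng.normal(size=(3, 8))  # shuffled + noise
print(match_queries(f0, f1))  # -> [1 2 0]
```

Chained frame by frame over a clip, this propagates consistent instance IDs through the video without any video-level supervision.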
## Qualitative Results on Occluded VIS
<img src="https://ai.stanford.edu/~dahuang/images/ovis_sheep.gif" height="200"/> <img src="https://ai.stanford.edu/~dahuang/images/ovis_fish.gif" height="200"/>
## Installation
See installation instructions.
## Getting Started
See Preparing Datasets for MinVIS.
See Getting Started with MinVIS.
## Model Zoo
Trained models are available for download in the MinVIS Model Zoo.
## License
The majority of MinVIS is made available under the NVIDIA Source Code License-NC. The trained models in the MinVIS Model Zoo are made available under the CC BY-NC-SA 4.0 License.
Portions of the project are available under separate license terms: Mask2Former and Swin-Transformer-Semantic-Segmentation are licensed under the MIT License, and Deformable-DETR is licensed under the Apache-2.0 License.
## <a name="CitingMinVIS"></a>Citing MinVIS
```BibTeX
@inproceedings{huang2022minvis,
  title={MinVIS: A Minimal Video Instance Segmentation Framework without Video-based Training},
  author={De-An Huang and Zhiding Yu and Anima Anandkumar},
  booktitle={NeurIPS},
  year={2022}
}
```
## Acknowledgement
This repo is largely based on Mask2Former (https://github.com/facebookresearch/Mask2Former).