A Generalized Framework for Video Instance Segmentation (CVPR 2023)

Miran Heo, Sukjun Hwang, Jeongseok Hyun, Hanjung Kim, Seoung Wug Oh, Joon-Young Lee, Seon Joo Kim

[arXiv] [BibTeX]

<div align="center"> <img src="https://user-images.githubusercontent.com/24949098/212600182-90721a1e-aa4c-452c-86ed-ab1149a16b8f.gif" width="30%"/> <img src="https://user-images.githubusercontent.com/24949098/212599620-082b9604-49f1-4f21-bf8e-01885cd38e82.gif" width="30%"/> <img src="https://user-images.githubusercontent.com/24949098/213493785-27312f33-dbae-4d44-8036-69e597366ab9.gif" width="60%"/> </div><br/>

Updates

Installation

GenVIS is built upon VITA. See installation instructions.

Getting Started

We provide a script, train_net_genvis.py, that trains all the configs provided in GenVIS.

To train a model with train_net_genvis.py on a VIS dataset, first set up the corresponding datasets following Preparing Datasets.

Then start training, initializing from the pretrained weights for the target VIS dataset, which are available in VITA's Model Zoo:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  MODEL.WEIGHTS vita_r50_ovis.pth

To evaluate a model's performance, pass --eval-only together with the checkpoint to evaluate:

python train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/genvis_R50_bs8_online.yaml \
  --eval-only MODEL.WEIGHTS /path/to/checkpoint_file
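
When switching between configs and checkpoints often, it can help to assemble the two commands above programmatically. The following is a minimal sketch, not part of GenVIS: the helper name `build_genvis_command` and its parameters are hypothetical, and it only reuses the flags shown above (`--num-gpus`, `--config-file`, `--eval-only`, and the trailing `MODEL.WEIGHTS` override).

```python
import shlex
from typing import List


def build_genvis_command(
    config_file: str,
    weights: str,
    num_gpus: int = 4,
    eval_only: bool = False,
) -> List[str]:
    """Build the argv list for train_net_genvis.py (hypothetical helper).

    Only the flags that appear in the commands above are used.
    """
    cmd = [
        "python", "train_net_genvis.py",
        "--num-gpus", str(num_gpus),
        "--config-file", config_file,
    ]
    if eval_only:
        # Evaluation mode: the weights are the checkpoint to evaluate.
        cmd.append("--eval-only")
    # Trailing KEY VALUE pair that overrides the config entry.
    cmd += ["MODEL.WEIGHTS", weights]
    return cmd


if __name__ == "__main__":
    config = "configs/genvis/ovis/genvis_R50_bs8_online.yaml"
    # Print the training and evaluation commands as shell-safe strings.
    print(shlex.join(build_genvis_command(config, "vita_r50_ovis.pth")))
    print(shlex.join(build_genvis_command(
        config, "/path/to/checkpoint_file", eval_only=True)))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)` from the repository root, which avoids shell quoting issues with paths that contain spaces.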

<a name="ModelZoo"></a>Model Zoo

Additional weights will be added soon!

YouTubeVIS-2019

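| Backbone | Method | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| R-50 | online | 50.0 | 71.5 | 54.6 | 49.5 | 59.7 | model |
| R-50 | semi-online | 51.3 | 72.0 | 57.8 | 49.5 | 60.0 | model |
| Swin-L | online | 64.0 | 84.9 | 68.3 | 56.1 | 69.4 | model |
| Swin-L | semi-online | 63.8 | 85.7 | 68.5 | 56.3 | 68.4 | model |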

YouTubeVIS-2021

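| Backbone | Method | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| R-50 | online | 47.1 | 67.5 | 51.5 | 41.6 | 54.7 | model |
| R-50 | semi-online | 46.3 | 67.0 | 50.2 | 40.6 | 53.2 | model |
| Swin-L | online | 59.6 | 80.9 | 65.8 | 48.7 | 65.0 | model |
| Swin-L | semi-online | 60.1 | 80.9 | 66.5 | 49.1 | 64.7 | model |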

OVIS

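| Backbone | Method | AP | AP50 | AP75 | AR1 | AR10 | Download |
| :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| R-50 | online | 35.8 | 60.8 | 36.2 | 16.3 | 39.6 | model |
| R-50 | semi-online | 34.5 | 59.4 | 35.0 | 16.6 | 38.3 | model |
| Swin-L | online | 45.2 | 69.1 | 48.4 | 19.1 | 48.6 | model |
| Swin-L | semi-online | 45.4 | 69.2 | 47.8 | 18.9 | 49.0 | model |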

License

The majority of GenVIS is licensed under an Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), IFC (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), and VITA (Apache-2.0 License).

<a name="CitingGenVIS"></a>Citing GenVIS

If you use GenVIS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

@inproceedings{GenVIS,
  title={A Generalized Framework for Video Instance Segmentation},
  author={Heo, Miran and Hwang, Sukjun and Hyun, Jeongseok and Kim, Hanjung and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={CVPR},
  year={2023}
}

@inproceedings{VITA,
  title={VITA: Video Instance Segmentation via Object Token Association},
  author={Heo, Miran and Hwang, Sukjun and Oh, Seoung Wug and Lee, Joon-Young and Kim, Seon Joo},
  booktitle={Advances in Neural Information Processing Systems},
  year={2022}
}

Acknowledgement

Our code is largely based on Detectron2, IFC, Mask2Former, Deformable DETR, and VITA. We are truly grateful for their excellent work.