BoxVIS: Video Instance Segmentation with Box Annotations

Minghan LI and Lei ZHANG

[[arXiv](https://arxiv.org/abs/2303.14618)]

<div align="center"> <img src="imgs/BoxVIS_overview.jpg" width="80%" height="100%"/> </div><br/>

Updates

Installation

See installation instructions.

Datasets

See the dataset preparation instructions.

Getting Started

We provide a script, `train_net_boxvis.py`, that can be used to train all the configs provided in BoxVIS.

Training: download the pretrained Mask2Former weights, save them under the path 'pretrained/*.pth', and then run:

```shell
sh run.sh
```

Inference: download the trained weights, save them under the path 'pretrained/*.pth', and then run:

```shell
sh test.sh
```
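The exact contents of `run.sh` and `test.sh` depend on your setup (GPU count, config choice, weight paths). A minimal sketch of what such scripts typically contain, assuming a Detectron2/Mask2Former-style entry point; the config filename and weight filenames below are placeholder assumptions, not the repo's actual paths:

```shell
# Sketch only: config and weight paths are hypothetical placeholders.
# Training with Mask2Former pretrained weights:
python train_net_boxvis.py \
  --num-gpus 8 \
  --config-file configs/boxvis_config.yaml \
  MODEL.WEIGHTS pretrained/mask2former_pretrained.pth

# Inference with trained BoxVIS weights:
python train_net_boxvis.py \
  --num-gpus 8 \
  --eval-only \
  --config-file configs/boxvis_config.yaml \
  MODEL.WEIGHTS pretrained/boxvis_trained.pth
```

The `--num-gpus`, `--eval-only`, `--config-file`, and trailing `MODEL.WEIGHTS` override follow standard Detectron2 launcher conventions; check the provided scripts for the actual configs used in the paper.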

Quantitative performance comparison

<div align="center"> <img src="imgs/sota_yt21_coco.jpg" width="80%" height="100%"/> </div><br/>

<a name="CitingBoxVIS"></a>Citing BoxVIS

If you use BoxVIS in your research or wish to refer to the baseline results published in the Model Zoo, please use the following BibTeX entry.

```bibtex
@misc{li2023boxvis,
  title={BoxVIS: Video Instance Segmentation with Box Annotations},
  author={Minghan Li and Lei Zhang},
  year={2023},
  eprint={2303.14618},
  archivePrefix={arXiv},
  primaryClass={cs.CV}
}
```

Acknowledgement

Our code is largely based on Detectron2, Mask2Former, MinVIS, and VITA. We are truly grateful for their excellent work.