GRAtt-VIS


<p align="left"><img src="architecture.png" width="1000"/></p>

This is the official PyTorch implementation of GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation. This repository provides PyTorch code for training and testing the proposed GRAtt-VIS model. GRAtt-VIS is an efficient video instance segmentation and tracking model that achieves state-of-the-art results on several benchmarks, including YouTubeVIS-2019/2021/2022 and OVIS.

Updates

Installation

GRAtt-VIS is built upon VITA. See installation instructions.

Getting Started

We provide a script, train_net_grattvis.py, designed to train all the configs provided in GRAtt-VIS. To train a model with train_net_grattvis.py on VIS, first set up the corresponding datasets following Preparing Datasets. Then run it with the pretrained weights for the target VIS dataset from VITA's Model Zoo:

```bash
python3 train_net_genvis.py --num-gpus 4 \
  --config-file configs/genvis/ovis/grattvis_R50_bs8.yaml \
  MODEL.WEIGHTS weights/vita_r50_ovis.pth \
  MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True \
  OUTPUT_DIR your_output_dir
```
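The trailing `KEY VALUE` pairs in the command follow Detectron2's command-line override convention: each dotted key names a path into the loaded YAML config, and its value replaces whatever the config file set. A minimal sketch of that merge logic (a hypothetical stand-in for Detectron2's `merge_from_list`, not code from this repo):

```python
import ast
from typing import Any


def merge_opts(cfg: dict, opts: list) -> dict:
    """Merge flat KEY VALUE override pairs (Detectron2-style) into a nested dict."""
    assert len(opts) % 2 == 0, "opts must come in KEY VALUE pairs"
    for key, raw in zip(opts[::2], opts[1::2]):
        node = cfg
        *parents, leaf = key.split(".")
        for part in parents:
            node = node.setdefault(part, {})
        try:
            # Interpret literals (bools, numbers, lists); fall back to the raw string.
            value: Any = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            value = raw
        node[leaf] = value
    return cfg


cfg = {"MODEL": {"GENVIS": {"USE_MEM": True, "GATED_PROP": False}}}
merge_opts(cfg, [
    "MODEL.GENVIS.USE_MEM", "False",
    "MODEL.GENVIS.GATED_PROP", "True",
    "OUTPUT_DIR", "your_output_dir",
])
print(cfg["MODEL"]["GENVIS"])  # {'USE_MEM': False, 'GATED_PROP': True}
```

This is why `MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True` can toggle GRAtt-VIS-specific behavior without editing the YAML file itself.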

To evaluate a model's performance, use

```bash
python3 train_net_genvis.py --num-gpus 1 \
  --config-file YOUR_MODEL_PATH/config.yaml \
  --eval-only MODEL.WEIGHTS YOUR_MODEL_PATH/model_checkpoint.pth \
  MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True \
  OUTPUT_DIR your_output_dir
```
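Detectron2-based trainers typically write a `metrics.json` into `OUTPUT_DIR`, with one JSON object per line. A hedged sketch for pulling the most recent logged entry (the filename is a Detectron2 convention; the exact keys depend on the dataset and evaluator):

```python
import json
from pathlib import Path


def last_metrics(output_dir: str) -> dict:
    """Return the last line of a Detectron2-style metrics.json log as a dict."""
    lines = Path(output_dir, "metrics.json").read_text().strip().splitlines()
    return json.loads(lines[-1])


# Demo with a synthetic log file (real keys depend on the evaluator):
Path("demo_out").mkdir(exist_ok=True)
Path("demo_out/metrics.json").write_text(
    '{"iteration": 100, "total_loss": 5.2}\n'
    '{"iteration": 200, "total_loss": 3.1}\n'
)
print(last_metrics("demo_out"))  # {'iteration': 200, 'total_loss': 3.1}
```

Note that for the YouTubeVIS test splits, final AP numbers come from the official evaluation server rather than from the local log.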

<a name="ModelZoo"></a>Model Zoo

YouTubeVIS-2019

| Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| R-50 | 50.4 | 70.7 | 55.2 | 48.4 | 58.7 | model |
| Swin-L | 63.1 | 85.6 | 67.2 | 55.5 | 67.8 | model |

YouTubeVIS-2021

| Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| R-50 | 48.9 | 69.2 | 53.1 | 41.8 | 56.0 | model |
| Swin-L | 60.3 | 81.3 | 67.1 | 48.8 | 64.5 | model |

YouTubeVIS-2022

| Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| R-50 | 40.8 | 60.1 | 45.9 | 35.7 | 46.9 | model |
| Swin-L | 52.6 | 74.0 | 57.9 | 45.0 | 57.1 | model |

OVIS

| Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
| --- | --- | --- | --- | --- | --- | --- |
| R-50 | 36.2 | 60.8 | 36.8 | 16.8 | 40.0 | model |
| Swin-L | 45.7 | 69.1 | 47.8 | 19.2 | 49.4 | model |
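As a quick sanity check on the tables above, the Swin-L backbone's AP gain over R-50 can be computed per benchmark (AP values copied from the Model Zoo tables):

```python
# (R-50 AP, Swin-L AP) per benchmark, copied from the Model Zoo tables above.
ap = {
    "YTVIS-2019": (50.4, 63.1),
    "YTVIS-2021": (48.9, 60.3),
    "YTVIS-2022": (40.8, 52.6),
    "OVIS": (36.2, 45.7),
}
gains = {name: round(swin - r50, 1) for name, (r50, swin) in ap.items()}
print(gains)  # {'YTVIS-2019': 12.7, 'YTVIS-2021': 11.4, 'YTVIS-2022': 11.8, 'OVIS': 9.5}
```

The gap is consistently large (9.5 to 12.7 AP), so the backbone choice matters substantially on every benchmark.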

License

The majority of GRAtt-VIS is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), GENVIS (Apache-2.0 License), and VITA (Apache-2.0 License).

<a name="CitingGRAttVIS"></a>Citing GRAtt-VIS

If you find GRAtt-VIS useful in your research, or wish to refer to the baseline results, please cite it using the following BibTeX entry.

```bibtex
@article{hannan2023gratt,
  title={GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation},
  author={Hannan, Tanveer and Koner, Rajat and Bernhard, Maximilian and Shit, Suprosanna and Menze, Bjoern and Tresp, Volker and Schubert, Matthias and Seidl, Thomas},
  journal={arXiv preprint arXiv:2305.17096},
  year={2023}
}
```

Acknowledgement

We acknowledge the following repositories, from which we have inherited code snippets:

  1. Detectron2
  2. Mask2Former
  3. VITA
  4. GENVIS