GRAtt-VIS
<p align="left"><img src="architecture.png" width="1000"/></p>
This is the official PyTorch implementation of GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation. This repository provides code for training and testing the proposed GRAtt-VIS model. GRAtt-VIS is an efficient video instance segmentation and tracking model that achieves state-of-the-art results on several benchmarks, including YTVIS-19/21/22 and OVIS.
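The name refers to gating residual updates. For rough intuition only, the sketch below shows a generic gated residual connection, y = x + sigmoid(a) * u, in plain Python. A gate value decides how much of an update u is added to the residual input x, so a gate near zero leaves x almost unchanged. This is not the paper's formulation. The names `gated_residual`, `sigmoid`, and `gate_logits` are hypothetical, and a real model would compute the gate from features inside a PyTorch module rather than take it as an argument.

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def gated_residual(x: List[float], update: List[float], gate_logits: List[float]) -> List[float]:
    """Generic (hypothetical) gated residual: x + sigmoid(gate_logits) * update, element-wise.

    A gate near 0 keeps the residual input x (the update is suppressed);
    a gate near 1 passes the full update through.
    """
    if not (len(x) == len(update) == len(gate_logits)):
        raise ValueError("x, update and gate_logits must have the same length")
    return [xi + sigmoid(gi) * ui for xi, ui, gi in zip(x, update, gate_logits)]


if __name__ == "__main__":
    x = [1.0, 2.0]
    update = [0.5, -1.0]
    # Large-magnitude logits saturate the sigmoid: gate closed vs. gate open.
    print(gated_residual(x, update, [-20.0, -20.0]))  # approximately [1.0, 2.0]
    print(gated_residual(x, update, [20.0, 20.0]))    # approximately [1.5, 1.0]
```

A soft gate like this is differentiable, so a network can learn when to trust a new update and when to fall back on the residual input.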
Updates
Jun 14, 2023: Code is now available!
Installation
GRAtt-VIS is built upon VITA. See the installation instructions.
Getting Started
We provide the script train_net_genvis.py, which can train all the configs provided in GRAtt-VIS. To train a model with it on a VIS dataset, first set up the corresponding dataset following Preparing Datasets.
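Detectron2-based projects such as Mask2Former and VITA look for datasets under the directory named by the `DETECTRON2_DATASETS` environment variable, which defaults to `./datasets`. The following is a minimal stdlib sanity check before training. It assumes a YouTubeVIS-2019 layout of `train.json`/`valid.json` plus `train/JPEGImages` and `valid/JPEGImages`. The `YTVIS_2019_EXPECTED` list and the helper names are hypothetical, and Preparing Datasets remains the authoritative reference for every dataset.

```python
import os
from pathlib import Path
from typing import Iterable, List

# Assumed example layout for YouTubeVIS-2019, relative to the dataset root.
# Illustrative only; follow Preparing Datasets for the real layout of each dataset.
YTVIS_2019_EXPECTED = [
    "ytvis_2019/train.json",
    "ytvis_2019/valid.json",
    "ytvis_2019/train/JPEGImages",
    "ytvis_2019/valid/JPEGImages",
]


def dataset_root() -> Path:
    """Dataset root as Detectron2 resolves it: $DETECTRON2_DATASETS, else ./datasets."""
    return Path(os.path.expanduser(os.getenv("DETECTRON2_DATASETS", "datasets")))


def missing_paths(root: Path, expected: Iterable[str]) -> List[str]:
    """Return the expected relative paths that do not exist under root."""
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    root = dataset_root()
    missing = missing_paths(root, YTVIS_2019_EXPECTED)
    if missing:
        print(f"Missing under {root}:")
        for rel in missing:
            print("  " + rel)
    else:
        print(f"All expected YouTubeVIS-2019 paths found under {root}")
```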
Then start training from the VITA weights pretrained on the target VIS dataset, available in VITA's Model Zoo:
python3 train_net_genvis.py --num-gpus 4 \
--config-file configs/genvis/ovis/grattvis_R50_bs8.yaml \
MODEL.WEIGHTS weights/vita_r50_ovis.pth \
MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True \
OUTPUT_DIR your_output_dir
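The trailing `KEY VALUE` pairs (`MODEL.WEIGHTS ...`, `MODEL.GENVIS.USE_MEM False`, ...) override entries of the YAML config. In Detectron2 this is done by yacs' `CfgNode.merge_from_list`, which parses each value as a Python literal and requires the key to already exist with a matching type. The following is a simplified, dependency-free stand-in on a nested dict, written only to illustrate that behavior. `merge_from_list` here is a hypothetical reimplementation, and the `cfg` contents in the example are made up, not GRAtt-VIS defaults.

```python
from ast import literal_eval
from typing import Any, Dict, List


def _decode(value: str) -> Any:
    """Parse a command-line string as a Python literal when possible ('False' -> False)."""
    try:
        return literal_eval(value)
    except (ValueError, SyntaxError):
        return value  # plain strings such as file paths stay as-is


def merge_from_list(cfg: Dict[str, Any], opts: List[str]) -> Dict[str, Any]:
    """Apply trailing 'KEY VALUE' pairs to a nested dict config, in place.

    Simplified stand-in for yacs' CfgNode.merge_from_list: keys must already
    exist, and the new value must have the same type as the old one.
    """
    if len(opts) % 2 != 0:
        raise ValueError("overrides must come in KEY VALUE pairs")
    for full_key, raw in zip(opts[0::2], opts[1::2]):
        *parents, leaf = full_key.split(".")
        node = cfg
        for part in parents:
            node = node[part]  # KeyError for an unknown section
        if leaf not in node:
            raise KeyError(f"Non-existent config key: {full_key}")
        value = _decode(raw)
        if type(value) is not type(node[leaf]):
            raise TypeError(
                f"{full_key}: expected {type(node[leaf]).__name__}, got {type(value).__name__}"
            )
        node[leaf] = value
    return cfg


if __name__ == "__main__":
    # Hypothetical starting config; the real one comes from the YAML file.
    cfg = {"MODEL": {"WEIGHTS": "", "GENVIS": {"USE_MEM": True, "GATED_PROP": False}},
           "OUTPUT_DIR": "./output"}
    merge_from_list(cfg, ["MODEL.WEIGHTS", "weights/vita_r50_ovis.pth",
                          "MODEL.GENVIS.USE_MEM", "False",
                          "MODEL.GENVIS.GATED_PROP", "True",
                          "OUTPUT_DIR", "your_output_dir"])
    print(cfg)
```

The strict key and type checks mean that a misspelled override fails loudly instead of being silently ignored.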
To evaluate a trained model, run:
python3 train_net_genvis.py --num-gpus 1 \
--config-file YOUR_MODEL_PATH/config.yaml \
--eval-only MODEL.WEIGHTS YOUR_MODEL_PATH/model_checkpoint.pth \
MODEL.GENVIS.USE_MEM False MODEL.GENVIS.GATED_PROP True \
OUTPUT_DIR your_output_dir
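When evaluating several checkpoints it can be convenient to build this command programmatically. The sketch below assembles the same argument list as the command above. `build_eval_argv` is a hypothetical helper, and the checkpoint file name and overrides are copied from the command, so adjust them to your setup. This assumes the script uses Detectron2's `default_argument_parser`, as Detectron2-based training scripts typically do. That parser collects the trailing overrides into a positional `opts` list (`argparse.REMAINDER`), so the code places flags such as `--eval-only` before the first `KEY VALUE` pair, matching the command above.

```python
import shlex
from typing import List, Sequence

# Overrides taken from the evaluation command above.
DEFAULT_OVERRIDES = ("MODEL.GENVIS.USE_MEM", "False", "MODEL.GENVIS.GATED_PROP", "True")


def build_eval_argv(model_dir: str, checkpoint: str = "model_checkpoint.pth",
                    output_dir: str = "your_output_dir", num_gpus: int = 1,
                    overrides: Sequence[str] = DEFAULT_OVERRIDES) -> List[str]:
    """Assemble the evaluation command as an argv list (avoids shell quoting issues)."""
    model_dir = model_dir.rstrip("/")
    return [
        "python3", "train_net_genvis.py",
        "--num-gpus", str(num_gpus),
        "--config-file", f"{model_dir}/config.yaml",
        "--eval-only",  # flags go before the first KEY VALUE override
        "MODEL.WEIGHTS", f"{model_dir}/{checkpoint}",
        *overrides,
        "OUTPUT_DIR", output_dir,
    ]


if __name__ == "__main__":
    argv = build_eval_argv("YOUR_MODEL_PATH")
    # Print a copy-pasteable command; to launch it instead, pass argv to
    # subprocess.run(argv, check=True) from the repository root.
    print(shlex.join(argv))
```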
<a name="ModelZoo"></a>Model Zoo
YouTubeVIS-2019
Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
---|---|---|---|---|---|---|
R-50 | 50.4 | 70.7 | 55.2 | 48.4 | 58.7 | model |
Swin-L | 63.1 | 85.6 | 67.2 | 55.5 | 67.8 | model |
YouTubeVIS-2021
Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
---|---|---|---|---|---|---|
R-50 | 48.9 | 69.2 | 53.1 | 41.8 | 56.0 | model |
Swin-L | 60.3 | 81.3 | 67.1 | 48.8 | 64.5 | model |
YouTubeVIS-2022
Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
---|---|---|---|---|---|---|
R-50 | 40.8 | 60.1 | 45.9 | 35.7 | 46.9 | model |
Swin-L | 52.6 | 74.0 | 57.9 | 45.0 | 57.1 | model |
OVIS
Backbone | AP | AP50 | AP75 | AR1 | AR10 | Download |
---|---|---|---|---|---|---|
R-50 | 36.2 | 60.8 | 36.8 | 16.8 | 40.0 | model |
Swin-L | 45.7 | 69.1 | 47.8 | 19.2 | 49.4 | model |
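The YouTube-VIS metrics extend COCO mask AP to videos. A predicted track is compared with a ground-truth track using a spatio-temporal IoU: per-frame intersections are summed over the whole video and divided by the summed per-frame unions. AP averages over IoU thresholds 0.50:0.05:0.95, while AP50 and AP75 use a single threshold. Following COCO's `maxDets` convention, AR1 and AR10 are the average recall when at most 1 or 10 predictions per video are kept. The stdlib sketch below, using hypothetical helper names, covers only the IoU and the thresholding step. Score-ranked matching and precision-recall integration are omitted.

```python
from typing import List, Sequence, Set, Tuple

Pixel = Tuple[int, int]
# A track holds one mask per frame; a mask is a set of (row, col) pixels, empty if absent.
Track = Sequence[Set[Pixel]]


def spatio_temporal_iou(pred: Track, gt: Track) -> float:
    """Video-level IoU: sum of per-frame intersections over sum of per-frame unions."""
    if len(pred) != len(gt):
        raise ValueError("tracks must cover the same number of frames")
    inter = sum(len(p & g) for p, g in zip(pred, gt))
    union = sum(len(p | g) for p, g in zip(pred, gt))
    return inter / union if union else 0.0


# COCO-style thresholds 0.50, 0.55, ..., 0.95 (10 values) averaged for AP.
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def matches_at(iou: float) -> List[bool]:
    """Whether a prediction with this IoU counts as a match at each threshold."""
    return [iou >= t for t in IOU_THRESHOLDS]


if __name__ == "__main__":
    gt = [{(0, 0), (0, 1)}, {(1, 1)}]    # two frames of a ground-truth track
    pred = [{(0, 0)}, {(1, 1), (1, 2)}]  # a partially overlapping prediction
    iou = spatio_temporal_iou(pred, gt)
    print(iou)              # 0.5
    print(matches_at(iou))  # matches only at the 0.50 threshold
```

Because unions are summed over all frames, a track that drops or swaps an object for a few frames loses IoU across the whole video. This is why the tracking quality directly affects these numbers.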
License
The majority of GRAtt-VIS is licensed under the Apache-2.0 License. However, portions of the project are available under separate license terms: Detectron2 (Apache-2.0 License), Mask2Former (MIT License), Deformable-DETR (Apache-2.0 License), GenVIS (Apache-2.0 License), and VITA (Apache-2.0 License).
<a name="CitingGRAttVIS"></a>Citing GRAtt-VIS
If you find GRAtt-VIS useful in your research or wish to refer to the baseline results, please use the following BibTeX entry as a citation.
@article{hannan2023gratt,
title={GRAtt-VIS: Gated Residual Attention for Auto Rectifying Video Instance Segmentation},
author={Hannan, Tanveer and Koner, Rajat and Bernhard, Maximilian and Shit, Suprosanna and Menze, Bjoern and Tresp, Volker and Schubert, Matthias and Seidl, Thomas},
journal={arXiv preprint arXiv:2305.17096},
year={2023}
}
Acknowledgement
We acknowledge the following repositories, from which we have inherited code snippets.