# Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation (ECCV 2024)

Hao Fang, Peng Wu, Yawei Li, Xinxin Zhang, Xiankai Lu
<div align="center"> <img src="OVFormer.png" width="100%" height="100%"/> </div>

## Installation
See installation instructions.
## Data Preparation
See Preparing Datasets for OVFormer.
## Getting Started
First, train the OVFormer model on the LVIS dataset:
```
python train_net.py --num-gpus 4 \
  --config-file configs/lvis/ovformer_R50_bs8.yaml
```
To evaluate the model's zero-shot generalization performance on the VIS datasets, use
```
python train_net_video.py \
  --config-file configs/youtubevis_2019/ovformer_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS models/ovformer_r50_lvis.pth
```
For YTVIS19/21, split `results.json` into base and novel categories with the provided tool; for OVIS, package the results and upload them to the evaluation server; for BURST, run `mAP.py`.
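As a rough illustration of the YTVIS19/21 step, splitting a COCO-style `results.json` by category could look like the sketch below. The category ID set here is a placeholder, not the actual base/novel split, and the file names are only examples:

```python
import json

def split_results(results, base_category_ids):
    """Split COCO-style result entries into base and novel lists by category_id."""
    base = [r for r in results if r["category_id"] in base_category_ids]
    novel = [r for r in results if r["category_id"] not in base_category_ids]
    return base, novel

if __name__ == "__main__":
    # Placeholder entries; real ones come from the evaluator's results.json.
    results = [
        {"category_id": 1, "score": 0.9},
        {"category_id": 7, "score": 0.8},
    ]
    # {1} stands in for the real base-category ID set of the benchmark.
    base, novel = split_results(results, base_category_ids={1})
    with open("results_base.json", "w") as f:
        json.dump(base, f)
    with open("results_novel.json", "w") as f:
        json.dump(novel, f)
```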
You are expected to get results like this:
| Model | Backbone | YTVIS19 | YTVIS21 | OVIS | BURST | Weights |
|---|---|---|---|---|---|---|
| OVFormer | R-50 | 34.8 | 29.8 | 15.1 | 6.8 | model |
| OVFormer | Swin-B | 44.3 | 37.6 | 21.3 | 7.6 | model |
Then, train the OVFormer model on the LV-VIS dataset with video-based training:
```
python train_net_lvvis.py --num-gpus 4 \
  --config-file configs/lvvis/video_ovformer_R50_bs8.yaml
```
To evaluate the model's performance on the LV-VIS dataset, use
```
python train_net_lvvis.py \
  --config-file configs/lvvis/video_ovformer_R50_bs8.yaml \
  --eval-only MODEL.WEIGHTS models/ovformer_r50_lvvis.pth
```
Then run `mAP.py`; you are expected to get results like this:
| Model | Backbone | LVVIS val | LVVIS test | Weights |
|---|---|---|---|---|
| OVFormer | R-50 | 21.9 | 15.2 | model |
| OVFormer | Swin-B | 24.7 | 19.5 | model |
<a name="CitingOVFormer"></a>

## Citing OVFormer
```
@inproceedings{fang2024unified,
  title={Unified Embedding Alignment for Open-Vocabulary Video Instance Segmentation},
  author={Fang, Hao and Wu, Peng and Li, Yawei and Zhang, Xinxin and Lu, Xiankai},
  booktitle={ECCV},
  year={2024},
}
```
## Acknowledgement
This repo is based on detectron2, Mask2Former, and LVVIS. Thanks for their great work!