<p align="center"><img src="illustration.jpg" width="700"/></p>

Towards Robust Referring Video Object Segmentation with Cyclic Relational Consistency
Xiang Li, Jinglu Wang, Xiaohao Xu, Xiao Li, Bhiksha Raj, Yan Lu
Updates
- (2023-05-30) Code released.
- (2023-07-13) R2VOS is accepted to ICCV 2023!
Install
conda install pytorch==1.8.1 torchvision==0.9.1 torchaudio==0.8.1 -c pytorch
pip install -r requirements.txt
pip install 'git+https://github.com/facebookresearch/fvcore'
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
cd models/ops
python setup.py build install
cd ../..
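Before running the demo, it can help to confirm that the packages installed above are visible to the active Python environment. The following is a minimal sketch using only the standard library; the helper name `missing_modules` and the list of module names are illustrative assumptions and are not part of this repository.

```python
import importlib.util

# Top-level module names assumed to correspond to the install commands above.
REQUIRED_MODULES = ["torch", "torchvision", "torchaudio", "fvcore", "pycocotools"]


def missing_modules(names):
    """Return the module names that cannot be located in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED_MODULES)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All listed modules were found.")
```

This only checks that the modules can be found; it does not verify versions or that the custom ops under models/ops were built successfully.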
Docker
You can use Docker for a quick start.
Weights
Please download checkpoint.pth and place it in the main folder of the repository.
Run demo:
Run inference on the images in demo/demo_examples:
python demo.py --with_box_refine --binary --freeze_text_encoder --output_dir=output/demo --resume=checkpoint.pth --backbone resnet50 --ngpu 1 --use_cycle --mix_query --neg_cls --is_eval --use_cls --demo_exp 'a big track on the road' --demo_path 'demo/demo_examples'
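To try several referring expressions on the same images, the demo command above can be assembled programmatically. The sketch below only builds and prints the command lines; the helper `demo_command` is hypothetical, and the flags are copied verbatim from the command above without assuming anything further about how demo.py parses them.

```python
import shlex


def demo_command(expression, demo_path="demo/demo_examples", output_dir="output/demo"):
    """Build the argument list for one demo run with the given expression."""
    return [
        "python", "demo.py",
        "--with_box_refine", "--binary", "--freeze_text_encoder",
        f"--output_dir={output_dir}",
        "--resume=checkpoint.pth",
        "--backbone", "resnet50",
        "--ngpu", "1",
        "--use_cycle", "--mix_query", "--neg_cls", "--is_eval", "--use_cls",
        "--demo_exp", expression,
        "--demo_path", demo_path,
    ]


if __name__ == "__main__":
    # Example expressions; replace with ones that match your own images.
    for exp in ["a big track on the road"]:
        # shlex.join quotes arguments that contain spaces.
        print(shlex.join(demo_command(exp)))
```

Each printed line can be pasted into a shell, or the list returned by `demo_command` can be passed to `subprocess.run` directly.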
Inference:
To evaluate on Ref-YTVOS, use inference_ytvos.py. If inference over the entire video runs out of memory (OOM), use inference_ytvos_segm.py instead.
python inference_ytvos.py --with_box_refine --binary --freeze_text_encoder --output_dir=output/eval --resume=checkpoint.pth --backbone resnet50 --ngpu 1 --use_cycle --mix_query --neg_cls --is_eval --use_cls --ytvos_path=/data/ref-ytvos
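The choice between the two inference scripts can also be automated. The sketch below tries inference_ytvos.py first and falls back to inference_ytvos_segm.py if the first run exits with an error. It rests on two assumptions that this README does not confirm: that inference_ytvos_segm.py accepts the same flags as the command above, and that treating any non-zero exit code (not only OOM) as a reason to retry is acceptable. The helper `run_with_fallback` is hypothetical.

```python
import subprocess

# Flags copied from the inference command above.
COMMON_ARGS = [
    "--with_box_refine", "--binary", "--freeze_text_encoder",
    "--output_dir=output/eval", "--resume=checkpoint.pth",
    "--backbone", "resnet50", "--ngpu", "1",
    "--use_cycle", "--mix_query", "--neg_cls", "--is_eval", "--use_cls",
]

# Order matters: whole-video inference first, then the fallback script.
SCRIPTS = ("inference_ytvos.py", "inference_ytvos_segm.py")


def run_with_fallback(ytvos_path, run=subprocess.run):
    """Run each script in turn; return the first one that exits with code 0, else None."""
    for script in SCRIPTS:
        # Assumption: both scripts accept the same arguments.
        result = run(["python", script, *COMMON_ARGS, f"--ytvos_path={ytvos_path}"])
        if result.returncode == 0:
            return script
    return None


if __name__ == "__main__":
    used = run_with_fallback("/data/ref-ytvos")
    print("Completed with:", used)
```

The `run` parameter exists so the logic can be exercised without launching the actual scripts.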
Related works for robust multimodal video segmentation:
R2-Bench: Benchmarking the Robustness of Referring Perception Models under Perturbations, arXiv 2024
Citation
@inproceedings{li2023robust,
title={Robust referring video object segmentation with cyclic structural consensus},
author={Li, Xiang and Wang, Jinglu and Xu, Xiaohao and Li, Xiao and Raj, Bhiksha and Lu, Yan},
booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
pages={22236--22245},
year={2023}
}