1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation

Mingqi Gao<sup>1,4,+</sup>, Jingnan Luo<sup>2,+</sup>, Jinyu Yang<sup>1,*</sup>, Jungong Han<sup>3,4</sup>, Feng Zheng<sup>1,2,*</sup>

<sup>1</sup> Tapall.ai   <sup>2</sup> Southern University of Science and Technology   <sup>3</sup> University of Sheffield   <sup>4</sup> University of Warwick

<sup>+</sup> Equal Contributions, <sup>*</sup> Corresponding Authors

Report

Demo

:round_pushpin: Installation

We tested the code in the following environment; other versions may also be compatible: Python 3.9, PyTorch 1.10.1, CUDA 11.3.

pip install -r requirements.txt
pip install 'git+https://github.com/facebookresearch/fvcore' 
pip install -U 'git+https://github.com/cocodataset/cocoapi.git#subdirectory=PythonAPI'
cd models/ops
python setup.py build install
cd ../..
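Before or after running the steps above, it can help to compare the local setup with the tested versions. The snippet below is a minimal sketch, not part of the repository: the helper name `describe_environment` is hypothetical, and it only reports what it finds rather than enforcing any version.

```python
import importlib.util
import platform


def describe_environment():
    """Collect the versions relevant to the tested setup (hypothetical helper)."""
    info = {"python": platform.python_version(), "torch": None, "cuda": None}
    # PyTorch may not be installed yet, so probe for it before importing.
    if importlib.util.find_spec("torch") is not None:
        import torch

        info["torch"] = torch.__version__
        # torch.version.cuda is None for CPU-only builds.
        info["cuda"] = torch.version.cuda
    return info


if __name__ == "__main__":
    for name, version in describe_environment().items():
        print(f"{name}: {version or 'not found'}")
```

The printed values can then be compared by eye with Python 3.9, PyTorch 1.10.1, and CUDA 11.3.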

:round_pushpin: Training

  1. Download MUTR's checkpoint from HERE (Swin-L, joint-training on Ref-COCO series and Ref-YouTube-VOS).
  2. Run the following command to fine-tune MUTR on MeViS:
python -m torch.distributed.launch --nproc_per_node 1 --master_port 10010 --use_env train.py --freeze_text_encoder --with_box_refine --binary --dataset_file mevis --epochs 2 --lr_drop 1 --resume [MUTR checkpoint] --output_dir [output path] --mevis_path [MeViS path] --backbone swin_l_p4w7
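When the fine-tuning command is launched from a script, assembling it as an argument list avoids quoting problems with paths. The following is a sketch under stated assumptions: `build_train_command` is a hypothetical wrapper that is not provided by the repository, and the paths in the usage section are placeholders. The flags themselves are copied from the command above.

```python
import shlex


def build_train_command(mutr_checkpoint, output_dir, mevis_path,
                        nproc_per_node=1, master_port=10010):
    """Assemble the fine-tuning command as an argument list (hypothetical helper)."""
    launcher = [
        "python", "-m", "torch.distributed.launch",
        "--nproc_per_node", str(nproc_per_node),
        "--master_port", str(master_port),
        "--use_env",
    ]
    script = [
        "train.py",
        "--freeze_text_encoder", "--with_box_refine", "--binary",
        "--dataset_file", "mevis",
        "--epochs", "2",
        "--lr_drop", "1",
        "--resume", str(mutr_checkpoint),
        "--output_dir", str(output_dir),
        "--mevis_path", str(mevis_path),
        "--backbone", "swin_l_p4w7",
    ]
    return launcher + script


if __name__ == "__main__":
    # Placeholder paths; replace them with your own locations.
    cmd = build_train_command("path/to/mutr_checkpoint.pth",
                              "path/to/output", "path/to/mevis")
    print(shlex.join(cmd))
    # To actually launch training: subprocess.run(cmd, check=True)
```

Printing the joined command first makes it easy to verify the arguments before starting a run.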

:round_pushpin: Inference

Our checkpoint is available on Google Drive. Run inference with the command below; the final --no_sampling flag is optional and enables the no-sampling mode.

python inference_mevis.py --with_box_refine --binary --freeze_text_encoder --output_dir [output path] --resume [checkpoint path] --ngpu 1 --batch_size 1 --backbone swin_l_p4w7 --mevis_path [MeViS path] --split valid --sub_video_len 30 --no_sampling
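Because `--no_sampling` is optional, a small wrapper can make that switch explicit. The sketch below assumes a hypothetical helper named `build_inference_command`, which the repository does not provide. The flags and default values mirror the inference command above, and the paths in the usage section are placeholders.

```python
import shlex


def build_inference_command(checkpoint, output_dir, mevis_path,
                            split="valid", sub_video_len=30,
                            no_sampling=False):
    """Assemble the inference command as an argument list (hypothetical helper)."""
    cmd = [
        "python", "inference_mevis.py",
        "--with_box_refine", "--binary", "--freeze_text_encoder",
        "--output_dir", str(output_dir),
        "--resume", str(checkpoint),
        "--ngpu", "1",
        "--batch_size", "1",
        "--backbone", "swin_l_p4w7",
        "--mevis_path", str(mevis_path),
        "--split", split,
        "--sub_video_len", str(sub_video_len),
    ]
    # The no-sampling mode is opt-in, so the flag is appended only on request.
    if no_sampling:
        cmd.append("--no_sampling")
    return cmd


if __name__ == "__main__":
    # Placeholder paths; replace them with your own locations.
    cmd = build_inference_command("path/to/checkpoint.pth", "path/to/output",
                                  "path/to/mevis", no_sampling=True)
    print(shlex.join(cmd))
```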

:book: Citation

If you find our solution useful for your research, please consider citing with this BibTeX:

@misc{gao20241st,
      title={1st Place Solution for MeViS Track in CVPR 2024 PVUW Workshop: Motion Expression guided Video Segmentation}, 
      author={Mingqi Gao and Jingnan Luo and Jinyu Yang and Jungong Han and Feng Zheng},
      year={2024},
      eprint={2406.07043},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}

:raised_hands: Acknowledgement

The solution is based on MUTR and MeViS. Thanks to the authors for their efforts.