BMVOS

This is the official PyTorch implementation of our paper:

Pixel-Level Bijective Matching for Video Object Segmentation, WACV 2022
Suhwan Cho, Heansung Lee, Minjung Kim, Sungjun Jang, Sangyoun Lee
Link: [WACV] [arXiv]

<img src="https://github.com/user-attachments/assets/812c4399-5afb-4b12-9dbf-5628fdbe02f3" width=750>

You can also find other related papers at awesome-video-object-segmentation.

Abstract

In conventional semi-supervised VOS methods, the query frame pixels select the best-matching pixels in the reference frame and transfer information from those pixels without any consideration of the reference frame's side. As there is no limit on the number of reference frame pixels that can be referenced, background distractors in the query frame can obtain high foreground scores and disrupt the prediction. To mitigate this issue, we introduce a bijective matching mechanism that finds the best matches from the query frame to the reference frame and vice versa. In addition, to exploit the property of a video that an object usually occupies similar positions in consecutive frames, we propose a mask embedding module.
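
The core idea can be illustrated with a short PyTorch sketch. This is a minimal, hypothetical rendering of mutual (bijective) top-k matching, not the repository's actual code; the function name, tensor shapes, and the value of k are assumptions.

```python
import torch
import torch.nn.functional as F

def bijective_matching(query_feats, ref_feats, ref_mask, k=20):
    """Sketch of bijective matching (all names and k are illustrative).

    query_feats: (C, Nq) query frame features, Nq = H*W pixels
    ref_feats:   (C, Nr) reference frame features
    ref_mask:    (Nr,)   foreground probability of each reference pixel
    """
    # Cosine similarity between every reference/query pixel pair.
    q = F.normalize(query_feats, dim=0)              # (C, Nq)
    r = F.normalize(ref_feats, dim=0)                # (C, Nr)
    sim = r.t() @ q                                  # (Nr, Nq)

    # Reference-to-query direction: each reference pixel keeps only
    # its top-k best-matching query pixels.
    topk = sim.topk(k, dim=1).indices                # (Nr, k)
    keep = torch.zeros_like(sim, dtype=torch.bool)
    keep.scatter_(1, topk, True)

    # Query-to-reference direction: a query pixel may transfer
    # information only from reference pixels that also selected it.
    sim = sim.masked_fill(~keep, float('-inf'))

    # Propagate the foreground score of the best surviving match;
    # query pixels selected by no reference pixel score zero.
    best_sim, best_idx = sim.max(dim=0)              # both (Nq,)
    fg_score = ref_mask[best_idx]
    fg_score[best_sim == float('-inf')] = 0.0
    return fg_score                                  # (Nq,)
```

Compared with purely surjective matching, where each query pixel simply takes its single best reference match, the extra top-k filter prevents a background query pixel from latching onto a foreground reference pixel that never reciprocates the match.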

Preparation

1. Download DAVIS and YouTube-VOS from the official websites.

2. Download our custom split for the YouTube-VOS training set.

3. Replace the dataset paths in "run.py" with your own dataset paths, as in the example below.
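
For orientation, the dataset paths in "run.py" are ordinary string variables pointing at the dataset roots; a hypothetical example (the variable names may differ from the actual file):

```python
# Hypothetical path variables; check "run.py" for the actual names.
davis_path = '/your/path/to/DAVIS'          # DAVIS root directory
ytvos_path = '/your/path/to/YouTube-VOS'    # YouTube-VOS root directory
```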

Training

Training instructions are TBD.

Testing

1. Open the "run.py" file.

2. Choose a pre-trained model.

3. Start BMVOS testing!

python run.py
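
Choosing a pre-trained model (step 2) amounts to pointing the script at one of the checkpoints listed under Attachments. A hypothetical way to sanity-check a downloaded checkpoint before running, assuming it is a plain PyTorch state dict (the file name is illustrative):

```python
import torch

# Illustrative file name; use whichever checkpoint you downloaded.
ckpt = torch.load('BMVOS_davis.pth', map_location='cpu')

# Print a few parameter names and shapes to confirm the file loaded,
# assuming the checkpoint is a plain state dict of tensors.
for name, tensor in list(ckpt.items())[:5]:
    print(name, tuple(tensor.shape))
```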

Attachments

Pre-trained model (DAVIS)
Pre-trained model (YouTube-VOS)
Pre-computed results

Note

Code and models are only available for non-commercial research purposes.
If you have any questions, please feel free to contact me :)

E-mail: suhwanx@gmail.com