# Mask Selection and Propagation for Unsupervised Video Object Segmentation
## Introduction
## Prerequisites

- pytorch >= 1.4
- python 3.6
## Inference
To run the code you will need masks from Mask R-CNN and the DAVIS dataset. The pre-trained weights of STM can be downloaded from here and selector_net from here. The output masks of Mask R-CNN should be numbered sequentially starting from 0, with 0 representing the background. Select the top 10 masks among those whose confidence score is greater than 0.1.
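The mask-selection rule above (confidence > 0.1, top 10 masks, IDs numbered from 0 for background) can be sketched as follows. The `(N, H, W)` array layout for Mask R-CNN outputs is an assumption; adapt it to your implementation:

```python
import numpy as np

def build_annotation(masks, scores, score_thresh=0.1, top_k=10):
    """Combine per-instance Mask R-CNN masks into a single label map.

    masks:  (N, H, W) boolean array of instance masks (assumed layout)
    scores: (N,) confidence score per mask
    Returns an (H, W) uint8 label map: 0 = background, 1..K = instances.
    """
    keep = np.where(scores > score_thresh)[0]
    # Keep at most top_k masks, ordered by descending confidence.
    keep = keep[np.argsort(scores[keep])[::-1]][:top_k]
    label_map = np.zeros(masks.shape[1:], dtype=np.uint8)
    for new_id, idx in enumerate(keep, start=1):
        label_map[masks[idx]] = new_id
    return label_map
```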
- Place the masks in `path_to_data_dir/Annotations/480p` and the DAVIS frames in `path_to_data_dir/JPEGImages/480p`.
- Place the downloaded weights of STM and selector_net in the `checkpoint` folder. The method has three parts: criterion 1, criterion 2, and stage 3.
- Evaluate using criterion 1 by executing `run.sh` (change the data path).
- Change the python script name in `run.sh` to `eval_DAVIS_crit2.py` to evaluate using criterion 2.
- Finally, run `eval_stage_3.py` using the command below, giving the relevant paths of the masks generated using criterion 1 and criterion 2.

```
python eval_stage_3.py -m1 results/STM_DAVIS_2019challenge -m2 results/STM_DAVIS_2019challenge2/ -r data/DAVIS/JPEGImages/480p/ -f set_file.txt
```
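Under the placement steps above, the expected directory layout would look roughly like this (sequence names and weight filenames are illustrative, not prescribed by the repo):

```
path_to_data_dir/
├── Annotations/480p/<sequence>/   # Mask R-CNN masks, IDs numbered from 0 (background)
└── JPEGImages/480p/<sequence>/    # DAVIS RGB frames
checkpoint/
├── <STM weights>
└── <selector_net weights>
```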
## Note

Selector_net has been trained using Mask R-CNN outputs, and the output masks of Mask R-CNN can vary considerably across implementations, and even across different hyperparameters within the same implementation. Hence, to reproduce the true results of the method, it is desirable to train selector_net using the object detection and segmentation network that you are using. Training selector_net should not take more than 1 hour. Training details are given below.

To run the code on other datasets, restructure the data files to match the DAVIS dataset layout; the same scripts can then be used.
## Training
The training dataset needs to be generated using DAVIS TrainVal before training selector_net. Follow these steps to generate the dataset:

- Generate masks using Mask R-CNN for each frame in the dataset.
- Generate masks using vanilla STM for each frame in the dataset:
  a) Pass the ground-truth frame as the first frame.
  b) Use ground-truth frames instead of criterion 1/2 to compare the Mask R-CNN output and the STM output, and then propagate frames.
  c) The script `eval_DAVIS_crit1.py` can be used by replacing the selection criterion as explained in b).
- Use the Hungarian algorithm to assign the generated masks to the corresponding ground truth in the dataset (similar to this code).

Run the following script:

```
python train.py --train_dataset_path #path_to_davis_data --maskrcnn_dataset_path #path_to_maskrcnn_masks --stm_dataset_path #path_to_STM_masks
```
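The Hungarian-matching step in the list above can be sketched with `scipy.optimize.linear_sum_assignment`, using IoU as the matching score. The boolean mask array layout is an assumption, not the repo's actual interface:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def match_masks(pred_masks, gt_masks):
    """Assign predicted masks to ground-truth masks by maximizing total IoU.

    pred_masks: (P, H, W) boolean, gt_masks: (G, H, W) boolean (assumed layout).
    Returns a list of (pred_idx, gt_idx) pairs.
    """
    iou = np.zeros((len(pred_masks), len(gt_masks)))
    for i, p in enumerate(pred_masks):
        for j, g in enumerate(gt_masks):
            inter = np.logical_and(p, g).sum()
            union = np.logical_or(p, g).sum()
            iou[i, j] = inter / union if union > 0 else 0.0
    # The Hungarian algorithm minimizes cost, so negate IoU to maximize it.
    rows, cols = linear_sum_assignment(-iou)
    return list(zip(rows.tolist(), cols.tolist()))
```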
## Precomputed Results

The pre-computed coloured results after stage 3 for the DAVIS 2019 dev unsupervised dataset can be found here.
## Citations

Please cite the following papers if this work was helpful.

```
@inproceedings{garg2021mask,
  title={Mask Selection and Propagation for Unsupervised Video Object Segmentation},
  author={Garg, Shubhika and Goel, Vidit},
  booktitle={Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision},
  pages={1680--1690},
  year={2021}
}

@article{DAVIS2020-Unsupervised-1st,
  author = {S. Garg and V. Goel and S. Kumar},
  title = {Unsupervised Video Object Segmentation using Online Mask Selection and Space-time Memory Networks},
  journal = {The 2020 DAVIS Challenge on Video Object Segmentation - CVPR Workshops},
  year = {2020}
}
```