

Training Script for Reuse-VOS

This is the code implementation of the CVPR 2021 paper: Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation.

Hard case (Ours, FRTM)

<img src="./sample/Gate_dog.gif" alt="sample ours hard" width="426" height="240"> (Ours)

<img src="./sample/FRTM_dog.gif" alt="sample FRTM hard" width="426" height="240"> (FRTM)

Easy case (Ours, FRTM)

<img src="./sample/Gate_cow.gif" alt="sample ours easy" width="426" height="240">(Ours)

<img src="./sample/FRTM_cow.gif" alt="sample FRTM easy" width="426" height="240">(FRTM)


python package

GPU support



To test the DAVIS validation split, download and unzip the 2017 480p trainval images and annotations here.

|-- Annotations/
|-- ImageSets/
|-- JPEGImages/


To test our validation split and the YouTubeVOS challenge 'valid' split, download YouTubeVOS 2018 and place it in this directory structure:

|-- train/
|-- train_all_frames/
|-- valid/
`-- valid_all_frames/
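Before training or evaluating, it can help to confirm that each dataset root contains the subdirectories listed above. The following is a minimal sketch, not part of this repository: the function name `missing_subdirs` and the keys `"davis"` and `"ytvos"` are hypothetical, and only the subdirectory names come from this README.

```python
from pathlib import Path

# Subdirectories this README lists for each dataset root.
# The dictionary keys are hypothetical labels chosen for this sketch.
EXPECTED_LAYOUT = {
    "davis": ["Annotations", "ImageSets", "JPEGImages"],
    "ytvos": ["train", "train_all_frames", "valid", "valid_all_frames"],
}


def missing_subdirs(root, dataset):
    """Return the expected subdirectories that are absent under root."""
    root = Path(root)
    return [name for name in EXPECTED_LAYOUT[dataset] if not (root / name).is_dir()]


# Placeholder path: replace with your own DAVIS root.
print(missing_subdirs("/path/to/DAVIS", "davis"))
```

An empty list means the layout matches; any names returned are the folders still to be downloaded or moved.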



| model | Backbone | Training set | J & F 17 | J & F 16 | link |
|---|---|---|---|---|---|
| G-FRTM (t=1) | Resnet18 | Youtube-VOS + DAVIS | 71.7 | 80.9 | Google Drive |
| G-FRTM (t=0.7) | Resnet18 | Youtube-VOS + DAVIS | 69.9 | 80.5 | same pth |
| G-FRTM (t=1) | Resnet101 | Youtube-VOS + DAVIS | 76.4 | 84.3 | Google Drive |
| G-FRTM (t=0.7) | Resnet101 | Youtube-VOS + DAVIS | 74.3 | 82.3 | same pth |


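| model | Backbone | Training set | G | J-S | J-Us | F-S | F-Us | link |
|---|---|---|---|---|---|---|---|---|
| G-FRTM (t=1) | Resnet18 | Youtube-VOS | 63.8 | 68.3 | 55.2 | 70.6 | 61.0 | Google Drive |
| G-FRTM (t=0.8) | Resnet18 | Youtube-VOS | 63.4 | 67.6 | 55.8 | 69.3 | 60.9 | same pth |
| G-FRTM (t=0.7) | Resnet18 | Youtube-VOS | 62.7 | 67. | | | | pth |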

We initialize the original FRTM layers from the official FRTM repository weights for the Youtube-VOS benchmark. S = Seen, Us = Unseen.
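As a quick check on the Youtube-VOS numbers, the overall score G can be recomputed from the four sub-scores. This sketch assumes the usual Youtube-VOS benchmark convention that G is the plain mean of J-S, J-Us, F-S and F-Us; the function name `overall_score` is hypothetical, and only the rows with complete values are used.

```python
def overall_score(j_seen, j_unseen, f_seen, f_unseen):
    """Mean of the four sub-scores (assumed definition of G)."""
    return (j_seen + j_unseen + f_seen + f_unseen) / 4


# G-FRTM (t=1), Resnet18: reported G is 63.8
print(round(overall_score(68.3, 55.2, 70.6, 61.0), 1))  # 63.8

# G-FRTM (t=0.8), Resnet18: reported G is 63.4
print(round(overall_score(67.6, 55.8, 69.3, 60.9), 1))  # 63.4
```

Both recomputed values agree with the reported G column.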

Target model cache

Here is the cache file we used for ResNet18.



Open train.py and adjust the paths dict to your dataset locations, checkpoint and tensorboard output directories and the place to cache target model weights.

To train a network, run the following command.

python train.py --name <session-name> --ftext resnet18 --dset all --dev cuda:0

--name is the name of the save directory for the current training session.

--ftext is the name of the feature extractor, either resnet18 or resnet101.

--dset is one of dv2017, ytvos2018 or all ("all" really means "both").

--dev is the name of the device to train on.

--m1 is margin1 for training the reuse gate; we use 1.0 for the DAVIS benchmark and 0.5 for the Youtube-VOS benchmark.

--m2 is margin2 for training the reuse gate; we use 0.

Replace "session-name" with whatever you like. Subdirectories with this name will be created under your checkpoint and tensorboard paths.
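To make the training options concrete, here is a hedged sketch of how such flags could be declared with `argparse`. It is not the parser in train.py: the flag names and allowed values come from this README, while the help strings, the defaults for `--ftext`, `--dset` and `--dev`, and the function name `build_parser` are assumptions. No defaults are assumed for the margins.

```python
import argparse


def build_parser():
    # Hypothetical stand-in for the real train.py argument parser.
    p = argparse.ArgumentParser(description="Sketch of the training options")
    p.add_argument("--name", required=True, help="session name (save directory)")
    p.add_argument("--ftext", choices=["resnet18", "resnet101"], default="resnet18",
                   help="feature extractor")
    p.add_argument("--dset", choices=["dv2017", "ytvos2018", "all"], default="all",
                   help="training dataset(s); 'all' means both")
    p.add_argument("--dev", default="cuda:0", help="device to train on")
    # The README reports m1 = 1.0 (DAVIS) or 0.5 (Youtube-VOS); no default is assumed here.
    p.add_argument("--m1", type=float, default=None, help="margin1 for the reuse gate")
    p.add_argument("--m2", type=float, default=None, help="margin2 for the reuse gate")
    return p


# Parse the same arguments as the example command, plus the DAVIS margin1.
args = build_parser().parse_args(
    ["--name", "demo", "--ftext", "resnet18", "--dset", "all", "--dev", "cuda:0", "--m1", "1.0"]
)
print(args.name, args.ftext, args.dset, args.dev, args.m1)
```

Restricting `--ftext` and `--dset` with `choices` makes a mistyped value fail immediately instead of partway through training.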


Open evaluate.py and adjust the paths dict to your dataset locations, checkpoint and tensorboard output directories and the place to cache target model weights.

To evaluate a network, run the following command.

python evaluate.py --ftext resnet18 --dset dv2017val --dev cuda:0

--ftext is the name of the feature extractor, either resnet18 or resnet101.

--dset is one of dv2016val, dv2017val, yt2018jjval, yt2018val or yt2018valAll.

--dev is the name of the device to evaluate on.

--TH is the threshold for tau (default: 0.7).

The inference results will be saved at ${ROOT}/${result}. It is worth evaluating multiple pth files to find the one with the best accuracy.
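The result tables report scores at several thresholds (t=1, 0.8, 0.7), so it may be convenient to generate one evaluation command per threshold. The sketch below only builds the command lines; it assumes evaluate.py accepts `--TH` followed by a numeric value, as the option list above suggests, and the helper name `eval_commands` is hypothetical.

```python
import shlex


def eval_commands(thresholds, ftext="resnet18", dset="dv2017val", dev="cuda:0"):
    """Build one evaluate.py argument list per threshold value."""
    return [
        ["python", "evaluate.py", "--ftext", ftext, "--dset", dset,
         "--dev", dev, "--TH", str(th)]
        for th in thresholds
    ]


# Print the commands for the thresholds that appear in the result tables.
for cmd in eval_commands([1.0, 0.8, 0.7]):
    print(shlex.join(cmd))
```

Each argument list can be passed to `subprocess.run` to launch the evaluation rather than just printing it.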


This codebase borrows code and structure from the official FRTM repository. We are grateful to Facebook Inc. for valuable discussions.


The codebase is built on the following works:

      title={Learning Dynamic Network Using a Reuse Gate Function in Semi-supervised Video Object Segmentation}, 
      author={Hyojin Park and Jayeon Yoo and Seohyeong Jeong and Ganesh Venkatesh and Nojun Kwak},