Attention-guided Temporally Coherent Video Object Matting

This is the GitHub project for our paper Attention-guided Temporally Coherent Video Object Matting (arXiv:2105.11427), accepted by ACMMM 2021. We provide our code, the supplementary material, the trained models, and the VideoMatting108 dataset here. For the trimap generation module, please see TCVOM-TGM.

The code, the trained model and the dataset are for academic and non-commercial use only.

The paper and the supplementary material can be found here.

VideoMatting108 Dataset

VideoMatting108 is a large video matting dataset containing 108 video clips with their corresponding ground-truth alpha mattes, all at 1080p resolution: 80 clips for training and 28 clips for validation.

You can download the dataset here (BaiduYun mirror, code: v108). The total size of the dataset is 192 GB, and the archive is split into 1 GB chunks.
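The chunks must be concatenated back into a single archive before extraction. A minimal sketch in Python, assuming the chunks carry sequential suffixes (the pattern `dataset.tar.gz.part*` below is an illustration; adjust it to the actual downloaded filenames):

```python
import glob
import shutil

def join_chunks(pattern, out_path):
    """Concatenate split archive chunks (sorted by filename) into one file."""
    with open(out_path, "wb") as out:
        for part in sorted(glob.glob(pattern)):
            with open(part, "rb") as f:
                shutil.copyfileobj(f, out)

# Hypothetical filenames; substitute the real chunk names:
# join_chunks("dataset.tar.gz.part*", "dataset.tar.gz")
```

On Linux/macOS the one-liner `cat dataset.tar.gz.part* > dataset.tar.gz` achieves the same thing.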

After decompressing, the dataset folder should have the following structure (please rename flow_png_val to flow_png):

|---dataset
  |-FG_done
  |-BG_done
  |-flow_png
  |-frame_corr.json
  |-train_videos.txt
  |-train_videos_subset.txt
  |-val_videos.txt
  |-val_videos_subset.txt
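A small sanity-check script can perform the rename and verify the layout above. This is only a convenience sketch, not part of the repository; it assumes the dataset was extracted to a local `dataset` directory:

```python
import os

# Expected entries under the dataset root, per the structure listed above.
EXPECTED = ["FG_done", "BG_done", "flow_png", "frame_corr.json",
            "train_videos.txt", "train_videos_subset.txt",
            "val_videos.txt", "val_videos_subset.txt"]

def check_dataset(root):
    """Rename flow_png_val to flow_png if needed, then return missing entries."""
    old = os.path.join(root, "flow_png_val")
    new = os.path.join(root, "flow_png")
    if os.path.isdir(old) and not os.path.exists(new):
        os.rename(old, new)
    return [name for name in EXPECTED
            if not os.path.exists(os.path.join(root, name))]

# Usage: check_dataset("dataset") should return an empty list.
```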

Models

Currently our method supports four different image matting methods as the base network (GCA, DIM, Index, and FBA; see the results table).

The trained models can be downloaded here. We provide four different weights for every base method.

Results

This is the quantitative result on the VideoMatting108 validation set with medium-width trimaps. Each metric is averaged over all 28 validation video clips.

We used CUDA 10.2 during inference; using CUDA 11.1 might result in slightly lower metrics. All metrics are calculated with calc_metric.py.

| Method | Loss | SSDA | dtSSD | MESSDdt | MSE (×10³) | mSAD |
| --- | --- | --- | --- | --- | --- | --- |
| GCA+F (Baseline) | L_im | 55.82 | 31.64 | 2.15 | 8.20 | 40.85 |
| GCA+TAM | L_im+L_tc+L_af | 50.41 | 27.28 | 1.48 | 7.07 | 37.65 |
| DIM+F (Baseline) | L_im | 61.85 | 34.55 | 2.82 | 9.99 | 44.38 |
| DIM+TAM | L_im+L_tc+L_af | 58.94 | 29.89 | 2.06 | 9.02 | 43.28 |
| Index+F (Baseline) | L_im | 58.53 | 33.03 | 2.33 | 9.37 | 43.53 |
| Index+TAM | L_im+L_tc+L_af | 57.91 | 29.36 | 1.81 | 8.78 | 43.17 |
| FBA+F (Baseline) | L_im | 57.47 | 29.60 | 2.19 | 9.28 | 40.57 |
| FBA+TAM | L_im+L_tc+L_af | 51.57 | 25.50 | 1.59 | 7.61 | 37.24 |
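For intuition about the simpler metrics, a per-frame MSE and SAD on alpha mattes can be sketched as below. This is not the repository's calc_metric.py (which additionally restricts evaluation to trimap regions and computes the temporal metrics dtSSD and MESSDdt); it only illustrates the basic per-frame error terms:

```python
import numpy as np

def mse_sad(pred, gt):
    """Per-frame MSE and SAD between predicted and ground-truth alpha
    mattes, both given as arrays of values in [0, 1]."""
    pred = pred.astype(np.float64)
    gt = gt.astype(np.float64)
    mse = np.mean((pred - gt) ** 2)   # mean squared error over all pixels
    sad = np.abs(pred - gt).sum()     # sum of absolute differences
    return mse, sad
```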

Usage

Requirements

Python=3.8
PyTorch=1.6.0
numpy
opencv-python
imgaug
tqdm
yacs

Inference

pred_single.py and pred_vmn.py automatically use all available CUDA devices; pred_test.py uses the cuda:0 device by default.
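To restrict which GPUs the prediction scripts can grab, the standard CUDA mechanism (not anything specific to this repo) is the CUDA_VISIBLE_DEVICES environment variable, set before CUDA is initialized, i.e. before torch is imported:

```python
import os

# Expose only GPU 0 to any CUDA code launched from this process.
# Must be set before torch initializes CUDA.
os.environ["CUDA_VISIBLE_DEVICES"] = "0"
```

Setting it on the command line works the same way, e.g. prefixing the script invocation with `CUDA_VISIBLE_DEVICES=0`.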

Training

Set PY_CMD to PyTorch's distributed launcher, replacing NUMBER_OF_CUDA_DEVICES with the number of GPUs to train on:

PY_CMD="python -m torch.distributed.launch --nproc_per_node=NUMBER_OF_CUDA_DEVICES"

Contact

If you have any questions, please feel free to contact yunkezhang@zju.edu.cn.