Accelerating Video Object Segmentation with Compressed Video

This is the official PyTorch implementation of

Accelerating Video Object Segmentation with Compressed Video. CVPR 2022.
[arXiv] [Project Page]
Kai Xu, Angela Yao
Computer Vision and Machine Learning group, NUS.

Installation

Prepare Conda Environment: (The code was tested with python=3.10 and pytorch=1.11. Similar versions should also work.)

conda create -n CoVOS python=3.10
conda activate CoVOS
conda install pytorch torchvision cudatoolkit=11.3 -c pytorch
pip install tqdm tabulate opencv-python easydict ninja scikit-image scikit-video
# Install CUDA motion vector warping function.
python setup.py build_ext --inplace install
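
Once the commands above have run, it can help to confirm that the listed packages are visible inside the CoVOS environment. The snippet below is a minimal sketch and not part of the repository: the helper name `installed_versions` is hypothetical, and the distribution names are assumed to match the conda and pip commands above (with conda's `pytorch` package appearing as `torch`).

```python
from importlib import metadata

# Distribution names assumed from the conda/pip commands above.
REQUIRED = ["torch", "torchvision", "tqdm", "tabulate", "opencv-python",
            "easydict", "ninja", "scikit-image", "scikit-video"]

def installed_versions(names):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

if __name__ == "__main__":
    for name, version in installed_versions(REQUIRED).items():
        print(f"{name:15s} {version or 'MISSING'}")
```

Any package reported as MISSING can be reinstalled with the corresponding command above.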

Prepare HEVC feature decoder: (Here are two options.)

git clone https://github.com/kai422/openHEVC_feature_decoder.git
cd openHEVC_feature_decoder
git checkout Interface_MV_Residual
# If the yasm package is not installed, install it first.
sudo apt-get install -y yasm
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE ..
make -j9
make DESTDIR={install_path} install

Prepare Data:

Download Data:

DAVIS: Download 480p and Full-Resolution data and put them into the same folder. After unzipping, the structure of the directory should be:

{data_path}/
├──DAVIS/
│   ├──Annotations
│   │   └── ... 
│   ├──ImageSets
│   │   └── ...  
│   └──JPEGImages
│       ├──480p
│       └──Full-Resolution

YouTube-VOS: Download YouTubeVOS 2018. After unzipping, the structure of the directory should be:

{data_path}/
├──YouTube-VOS/
│   ├──train/
│   ├──train_all_frames/
│   ├──valid/
│   └──valid_all_frames/
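
Before running the later steps, the two directory layouts above can be checked programmatically. This is a minimal sketch under the assumption that both datasets live under the same `{data_path}`; the helper `missing_dirs` is hypothetical and not part of the repository.

```python
from pathlib import Path

# Sub-directories taken from the two directory trees above.
EXPECTED_DIRS = [
    "DAVIS/Annotations",
    "DAVIS/ImageSets",
    "DAVIS/JPEGImages/480p",
    "DAVIS/JPEGImages/Full-Resolution",
    "YouTube-VOS/train",
    "YouTube-VOS/train_all_frames",
    "YouTube-VOS/valid",
    "YouTube-VOS/valid_all_frames",
]

def missing_dirs(data_path, expected=EXPECTED_DIRS):
    """Return the expected sub-directories that do not exist under data_path."""
    root = Path(data_path)
    return [d for d in expected if not (root / d).is_dir()]
```

Calling `missing_dirs("/path/to/data")` returns an empty list when the layout is complete, and otherwise lists the folders that still need to be downloaded or unzipped.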

Some video frame indexes do not start from 0, so we need to rearrange the snippets.

bash scripts/snippets_rearrange.sh
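
What the script does internally is not described here. As an illustration of the general idea only, the sketch below plans a renumbering so that the frames of one video start from index 0. The function name, the five-digit zero padding, and the assumption that frame names are zero-padded to equal width (so they sort correctly as strings) are all hypothetical.

```python
from pathlib import Path

def zero_based_renames(frame_names, digits=5):
    """Plan renames so the sorted frames are numbered 0, 1, 2, ... with zero padding.

    Assumes the names are zero-padded to equal width, so string sorting
    matches numeric order.
    """
    plan = []
    for new_index, name in enumerate(sorted(frame_names)):
        suffix = Path(name).suffix  # keep the original file extension
        plan.append((name, f"{new_index:0{digits}d}{suffix}"))
    return plan
```

The function only returns `(old_name, new_name)` pairs; it does not touch any files, so the plan can be inspected before anything is renamed.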

Update data_path in path_config.py.

Encode Videos:

Encode the raw image sequences into HEVC videos:

# to reproduce, use FFmpeg 3.4.8-0ubuntu0.2 (the default version for ubuntu 18.04)
bash scripts/data/encode_video_davis.sh
bash scripts/encode_video_ytvos.sh

Encoded videos will be stored at {data_path}/DAVIS/HEVCVideos and {data_path}/YouTube-VOS/HEVCVideos.
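
For readers unfamiliar with encoding an image sequence to HEVC, the sketch below builds a generic FFmpeg command using the libx265 encoder. It is not a copy of the repository's scripts: the frame rate, frame name pattern, pixel format, and output file name are assumptions, and reproducing the reported numbers requires the provided scripts and the FFmpeg version noted above.

```python
from pathlib import Path

def hevc_encode_command(frames_dir, output_path, fps=24, pattern="%05d.jpg"):
    """Build (but do not run) an FFmpeg command that encodes frames with libx265."""
    return [
        "ffmpeg",
        "-framerate", str(fps),                   # input frame rate (assumed value)
        "-i", str(Path(frames_dir) / pattern),    # numbered image sequence (assumed pattern)
        "-c:v", "libx265",                        # HEVC (H.265) software encoder
        "-pix_fmt", "yuv420p",                    # widely supported pixel format (assumed)
        str(output_path),
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`; building the command separately makes it easy to print and review first.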

Alternatively, the HEVC-encoded videos can be downloaded from Google Drive.

Models

Download pretrained models for base network:

CoVOS pretrained models are already included in the GitHub repository: weights/covos_light_encoder.pth and weights/covos_propagator.pth.

Testing

You can download pre-computed results from Google Drive.

Commands:

| DAVIS 16 Val | J | F | J&F | FPS |
|---|---|---|---|---|
| STM | 88.7 | 89.9 | 89.3 | 14.9 |
| STM+CoVOS | 87.0 | 87.3 | 87.2 | 31.5 |

# DAVIS16, base model: stm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2016_stm.sh
RESULT_PATH=results/covos_stm/dv2016 DSET=dv2016val python evaluate_from_folder.py

| DAVIS 16 Val | J | F | J&F | FPS |
|---|---|---|---|---|
| FRTM-VOS | - | - | 83.5 | 21.9 |
| FRTM-VOS+CoVOS | 82.3 | 82.2 | 82.3 | 28.6 |

# DAVIS16, base model: frtm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2016_frtm.sh
RESULT_PATH=results/covos_frtm/dv2016 DSET=dv2016val python evaluate_from_folder.py

| DAVIS 16 Val | J | F | J&F | FPS |
|---|---|---|---|---|
| MiVOS | 89.7 | 92.4 | 91.0 | 16.9 |
| MiVOS+CoVOS | 89.0 | 89.8 | 89.4 | 36.8 |

# DAVIS16, base model: mivos
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2016_mivos.sh
RESULT_PATH=results/covos_mivos/dv2016 DSET=dv2016val python evaluate_from_folder.py

| DAVIS 16 Val | J | F | J&F | FPS |
|---|---|---|---|---|
| STCN | 90.4 | 93.0 | 91.7 | 26.9 |
| STCN+CoVOS | 88.5 | 89.6 | 89.1 | 42.7 |

# DAVIS16, base model: stcn
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2016_stcn.sh
RESULT_PATH=results/covos_stcn/dv2016 DSET=dv2016val python evaluate_from_folder.py 

| DAVIS 17 Val | J | F | J&F | FPS |
|---|---|---|---|---|
| STM | 79.2 | 84.3 | 81.8 | 10.6 |
| STM+CoVOS | 78.3 | 82.7 | 80.5 | 23.8 |

# DAVIS17, base model: stm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2017_stm.sh
RESULT_PATH=results/covos_stm/dv2017 DSET=dv2017val python evaluate_from_folder.py

| DAVIS 17 Val | J | F | J&F | FPS |
|---|---|---|---|---|
| FRTM-VOS | - | - | 76.7 | 14.1 |
| FRTM-VOS+CoVOS | 69.7 | 75.2 | 72.5 | 20.6 |

# DAVIS17, base model: frtm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2017_frtm.sh
RESULT_PATH=results/covos_frtm/dv2017 DSET=dv2017val python evaluate_from_folder.py

| DAVIS 17 Val | J | F | J&F | FPS |
|---|---|---|---|---|
| MiVOS | 81.8 | 87.4 | 84.5 | 11.2 |
| MiVOS+CoVOS | 79.7 | 84.6 | 82.2 | 25.5 |

# DAVIS17, base model: mivos
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2017_mivos.sh
RESULT_PATH=results/covos_mivos/dv2017 DSET=dv2017val python evaluate_from_folder.py

| DAVIS 17 Val | J | F | J&F | FPS |
|---|---|---|---|---|
| STCN | 82.0 | 88.6 | 85.3 | 20.2 |
| STCN+CoVOS | 79.7 | 85.1 | 82.4 | 33.7 |

# DAVIS17, base model: stcn
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_dv2017_stcn.sh
RESULT_PATH=results/covos_stcn/dv2017 DSET=dv2017val python evaluate_from_folder.py
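
The DAVIS commands above follow one naming pattern, so they can be generated rather than typed by hand. The sketch below only builds the command lists and environment variables; the helper `davis_jobs` is hypothetical, and it assumes the script and result paths follow the pattern shown for every model and dataset combination listed above.

```python
BASE_MODELS = ["stm", "frtm", "mivos", "stcn"]
DAVIS_SETS = ["dv2016", "dv2017"]

def davis_jobs(models=BASE_MODELS, dsets=DAVIS_SETS):
    """Build run/evaluate commands mirroring the DAVIS commands above."""
    jobs = []
    for dset in dsets:
        for model in models:
            jobs.append({
                # Inference script for this base model and dataset.
                "run": ["bash", f"scripts/exps/covos_{dset}_{model}.sh"],
                # Evaluation reads RESULT_PATH and DSET from the environment.
                "evaluate": ["python", "evaluate_from_folder.py"],
                "env": {
                    "CUDA_VISIBLE_DEVICES": "0",
                    "RESULT_PATH": f"results/covos_{model}/{dset}",
                    "DSET": f"{dset}val",
                },
            })
    return jobs
```

Each job can then be executed with `subprocess.run(job["run"], env={**os.environ, **job["env"]}, check=True)` followed by the same call for `job["evaluate"]`.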

| YT-VOS 18 Val | G | J_s | F_s | J_u | F_u | FPS |
|---|---|---|---|---|---|---|
| FRTM-VOS | 72.1 | 72.3 | 76.2 | 65.9 | 74.1 | 7.7 |
| FRTM-VOS+CoVOS | 65.6 | 68.0 | 71.0 | 58.2 | 65.4 | 25.3 |

# YouTube-VOS 2018, base model: frtm
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_yt2018_frtm.sh

| YT-VOS 18 Val | G | J_s | F_s | J_u | F_u | FPS |
|---|---|---|---|---|---|---|
| MiVOS | 82.6 | 81.1 | 85.6 | 77.7 | 86.2 | 13 |
| MiVOS+CoVOS | 79.3 | 78.9 | 83.0 | 73.5 | 81.7 | 45.9 |

# YouTube-VOS 2018, base model: mivos
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_yt2018_mivos.sh

| YT-VOS 18 Val | G | J_s | F_s | J_u | F_u | FPS |
|---|---|---|---|---|---|---|
| STCN | 84.3 | 83.2 | 87.9 | 79.0 | 87.3 | 16.8 |
| STCN+CoVOS | 79.0 | 79.4 | 83.6 | 72.6 | 80.4 | 57.9 |

# YouTube-VOS 2018, base model: stcn
CUDA_VISIBLE_DEVICES=0 scripts/exps/covos_yt2018_stcn.sh
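
The tables above trade some accuracy for speed. As a small worked example, the sketch below computes the speed-up factor and the accuracy drop for the STCN rows, using numbers copied from the tables; the helper name `trade_off` is hypothetical.

```python
# (baseline, with CoVOS) pairs of (accuracy, FPS) copied from the tables above.
# Accuracy is J&F for DAVIS and G for YouTube-VOS.
RESULTS = {
    "DAVIS 16 / STCN": ((91.7, 26.9), (89.1, 42.7)),
    "DAVIS 17 / STCN": ((85.3, 20.2), (82.4, 33.7)),
    "YT-VOS 18 / STCN": ((84.3, 16.8), (79.0, 57.9)),
}

def trade_off(baseline, accelerated):
    """Return (speed-up factor, accuracy drop in points) for one table row pair."""
    (base_acc, base_fps), (acc, fps) = baseline, accelerated
    return round(fps / base_fps, 2), round(base_acc - acc, 1)

for name, (baseline, accelerated) in RESULTS.items():
    speedup, drop = trade_off(baseline, accelerated)
    print(f"{name}: {speedup}x faster, {drop} points lower accuracy")
```

The same function can be applied to the STM, FRTM-VOS, and MiVOS rows by adding their values to `RESULTS`.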

License and Acknowledgement

This project is released under the GPL-3.0 License. Parts of the code are adapted from MiVOS, FRTM-VOS, and DAVIS.

Citation

@inproceedings{xu2022covos,
  title={Accelerating Video Object Segmentation with Compressed Video},
  author={Kai Xu and Angela Yao},
  booktitle={CVPR},
  year={2022}
}