DVSOD-Baseline

This repository provides the source code for the DVSOD baseline.

Installation

The code requires python>=3.8, pytorch>=1.11, and torchvision>=0.12. Please follow the instructions here to install both the PyTorch and TorchVision dependencies. Installing both PyTorch and TorchVision with CUDA support is strongly recommended.
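Before proceeding, it can help to verify that the environment meets these requirements. A small convenience check (not part of the repository):

```python
import sys

# Sanity-check the environment before installing or training.
assert sys.version_info >= (3, 8), "the DVSOD baseline requires python>=3.8"

try:
    import torch
    import torchvision
    print("torch", torch.__version__, "| torchvision", torchvision.__version__)
    print("CUDA available:", torch.cuda.is_available())
except ImportError:
    print("PyTorch/TorchVision not installed yet -- see the steps below")
```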

  1. Clone this repo.

    $ git clone https://github.com/DVSOD/DVSOD-Baseline.git
    $ cd DVSOD-Baseline
    
  2. Install dependencies.

    $ conda env create -f dvsod.yaml
    $ conda activate dvsod
    

Getting Started

First, download the DViSal dataset. Then the model can be trained with just a few adaptations:

  1. Set your DViSal dataset path and ckpt save path in train.py
  2. Perform training, with python train.py

Meanwhile, saliency maps can be generated by loading a model checkpoint:

  1. Set your DViSal dataset path and ckpt save path in test.py
  2. Specify the ckpt name and testset name in test.py
  3. Perform inference, with python test.py
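For reference, the paths in step 1 are typically plain variables near the top of each script. The names below are illustrative assumptions only; check train.py / test.py for the actual ones:

```python
# Hypothetical path settings; the real variable names in train.py / test.py
# may differ -- edit the actual scripts accordingly.
dataset_root   = "/path/to/DViSal"   # root of the downloaded DViSal dataset
ckpt_save_path = "./checkpoints/"    # directory for saving/loading checkpoints
print(dataset_root, ckpt_save_path)
```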

Instructions for vital parameters in train/test.py:

- set '--is_ResNet'      as **bool**         # whether to use the ResNet backbone
- set '--ckpt_load'      as **bool**         # whether to load a checkpoint
- set '--snapshot'       as **int**          # e.g. 100, which means loading the 100th checkpoint
- set '--baseline_mode'  as **bool**         # whether to apply baseline mode
- set '--sample_rate'    as **int**          # e.g. 3, which means the frame sample rate
- set '--stm_queue_size' as **int**          # e.g. 3, which means the number of memory frames
- set '--batchsize'      as **int**          # e.g. 2, which means the batch size
- set '--trainsize'      as **int**          # e.g. 320, which means the training image size
- set '--save_interval'  as **int**          # e.g. 2, which means saving a ckpt every 2 epochs
- set '--epoch'          as **int**          # e.g. 200, which means the number of training epochs
- set '--lr'             as **float**        # e.g. 1e-4, which means the learning rate
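Taken together, flags like these are typically declared with argparse. A minimal sketch mirroring the list above (the default values are illustrative assumptions, not the repository's actual settings):

```python
import argparse

# Illustrative argparse declaration for the flags listed above.
# Defaults here are assumptions for demonstration only.
# Note: argparse's type=bool does not parse "False" from the command line as
# expected; it is kept here only to mirror the types stated in the list.
parser = argparse.ArgumentParser(description="DVSOD baseline options (sketch)")
parser.add_argument('--is_ResNet', type=bool, default=True)       # use ResNet backbone
parser.add_argument('--ckpt_load', type=bool, default=False)      # resume from checkpoint
parser.add_argument('--snapshot', type=int, default=100)          # which checkpoint to load
parser.add_argument('--baseline_mode', type=bool, default=True)   # apply baseline mode
parser.add_argument('--sample_rate', type=int, default=3)         # frame sample rate
parser.add_argument('--stm_queue_size', type=int, default=3)      # number of memory frames
parser.add_argument('--batchsize', type=int, default=2)           # batch size
parser.add_argument('--trainsize', type=int, default=320)         # training image size
parser.add_argument('--save_interval', type=int, default=2)       # save ckpt every N epochs
parser.add_argument('--epoch', type=int, default=200)             # total training epochs
parser.add_argument('--lr', type=float, default=1e-4)             # learning rate
opt = parser.parse_args([])  # parse defaults only; drop [] to read sys.argv
print(opt.batchsize, opt.lr)
```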

Citation

@inproceedings{li2023dvsod,
  title={DVSOD: RGB-D Video Salient Object Detection},
  author={Li, Jingjing and Ji, Wei and Wang, Size and Li, Wenbo and Cheng, Li},
  booktitle={Advances in Neural Information Processing Systems},
  year={2023},
  month={December}
}

References

We sincerely thank CPD, CRM, and STM for their outstanding project contributions!

@inproceedings{wu2019cascaded,
  title={Cascaded partial decoder for fast and accurate salient object detection},
  author={Wu, Zhe and Su, Li and Huang, Qingming},
  booktitle={CVPR},
  pages={3907--3916},
  year={2019}
}
@inproceedings{ji2021calibrated,
  title={Calibrated RGB-D salient object detection},
  author={Ji, Wei and Li, Jingjing and Yu, Shuang and Zhang, Miao and Piao, Yongri and Yao, Shunyu and Bi, Qi and Ma, Kai and Zheng, Yefeng and Lu, Huchuan and others},
  booktitle={CVPR},
  pages={9471--9481},
  year={2021}
}
@inproceedings{oh2019video,
  title={Video object segmentation using space-time memory networks},
  author={Oh, Seoung Wug and Lee, Joon-Young and Xu, Ning and Kim, Seon Joo},
  booktitle={ICCV},
  pages={9226--9235},
  year={2019}
}