SLT-Net

This repository contains the code for our CVPR 2022 paper Implicit Motion Handling for Video Camouflaged Object Detection. [CVPR 2022] [arXiv] [Project Page]

SLT-Net: We propose a new video camouflaged object detection (VCOD) framework that uses both short-term dynamics and long-term temporal consistency to detect camouflaged objects in video frames.


1. Features

Summary. This repository contains the source code, prediction results, and an evaluation toolbox in the eval folder.

Demo videos. In the Videos folder, we demonstrate the video results of our SLT-Net and two top-performing baselines (SINet and RCRNet) on the MoCA-Mask test set.

Results. The results of all compared methods and the whole MoCA-Mask dataset can be found here.

2. Proposed Framework

<p align="left"> <img src="./imgs/overall.png" width='523' height='200' /> <br /> <em> Figure 1: The overall pipeline of SLT-Net. SLT-Net consists of a short-term detection module and a long-term refinement module. The short-term detection module takes a pair of consecutive frames and predicts the camouflaged object mask for the reference frame. The long-term refinement module takes T predictions from the short-term detection module, along with their corresponding reference frames, to generate the final predictions. </em> </p>
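
To make the data flow concrete, here is a minimal conceptual sketch of the two-stage inference. Everything in it (short_net, long_net, the function names, the clip length T) is a hypothetical placeholder; the actual module interfaces live in the lib folder and may differ.

# Conceptual sketch of SLT-Net's two-stage inference; all names here are
# hypothetical placeholders, not the repository's actual interfaces.
import torch

def short_term_detection(short_net, frames):
    # Each consecutive frame pair yields a mask for its reference frame.
    masks = []
    for t in range(len(frames) - 1):
        pair = torch.stack([frames[t], frames[t + 1]])  # reference + next frame
        masks.append(short_net(pair))
    return masks

def long_term_refinement(long_net, frames, masks, T):
    # T short-term predictions and their reference frames are refined jointly.
    clip_frames = torch.stack(frames[:T])
    clip_masks = torch.stack(masks[:T])
    return long_net(clip_frames, clip_masks)  # final predictions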

The training and testing experiments are conducted in PyTorch on a single NVIDIA V100 GPU with 32 GB of memory.

Note that our model also runs on GPUs with less memory; simply lower the batch size accordingly.

3. Preparation

Requirements.

  1. Python 3.9.*
  2. CUDA 11.1
  3. PyTorch
  4. TorchVision

Install. Create a virtual environment and activate it.

conda create -n SLTnet python=3.8
conda activate SLTnet

The code has been tested with PyTorch 1.9 and CUDA 11.1.

conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
conda install -c conda-forge timm
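
A quick sanity check that the CUDA build of PyTorch is visible from Python:

import torch
print(torch.__version__)          # expect 1.9.x
print(torch.cuda.is_available())  # expect True on a machine with CUDA 11.1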

Install MMCV + MMSegmentation

Follow the instructions here. MMCV and MMSegmentation are required for training the transformer encoder. A quick installation example:

pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install mmsegmentation
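
Before moving on, you can confirm that both packages import cleanly:

import mmcv
import mmseg
print(mmcv.__version__, mmseg.__version__)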

For the seq-to-seq model in the long-term architecture, the core is built as a CUDA op with torchlib. You can find more details on GitHub. A quick installation example:

cd ./lib/ref_video/PNS
python setup.py build develop

Dataset. To evaluate/train our SLT-Net, you will need to download the required datasets. Note that if you want to use our pseudo labels, please download them via [MoCA-Mask-Pseudo].

Change the first-column path in create_link.sh to your actual dataset location, then run create_link.sh to create symbolic links in the datasets folder pointing to wherever the datasets were downloaded.

├── datasets
    ├── MoCA-Mask
    ├── CAD2016
    ├── COD10K

Note that for the CAD2016 dataset, the original ground-truth maps label each pixel with index 1/2; you need to convert them to 0/255. For your convenience, we also provide the converted ground truth here.
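
A minimal conversion sketch using NumPy and Pillow, assuming the masks are indexed PNGs under a datasets/CAD2016 path (a hypothetical location; adjust it, verify the 1 = background / 2 = object convention on a sample mask first, and back up the originals, since this overwrites in place):

import glob
import numpy as np
from PIL import Image

# Map index-labelled masks (1 = background, 2 = object) to binary 0/255.
for path in glob.glob('datasets/CAD2016/**/*.png', recursive=True):
    arr = np.array(Image.open(path))
    binary = np.where(arr > 1, 255, 0).astype(np.uint8)  # 2 -> 255, 1 -> 0
    Image.fromarray(binary).save(path)  # overwrites the original mask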

4. Results

Prediction. You can run a trained model on each dataset with the scripts below, which generate the *.png prediction maps for the corresponding datasets.

sh test_video.sh
sh test_video_long_term.sh

Evaluation. Run main_CAD.m or main_MoCA.m in the eval folder to evaluate your model. You can also simply download our prediction images via this Link to reproduce the results reported in our paper, or download our pre-trained model via this link: snapshot. [If you downloaded it before 7 Sep 2022, please replace it with the new version: the Net_epoch_cod10k.pth in the previous snapshot mistakenly contained ResNet pre-trained weights.]

Acknowledgements. More information about the original MoCA dataset [1] can be found via this Link.

[1] Hala Lamdouar, Charig Yang, Weidi Xie, and Andrew Zisserman. Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation. Asian Conference on Computer Vision, 2020.

5. Citing

If you find this code useful, please consider citing our work.

@inproceedings{cheng2022implicit,
  title={Implicit Motion Handling for Video Camouflaged Object Detection},
  author={Cheng, Xuelian and Xiong, Huan and Fan, Deng-Ping and Zhong, Yiran and Harandi, Mehrtash and Drummond, Tom and Ge, Zongyuan},
  booktitle={CVPR},
  year={2022}
}