SLT-Net
This repository contains the code for our CVPR 2022 paper "Implicit Motion Handling for Video Camouflaged Object Detection".
[CVPR 2022] [arXiv] [Project Page]
We propose SLT-Net, a new video camouflaged object detection (VCOD) framework that uses both short-term dynamics and long-term temporal consistency to detect camouflaged objects in video frames.
<!-- ![alt text](./imgs/overall.png) -->
1. Features
Summary. This repository contains the source code, prediction results, and the evaluation toolbox in the eval folder.
Demo_videos. In the Videos folder, we show the video results of our SLT-Net and two top-performing baselines (SINet and RCRNet) on the MoCA-Mask test set.
Results. The results of all compared methods and the whole MoCA-Mask dataset can be found here.
2. Proposed Framework
<p align="left"> <img src="./imgs/overall.png" width='523' height='200' /> <br /> <em> Figure 1: The overall pipeline of the SLT-Net. The SLT-Net consists of a short-term detection module and a long-term refinement module. The short-term detection module takes a pair of consecutive frames and predicts the camouflaged object mask for the reference frame. The long-term refinement module takes T predictions from the short-term detection module along with their corresponding reference frames to generate the final predictions. </em> </p>

The training and testing experiments were conducted using PyTorch on a single NVIDIA V100 GPU with 32 GB of memory.
The model can also run on GPUs with less memory if you lower the batch size.
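The data flow in Figure 1 can be pictured with a small sketch of how frames might be grouped for the two modules. This is only an illustration, not the repository's data loader: the function names are hypothetical, and it assumes that the first frame of each pair is the reference frame and that the long-term module receives non-overlapping clips of T items.

```python
from typing import List, Sequence, Tuple


def consecutive_pairs(frames: Sequence[str]) -> List[Tuple[str, str]]:
    """Pair each frame with the one that follows it.

    Assumption: the first element of each pair plays the role of the
    reference frame for the short-term module.
    """
    return [(frames[i], frames[i + 1]) for i in range(len(frames) - 1)]


def clips_of_length(items: Sequence[str], T: int) -> List[List[str]]:
    """Split a sequence into non-overlapping clips of T items.

    Assumption: a tail shorter than T is dropped.
    """
    return [list(items[i:i + T]) for i in range(0, len(items) - T + 1, T)]


# Hypothetical frame names for a six-frame video.
frames = [f"{i:05d}.jpg" for i in range(6)]
pairs = consecutive_pairs(frames)     # inputs to the short-term module
clips = clips_of_length(frames, T=3)  # groups passed to the long-term module
print(len(pairs), "pairs;", len(clips), "clips")
```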
3. Preparation
Requirements.
- Python 3.9.*
- CUDA 11.1
- PyTorch
- TorchVision
Install. Create a virtual environment and activate it.
conda create -n SLTnet python=3.8
conda activate SLTnet
The code has been tested with PyTorch 1.9 and CUDA 11.1.
conda install pytorch torchvision torchaudio cudatoolkit=11.1 -c pytorch -c nvidia
conda install -c conda-forge timm
Install MMCV + MMSegmentation
Follow the instructions here. MMCV and MMSegmentation are required for training the transformer encoder. A quick installation example:
pip install mmcv-full -f https://download.openmmlab.com/mmcv/dist/cu111/torch1.9.0/index.html
pip install mmsegmentation
The core of the seq-to-seq model in the long-term architecture is built on a custom CUDA op with torchlib. More details can be found on GitHub. A quick installation example:
cd ./lib/ref_video/PNS
python setup.py build develop
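After the installation steps, it can help to confirm which package versions actually ended up in the environment. The following is a minimal sketch using only the Python standard library. The distribution names in `PACKAGES` are assumptions based on the install commands above, and the helper `installed_versions` is hypothetical, not part of this repository.

```python
from importlib import metadata
from typing import Dict, Iterable, Optional

# Assumed distribution names, taken from the install commands above.
PACKAGES = ("torch", "torchvision", "timm", "mmcv-full", "mmsegmentation")


def installed_versions(names: Iterable[str]) -> Dict[str, Optional[str]]:
    """Map each distribution name to its installed version, or None if absent."""
    versions: Dict[str, Optional[str]] = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions(PACKAGES).items():
        print(f"{name:15s} {version or 'NOT INSTALLED'}")
```

This only reads package metadata and does not import the packages, so it does not verify that the CUDA build works.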
Dataset. To evaluate or train SLT-Net, you will need to download the required datasets. If you want to use our pseudo labels, please download them via [MoCA-Mask-Pseudo].
Replace the first-column paths in create_link.sh with your actual dataset locations. Then run create_link.sh, which creates symbolic links in the datasets folder pointing to wherever the datasets were downloaded.
├── datasets
│   ├── MoCA-Mask
│   ├── CAD2016
│   └── COD10K
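For readers who prefer not to edit a shell script, the linking step can be sketched in Python. This is not the content of create_link.sh, only an illustration of the same idea. The function name `link_datasets` and the example paths in the comment are hypothetical.

```python
import os
from pathlib import Path
from typing import Dict


def link_datasets(sources: Dict[str, str], target_dir: str = "datasets") -> None:
    """Create target_dir/<name> symlinks pointing at each downloaded dataset."""
    root = Path(target_dir)
    root.mkdir(parents=True, exist_ok=True)
    for name, source in sources.items():
        src = Path(source).expanduser().resolve()
        if not src.is_dir():
            raise FileNotFoundError(f"dataset directory not found: {src}")
        link = root / name
        if link.is_symlink() or link.exists():
            continue  # leave existing entries untouched
        os.symlink(src, link, target_is_directory=True)


# Example call (hypothetical download locations):
# link_datasets({
#     "MoCA-Mask": "~/downloads/MoCA-Mask",
#     "CAD2016": "~/downloads/CAD2016",
#     "COD10K": "~/downloads/COD10K",
# })
```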
Note that for the CAD2016 dataset, the original ground-truth maps label each pixel with an index of 1 or 2. You need to convert them to 0/255. For convenience, we also provide the converted ground truth here.
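If you would rather convert the original CAD2016 maps yourself, the remapping can be sketched as a byte-level lookup. This sketch assumes that index 1 is background and index 2 is the camouflaged object, which the text above does not state explicitly, and that each pixel is stored as a single byte. The names `build_table`, `CAD_TABLE` and `remap_mask` are hypothetical.

```python
from typing import Dict


def build_table(mapping: Dict[int, int], default: int = 0) -> bytes:
    """Build a 256-entry lookup table; unmapped values fall back to `default`."""
    table = bytearray([default] * 256)
    for old_value, new_value in mapping.items():
        table[old_value] = new_value
    return bytes(table)


# Assumed mapping: 1 (background) -> 0, 2 (object) -> 255.
CAD_TABLE = build_table({1: 0, 2: 255})


def remap_mask(pixels: bytes, table: bytes = CAD_TABLE) -> bytes:
    """Remap every one-byte pixel value through the lookup table."""
    return pixels.translate(table)


# A tiny 2x2 mask given as raw bytes in row-major order.
converted = remap_mask(bytes([1, 2, 2, 1]))
print(list(converted))
```

With Pillow, and assuming the maps are 8-bit single-channel images, the raw bytes can be read with `Image.tobytes()` and written back with `Image.frombytes("L", size, data)`.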
4. Results
Prediction.
You can evaluate a trained model using prediction.sh for each dataset, which generates the *.png prediction images for that dataset.
sh test_video.sh
sh test_video_long_term.sh
Evaluation.
Please run main_CAD.m or main_MoCA.m in the eval folder to evaluate your model. You can also download the images via this Link to obtain the results reported in our paper, or download our pre-trained model via this link: snapshot. [If you downloaded it before 7 Sep 2022, please replace it with the new version. The Net_epoch_cod10k.pth in the previous snapshot is wrong; it has the ResNet pretrained weights.]
Acknowledgements. More information about the original MoCA dataset [1] is available at this Link.
[1] Hala Lamdouar, Charig Yang, Weidi Xie, and Andrew Zisserman. Betrayed by Motion: Camouflaged Object Discovery via Motion Segmentation. Asian Conference on Computer Vision, 2020.
5. Citing
If you find this code useful, please consider citing our work.
@inproceedings{cheng2022implicit,
title={Implicit Motion Handling for Video Camouflaged Object Detection},
author={Cheng, Xuelian and Xiong, Huan and Fan, Deng-Ping and Zhong, Yiran and Harandi, Mehrtash and Drummond, Tom and Ge, Zongyuan},
booktitle={CVPR},
year={2022}
}