[ECCV2024] Learning Camouflaged Object Detection from Noisy Pseudo Label (Poster)

This is the open-source repository for our paper Learning Camouflaged Object Detection from Noisy Pseudo Label, accepted at ECCV 2024!

Our paper is available at: Paper

Framework Architecture

Proposed Models

Performance

Comparison

Training Process

Task Definition: Weakly Semi-Supervised Camouflaged Object Detection (WSSCOD)

We introduce a novel training protocol, Weakly Semi-Supervised Camouflaged Object Detection (WSSCOD). WSSCOD uses box annotations as prompts, complemented by a small amount of pixel-level annotations, to generate high-accuracy pseudo labels.

Dataset Division:
- $\mathcal{D}_m = \{\mathcal{X}_m, \mathcal{F}_m, \mathcal{B}_m\}_{m=1}^M$: Pixel-level annotations $\mathcal{F}_m$, box annotations $\mathcal{B}_m$, and training images $\mathcal{X}_m$.
- $\mathcal{D}_n = \{\mathcal{X}_n, \mathcal{B}_n\}_{n=1}^N$: Box annotations $\mathcal{B}_n$ and training images $\mathcal{X}_n$, where $M+N$ is the total number of training images.
Training ANet:
- Train ANet using dataset $\mathcal{D}_m$.
- Use $\mathcal{B}_m$ as prompts and $\mathcal{F}_m$ for supervision.
Generating Pseudo Labels:
- Use the trained ANet and dataset $\mathcal{D}_n$ to predict pseudo labels $\mathcal{W}_n$.
Constructing the Weakly Semi-Supervised Dataset:
- Combine $\{\mathcal{X}_m, \mathcal{F}_m\}_{m=1}^M$ and $\{\mathcal{X}_n, \mathcal{W}_n\}_{n=1}^N$ to form $\mathcal{D}_t$.
Training PNet:
- Train PNet using the dataset $\mathcal{D}_t$.
- Evaluate performance with different $M$ and $N$ ratios:
  - PNet$_{F1}$: $M=1\%$, $N=99\%$
  - PNet$_{F5}$: $M=5\%$, $N=95\%$
  - PNet$_{F10}$: $M=10\%$, $N=90\%$
  - PNet$_{F20}$: $M=20\%$, $N=80\%$
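
The protocol above can be illustrated with a minimal Python sketch of the data bookkeeping only. It is not the repository's implementation: the function names `split_wsscod` and `build_dt`, the image ids, the random split, and the 1000-image training set are all hypothetical and chosen for illustration.

```python
import random


def split_wsscod(image_ids, pixel_ratio, seed=0):
    """Split image ids into D_m (pixel + box labels) and D_n (box labels only).

    Hypothetical helper: a random split is an assumption, not the paper's rule.
    """
    ids = list(image_ids)
    random.Random(seed).shuffle(ids)
    m = max(1, round(len(ids) * pixel_ratio))
    return ids[:m], ids[m:]


def build_dt(d_m, d_n, pixel_labels, pseudo_labels):
    """Form D_t: ground-truth masks F_m for D_m, ANet pseudo labels W_n for D_n."""
    fully = [(i, pixel_labels[i], "pixel") for i in d_m]
    weakly = [(i, pseudo_labels[i], "pseudo") for i in d_n]
    return fully + weakly


# Illustrative training set of 1000 images (assumed size).
ids = [f"img_{k:04d}" for k in range(1000)]
for name, ratio in [("F1", 0.01), ("F5", 0.05), ("F10", 0.10), ("F20", 0.20)]:
    d_m, d_n = split_wsscod(ids, ratio)
    print(f"PNet_{name}: M={len(d_m)}, N={len(d_n)}")
```

In this sketch the pseudo labels are just looked up from a dictionary; in the actual pipeline they come from the trained ANet prompted with the boxes of $\mathcal{D}_n$.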

Details: ANet and PNet Training

| Aspect | ANet (Auxiliary Network) | PNet (Primary Network) |
| --- | --- | --- |
| Stage | First | Second |
| Objective | Generate high-accuracy pseudo labels | Main camouflaged object detection |
| Data Input | Subset $\mathcal{D}_m$ with pixel and box annotations | Weakly semi-supervised dataset $\mathcal{D}_t$ |
| Training Dataset | $\mathcal{D}_m = \{\mathcal{X}_m, \mathcal{F}_m, \mathcal{B}_m\}_{m=1}^M$ | $\mathcal{D}_t = \{\mathcal{X}_m, \mathcal{F}_m\}_{m=1}^M \cup \{\mathcal{X}_n, \mathcal{W}_n\}_{n=1}^N$ |
| Annotations | Pixel-level $\mathcal{F}_m$ and box $\mathcal{B}_m$ | Pseudo labels $\mathcal{W}_n$ and pixel-level $\mathcal{F}_m$ |
| Supervision | Pixel-level $\mathcal{F}_m$ for pseudo label generation | Pseudo labels $\mathcal{W}_n$ and pixel-level $\mathcal{F}_m$ |
| Input Prompts | Box annotations $\mathcal{B}_m$ for camouflaged objects | Images $\mathcal{X}_m$ and $\mathcal{X}_n$ |
| Performance Evaluation | - | Different settings: PNet$_{F1}$, PNet$_{F5}$, PNet$_{F10}$, PNet$_{F20}$ |
| Training Goal | Generate high-quality pseudo labels $\mathcal{W}_n$ | Improve detection accuracy with various $M$ and $N$ ratios |

1. Download the Training and Test Sets

We have made the training and test sets available for download via the following links:

Google Drive
Baidu Drive (password: ECCV)

Once downloaded, place data.zip in the code/data directory and unzip it.
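
If a command-line unzip tool is not at hand, the extraction step can also be done from Python with the standard `zipfile` module. This is a small sketch; the helper name `extract_dataset` is hypothetical, and the default paths simply mirror the `code/data` location mentioned above, assuming the script is run from the repository root.

```python
import zipfile
from pathlib import Path


def extract_dataset(zip_path="code/data/data.zip", dest="code/data"):
    """Unzip the archive into dest and return its top-level entry names."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(dest)
        names = archive.namelist()
    # Report only the first path component of each archive member.
    return sorted({name.split("/")[0] for name in names})


# Usage, after downloading data.zip into code/data:
# print(extract_dataset())
```

The returned list is a quick way to check which top-level folders the archive created before starting training.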

2. Train ANet

python code/TrainANet/TrainDDP.py --gpu_id 0 --ration 1 
# ration represents the proportion of pixel-level labels
# we find that training on a single GPU works better than on four or eight GPUs

3. Generate Pseudo Labels

python code/TrainANet/Test.py --ration 1 
# ration represents the proportion of pixel-level labels

4. Train PNet

python code/TrainPNet/TrainDDP.py --gpu_id 0 --ration 1 --q_epoch 20 --batchsize_fully 6 --batchsize_weakly 24 
# ration represents the proportion of pixel-level labels
# q_epoch is the epoch at which q is changed to 1
# batchsize_fully means the number of fully annotated samples in a batch
# batchsize_weakly means the number of weakly annotated samples in a batch
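
To make the roles of `--q_epoch`, `--batchsize_fully`, and `--batchsize_weakly` more concrete, here is a minimal sketch of what these options describe. It is not the training script's code: the helpers `q_for_epoch` and `mixed_batches` are hypothetical, the starting value `q_start=2.0` is an assumption (the text above only states that q becomes 1 at `q_epoch`), and resampling the fully annotated pool for every batch is one possible choice among several.

```python
import random


def q_for_epoch(epoch, q_epoch=20, q_start=2.0):
    """Return q for the given epoch: q_start before q_epoch, 1 from q_epoch on.

    q_start is an assumed placeholder; only the switch to 1 comes from the README.
    """
    return q_start if epoch < q_epoch else 1.0


def mixed_batches(fully_ids, weakly_ids, batchsize_fully=6, batchsize_weakly=24, seed=0):
    """Yield (fully, weakly) batches with a fixed number of samples of each kind."""
    rng = random.Random(seed)
    fully, weakly = list(fully_ids), list(weakly_ids)
    rng.shuffle(weakly)
    # Walk through the weakly annotated pool once, dropping an incomplete last batch.
    for start in range(0, len(weakly) - batchsize_weakly + 1, batchsize_weakly):
        weak_part = weakly[start:start + batchsize_weakly]
        # The fully annotated pool is small, so it is resampled for every batch.
        # This requires at least batchsize_fully fully annotated samples.
        full_part = rng.sample(fully, batchsize_fully)
        yield full_part, weak_part


# Example: 10 fully annotated and 96 weakly annotated sample ids.
for full_part, weak_part in mixed_batches(range(10), range(100, 196)):
    print(len(full_part), len(weak_part), q_for_epoch(epoch=25))
```

With the default values from the command above, every batch holds 30 samples, 6 with pixel-level labels and 24 with pseudo labels.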

5. Testing Process

python code/TrainPNet/Test.py --ration 1 
# ration represents the proportion of pixel-level labels

Pretrained Weights and COD Results

For ANet

We release the weights and prediction maps for $N=99\%$, $N=95\%$, $N=90\%$, and $N=80\%$ at Baidu Link.

For PNet

| Model | Pretrained Weight | Prediction Description |
| --- | --- | --- |
| PNet$_{F1}$ | Google Link | $M=1\%$, $N=99\%$ |
| PNet$_{F5}$ | Google Link | $M=5\%$, $N=95\%$ |
| PNet$_{F10}$ | Google Link | $M=10\%$, $N=90\%$ |
| PNet$_{F20}$ | Google Link | $M=20\%$, $N=80\%$ |

References

@inproceedings{zhang2025learning,
  title={Learning Camouflaged Object Detection from Noisy Pseudo Label},
  author={Zhang, Jin and Zhang, Ruiheng and Shi, Yanjiao and Cao, Zhe and Liu, Nian and Khan, Fahad Shahbaz},
  booktitle={European Conference on Computer Vision},
  pages={158--174},
  year={2025},
  organization={Springer}
}