# Activation-to-Saliency Version 2 (A2S-v2)
Our follow-up work A2S-v3 has been accepted by IJCAI 2024! You are welcome to check out the latest A2S-v3 code and build on it.
The naming convention is open to the community (e.g., A2S-v4), as long as new works 1) are published in top-tier conferences or journals and 2) do not conflict with existing works.
Source code of our CVPR 2023 paper: "Texture-guided Saliency Distilling for Unsupervised Salient Object Detection".
This work improves upon our previous Activation-to-Saliency method (A2S-v1), published in TCSVT 2023.
Both works are evaluated on standard Salient Object Detection (SOD) benchmarks.
## Resource
You can download the pre-trained MoCo-v2 weights and all trained weights of our method.
RGB SOD results: pseudo labels and saliency maps.
Results on other multimodal SOD datasets can be easily generated using our code.
## Training & Testing
### Dataset
For convenience, we re-organized the datasets commonly used in SOD tasks.
| Task | Stage 1 network | Stage 2 network | Training sets | Test sets |
| --- | --- | --- | --- | --- |
| RGB | `a2s` | `cornet` | [cr] DUTS-TR or MSB-TR | [ce] HKU-IS, PASCAL-S, ECSSD, DUTS-TE, DUT-OMRON, MSB-TE |
| RGB-D | `a2s` | `midnet` | [dr] RGBD-TR or RGBD-TR-2985 | [de] DUT, LFSD, NJUD, NLPR, RGBD135, SIP, SSD, STERE1000, STEREO |
| RGB-T | `a2s` | `midnet` | [tr] VT5000-TR | [te] VT821, VT1000, VT5000-TE |
| Video | `a2s` | `midnet` | [or] VSOD-TR | [oe] SegV2, FBMS, DAVIS-TE, DAVSOD-TE |
Networks `a2s` and `cornet` are inherited from our previous A2S-v1, and `midnet` is from here.
- `MSB-TR` and `MSB-TE` are the train+val and test splits of the MSRA-B dataset.
- `RGBD-TR` (2185 samples, default) and `RGBD-TR-2985` (2985 samples) are two different training sets for the RGB-D SOD task.
- `VT5000-TR` and `VT5000-TE` are the train and test splits of the VT5000 dataset.
- `VSOD-TR` is the collection of the train splits of the DAVIS and DAVSOD datasets.
### Notice
`--vals` has two characters that define the datasets used for testing:
- First character (task): RGB [c], RGB-D [d], RGB-T [t], and video [o];
- Second character (phase): training [r] or test [e] sets.

`--trset` defines the training sets for the different tasks, using the same task characters as the first character of `--vals`.
For more details, please refer to `data.py`.
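As an illustration, the snippet below shows how such a two-character code could expand into the dataset lists from the table above. This is a minimal sketch, not the actual logic in `data.py`; the `expand_vals` function and the dictionary layout are our own hypothetical names.

```python
# Minimal sketch of how a two-character code such as "ce" or "dr" could map
# to dataset names. NOTE: expand_vals and the dictionary layout are
# hypothetical illustrations, not the actual implementation in data.py.
# Dataset names are taken from the table above; training entries list the
# default choice (alternatives such as MSB-TR or RGBD-TR-2985 also exist).
DATASETS = {
    'c': {'r': ['DUTS-TR'],
          'e': ['HKU-IS', 'PASCAL-S', 'ECSSD', 'DUTS-TE', 'DUT-OMRON', 'MSB-TE']},
    'd': {'r': ['RGBD-TR'],
          'e': ['DUT', 'LFSD', 'NJUD', 'NLPR', 'RGBD135', 'SIP', 'SSD', 'STERE1000', 'STEREO']},
    't': {'r': ['VT5000-TR'],
          'e': ['VT821', 'VT1000', 'VT5000-TE']},
    'o': {'r': ['VSOD-TR'],
          'e': ['SegV2', 'FBMS', 'DAVIS-TE', 'DAVSOD-TE']},
}

def expand_vals(code):
    """Expand a code like 'ce' into its dataset list: task char + phase char."""
    task, phase = code[0], code[1]
    return DATASETS[task][phase]

print(expand_vals('ce'))  # RGB SOD test sets
print(expand_vals('dr'))  # default RGB-D training set
```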
### Stage 1
```shell
## Training
# Training for the RGB SOD task
python3 train.py a2s --gpus=[gpu_num] --trset=c

# Separate training for a single multimodal task
python3 train.py a2s --gpus=[gpu_num] --trset=[d/o/t]

# Joint training for all four multimodal tasks
python3 train.py a2s --gpus=[gpu_num] --trset=cdot

## Testing
# Generating pseudo labels
python3 test.py a2s --gpus=[gpu_num] --weight=[path_to_weight] --vals=[cr/dr/or/tr] --save --crf

# Testing on test sets
python3 test.py a2s --gpus=[gpu_num] --weight=[path_to_weight] --vals=[ce/de/oe/te] [--save]
```
After the Stage 1 training process, we generate pseudo labels for all training sets and save them to a new `pseudo` folder.
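The `--crf` flag refines the pseudo labels with a fully connected CRF before saving. For reference, here is a minimal sketch of that kind of post-processing built on the `pydensecrf` package; the function name and all parameter values are our own assumptions and may differ from the repo's actual implementation.

```python
# Minimal dense-CRF refinement sketch (assumption: the common pydensecrf
# recipe, not necessarily the exact parameters this repo uses with --crf).
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, prob, iters=5):
    """image: HxWx3 uint8 array; prob: HxW saliency probabilities in [0, 1]."""
    h, w = prob.shape
    prob = prob.clip(1e-6, 1.0 - 1e-6)  # avoid log(0) in the unary term
    # Two-class (background/foreground) softmax as the unary term.
    softmax = np.stack([1.0 - prob, prob], axis=0).astype(np.float32)
    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary_from_softmax(softmax))
    # Smoothness kernel on pixel positions, appearance kernel on position+color.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=60, srgb=5, rgbim=np.ascontiguousarray(image), compat=5)
    refined = np.array(d.inference(iters)).reshape(2, h, w)
    return refined[1]  # refined foreground probability map
```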
### Stage 2
```shell
## Training
# Training for the RGB SOD task
python3 train.py cornet --gpus=[gpu_num] --stage=2 --trset=c --vals=ce

# Training for the RGB-D, RGB-T, or video SOD tasks
python3 train.py midnet --gpus=[gpu_num] --stage=2 --trset=[d/o/t] --vals=[de/oe/te]

## Testing
python3 test.py [cornet/midnet] --gpus=[gpu_num] --weight=[path_to_weight] --vals=[ce/de/oe/te] [--save]
```
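Once saliency maps are saved with `--save`, a standard SOD metric such as mean absolute error (MAE) can be computed against the ground-truth masks. The sketch below is our own illustration, not the repo's evaluation code; the directory layout and the matching-by-filename assumption are hypothetical.

```python
# Hypothetical MAE evaluation sketch; not the repo's evaluation code.
# Assumes predictions and ground-truth masks share filenames and sizes.
from pathlib import Path
import numpy as np
from PIL import Image

def mean_absolute_error(pred_dir, gt_dir):
    errors = []
    for gt_path in sorted(Path(gt_dir).glob('*.png')):
        pred_path = Path(pred_dir) / gt_path.name
        pred = np.asarray(Image.open(pred_path).convert('L'), dtype=np.float64) / 255.0
        gt = np.asarray(Image.open(gt_path).convert('L'), dtype=np.float64) / 255.0
        errors.append(np.abs(pred - gt).mean())
    return float(np.mean(errors))

# Example (paths are placeholders):
# print(mean_absolute_error('maps/ECSSD', 'data/ECSSD/gt'))
```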
## Reference
Thank you for citing our series of works:
```bibtex
@inproceedings{zhou2023texture,
  title={Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection},
  author={Zhou, Huajun and Qiao, Bo and Yang, Lingxiao and Lai, Jianhuang and Xie, Xiaohua},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7257--7267},
  year={2023}
}

@article{zhou2023a2s1,
  title={Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection},
  author={Zhou, Huajun and Chen, Peijia and Yang, Lingxiao and Xie, Xiaohua and Lai, Jianhuang},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2023},
  volume={33},
  number={2},
  pages={743--755},
  doi={10.1109/TCSVT.2022.3203595}
}
```