# Activation-to-Saliency Version 2 (A2S-v2)
Our follow-up work A2S-v3 has been accepted by IJCAI 2024! You are welcome to check out the latest A2S-v3 code and build on it.
The naming convention is open to the community (e.g., A2S-v4), as long as new works 1) are published in top-tier conferences or journals and 2) do not conflict with existing works.
Source code of our CVPR 2023 paper: "Texture-guided Saliency Distilling for Unsupervised Salient Object Detection".
This work improves upon our previous Activation-to-Saliency method (A2S-v1), published in TCSVT 2023.
Both works are evaluated on standard Salient Object Detection (SOD) benchmarks.
## Resource
You can download the pre-trained MoCo-v2 weights and all trained weights of our method.
RGB SOD results: pseudo labels and saliency maps.
Results on other multimodal SOD datasets can be easily generated using our code.
## Training & Testing
### Dataset
For convenience, we re-organized the datasets commonly used in SOD tasks.
| Task | Stage 1 network | Stage 2 network | Training sets | Test sets |
| --- | --- | --- | --- | --- |
| RGB | `a2s` | `cornet` | [cr] DUTS-TR or MSB-TR | [ce] HKU-IS, PASCAL-S, ECSSD, DUTS-TE, DUT-OMRON, MSB-TE |
| RGB-D | `a2s` | `midnet` | [dr] RGBD-TR or RGBD-TR-2985 | [de] DUT, LFSD, NJUD, NLPR, RGBD135, SIP, SSD, STERE1000, STEREO |
| RGB-T | `a2s` | `midnet` | [tr] VT5000-TR | [te] VT821, VT1000, VT5000-TE |
| Video | `a2s` | `midnet` | [or] VSOD-TR | [oe] SegV2, FBMS, DAVIS-TE, DAVSOD-TE |
Networks `a2s` and `cornet` are inherited from our previous A2S-v1, and `midnet` is from here.
- `MSB-TR` and `MSB-TE` are the train+val and test splits of the MSRA-B dataset.
- `RGBD-TR` (2185 samples, default) and `RGBD-TR-2985` (2985 samples) are two different training sets for the RGB-D SOD task.
- `VT5000-TR` and `VT5000-TE` are the train and test splits of the VT5000 dataset.
- `VSOD-TR` is the collection of the train splits of the DAVIS and DAVSOD datasets.
### Notice
`--vals` has two characters that define the datasets used for testing:
- First character (task): RGB [c], RGB-D [d], RGB-T [t], and video [o];
- Second character (phase): training [r] or test [e] sets.

`--trset` defines the training sets for the different tasks, using the same task characters as the first character of `--vals`.
For more details, please refer to `data.py`.
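As an illustration, the snippet below shows how such a two-character code could expand into the dataset lists from the table above. This is a minimal sketch, not the actual logic in `data.py`; the `expand_vals` function and the dictionary layout are our own hypothetical names.

```python
# Minimal sketch of how a two-character code such as "ce" or "dr" could map
# to dataset names. NOTE: expand_vals and the dictionary layout are
# hypothetical illustrations, not the actual implementation in data.py.
# Dataset names are taken from the table above; training entries list the
# default choice (alternatives such as MSB-TR or RGBD-TR-2985 also exist).
DATASETS = {
    'c': {'r': ['DUTS-TR'],
          'e': ['HKU-IS', 'PASCAL-S', 'ECSSD', 'DUTS-TE', 'DUT-OMRON', 'MSB-TE']},
    'd': {'r': ['RGBD-TR'],
          'e': ['DUT', 'LFSD', 'NJUD', 'NLPR', 'RGBD135', 'SIP', 'SSD', 'STERE1000', 'STEREO']},
    't': {'r': ['VT5000-TR'],
          'e': ['VT821', 'VT1000', 'VT5000-TE']},
    'o': {'r': ['VSOD-TR'],
          'e': ['SegV2', 'FBMS', 'DAVIS-TE', 'DAVSOD-TE']},
}

def expand_vals(code):
    """Expand a code like 'ce' into its dataset list: task char + phase char."""
    task, phase = code[0], code[1]
    return DATASETS[task][phase]

print(expand_vals('ce'))  # RGB SOD test sets
print(expand_vals('dr'))  # default RGB-D training set
```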
### Stage 1
```shell
## Training
# Training for the RGB SOD task
python3 train.py a2s --gpus=[gpu_num] --trset=c

# Separate training for a single multimodal task
python3 train.py a2s --gpus=[gpu_num] --trset=[d/o/t]

# Joint training for all four multimodal tasks
python3 train.py a2s --gpus=[gpu_num] --trset=cdot

## Testing
# Generating pseudo labels
python3 test.py a2s --gpus=[gpu_num] --weight=[path_to_weight] --vals=[cr/dr/or/tr] --save --crf

# Testing on test sets
python3 test.py a2s --gpus=[gpu_num] --weight=[path_to_weight] --vals=[ce/de/oe/te] [--save]
```
After the Stage 1 training process, we generate pseudo labels for all training sets and save them to a new `pseudo` folder.
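The `--crf` flag refines the pseudo labels with a fully connected CRF before saving. For reference, here is a minimal sketch of that kind of post-processing built on the `pydensecrf` package; the function name and all parameter values are our own assumptions and may differ from the repo's actual implementation.

```python
# Minimal dense-CRF refinement sketch (assumption: the common pydensecrf
# recipe, not necessarily the exact parameters this repo uses with --crf).
import numpy as np
import pydensecrf.densecrf as dcrf
from pydensecrf.utils import unary_from_softmax

def crf_refine(image, prob, iters=5):
    """image: HxWx3 uint8 array; prob: HxW saliency probabilities in [0, 1]."""
    h, w = prob.shape
    prob = prob.clip(1e-6, 1.0 - 1e-6)  # avoid log(0) in the unary term
    # Two-class (background/foreground) softmax as the unary term.
    softmax = np.stack([1.0 - prob, prob], axis=0).astype(np.float32)
    d = dcrf.DenseCRF2D(w, h, 2)
    d.setUnaryEnergy(unary_from_softmax(softmax))
    # Smoothness kernel on pixel positions, appearance kernel on position+color.
    d.addPairwiseGaussian(sxy=3, compat=3)
    d.addPairwiseBilateral(sxy=60, srgb=5, rgbim=np.ascontiguousarray(image), compat=5)
    refined = np.array(d.inference(iters)).reshape(2, h, w)
    return refined[1]  # refined foreground probability map
```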
### Stage 2
```shell
## Training
# Training for the RGB SOD task
python3 train.py cornet --gpus=[gpu_num] --stage=2 --trset=c --vals=ce

# Training for the RGB-D, RGB-T, or video SOD tasks
python3 train.py midnet --gpus=[gpu_num] --stage=2 --trset=[d/o/t] --vals=[de/oe/te]

## Testing
python3 test.py [cornet/midnet] --gpus=[gpu_num] --weight=[path_to_weight] --vals=[ce/de/oe/te] [--save]
```
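Once saliency maps are saved with `--save`, a standard SOD metric such as mean absolute error (MAE) can be computed against the ground-truth masks. The sketch below is our own illustration, not the repo's evaluation code; the directory layout and the matching-by-filename assumption are hypothetical.

```python
# Hypothetical MAE evaluation sketch; not the repo's evaluation code.
# Assumes predictions and ground-truth masks share filenames and sizes.
from pathlib import Path
import numpy as np
from PIL import Image

def mean_absolute_error(pred_dir, gt_dir):
    errors = []
    for gt_path in sorted(Path(gt_dir).glob('*.png')):
        pred_path = Path(pred_dir) / gt_path.name
        pred = np.asarray(Image.open(pred_path).convert('L'), dtype=np.float64) / 255.0
        gt = np.asarray(Image.open(gt_path).convert('L'), dtype=np.float64) / 255.0
        errors.append(np.abs(pred - gt).mean())
    return float(np.mean(errors))

# Example (paths are placeholders):
# print(mean_absolute_error('maps/ECSSD', 'data/ECSSD/gt'))
```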
## Reference
Thank you for citing our series of works:
```bibtex
@inproceedings{zhou2023texture,
  title={Texture-Guided Saliency Distilling for Unsupervised Salient Object Detection},
  author={Zhou, Huajun and Qiao, Bo and Yang, Lingxiao and Lai, Jianhuang and Xie, Xiaohua},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={7257--7267},
  year={2023}
}

@article{zhou2023a2s1,
  title={Activation to Saliency: Forming High-Quality Labels for Unsupervised Salient Object Detection},
  author={Zhou, Huajun and Chen, Peijia and Yang, Lingxiao and Xie, Xiaohua and Lai, Jianhuang},
  journal={IEEE Transactions on Circuits and Systems for Video Technology},
  year={2023},
  volume={33},
  number={2},
  pages={743--755},
  doi={10.1109/TCSVT.2022.3203595}
}
```