Visual Saliency Transformer (VST)
Source code for our ICCV 2021 paper “Visual Saliency Transformer” by Nian Liu, Ni Zhang, Kaiyuan Wan, Junwei Han, and Ling Shao.
Created by Ni Zhang (email: nnizhang.1995@gmail.com).
Requirements
- PyTorch 1.6.0
- Torchvision 0.7.0
RGB VST for RGB Salient Object Detection
Data Preparation
Training Set
We use the training set of DUTS to train our VST for RGB SOD. We also follow EGNet to generate contour maps of the DUTS training set. You can directly download the generated contour maps (DUTS-TR-Contour) from [baidu pan fetch code: ow76 | Google drive] and put them into the RGB_VST/Data folder.
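For readers who want an intuition for what a contour map is, the sketch below derives one from a binary saliency mask by marking every foreground pixel that touches the background. It is an illustration only, not the EGNet script: the function name contour_from_mask, the 4-neighbourhood rule, and the treatment of the image border as background are all assumptions. To reproduce our results, use the downloaded DUTS-TR-Contour maps.

```python
def contour_from_mask(mask):
    """Mark foreground pixels that touch the background (4-neighbourhood).

    `mask` is a list of rows holding 0 (background) or 1 (foreground).
    Pixels outside the image are treated as background (an assumption).
    """
    height = len(mask)
    width = len(mask[0]) if height else 0
    contour = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            if not mask[y][x]:
                continue  # background pixels are never part of the contour
            for dy, dx in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                ny, nx = y + dy, x + dx
                outside = not (0 <= ny < height and 0 <= nx < width)
                if outside or not mask[ny][nx]:
                    contour[y][x] = 1
                    break
    return contour
```

For a filled 3x3 square inside a 5x5 mask, this keeps the outer ring of the square and clears its centre pixel.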
Testing Set
We use the testing sets of DUTS, ECSSD, HKU-IS, PASCAL-S, DUT-O, and SOD to test our VST. After downloading, put them into the RGB_VST/Data folder.
Your RGB_VST/Data folder should look like this:
-- Data
|-- DUTS
|   |-- DUTS-TR
|   |   |-- DUTS-TR-Image
|   |   |-- DUTS-TR-Mask
|   |   |-- DUTS-TR-Contour
|   |-- DUTS-TE
|   |   |-- DUTS-TE-Image
|   |   |-- DUTS-TE-Mask
|-- ECSSD
|   |-- images
|   |-- GT
...
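Before launching a long training run, it can help to confirm that the folders listed above are in place. The following is a minimal sketch, assuming the nesting shown in the tree; the helper name missing_dirs and the list EXPECTED_RGB_DIRS are hypothetical and not part of the repository. Datasets hidden behind the "..." are not checked because their layout is not spelled out here.

```python
from pathlib import Path

# Sub-folders read off the tree above (our interpretation of its nesting).
EXPECTED_RGB_DIRS = [
    "DUTS/DUTS-TR/DUTS-TR-Image",
    "DUTS/DUTS-TR/DUTS-TR-Mask",
    "DUTS/DUTS-TR/DUTS-TR-Contour",
    "DUTS/DUTS-TE/DUTS-TE-Image",
    "DUTS/DUTS-TE/DUTS-TE-Mask",
    "ECSSD/images",
    "ECSSD/GT",
]


def missing_dirs(data_root, expected=EXPECTED_RGB_DIRS):
    """Return the expected sub-folders that do not exist under data_root."""
    root = Path(data_root)
    return [rel for rel in expected if not (root / rel).is_dir()]


if __name__ == "__main__":
    missing = missing_dirs("RGB_VST/Data")
    if missing:
        print("Missing folders:")
        for rel in missing:
            print("  " + rel)
    else:
        print("Data folder layout looks complete.")
```

An empty list from missing_dirs means every listed folder was found; it says nothing about whether the images inside are complete.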
Training, Testing, and Evaluation
cd RGB_VST
- Download the pretrained T2T-ViT_t-14 model [baidu pan fetch code: 2u34 | Google drive] and put it into the pretrained_model/ folder.
- Run the following command for training, testing, and evaluation:
python train_test_eval.py --Training True --Testing True --Evaluation True
The predictions will be saved in the preds/ folder and the evaluation results in the result.txt file.
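The command above selects stages with flags of the form --Training True. How train_test_eval.py parses them is not shown in this README, so the sketch below is only one possible way to handle such flags with argparse. The helper str2bool, the function selected_stages, and the default of False for an omitted flag are assumptions made for illustration.

```python
import argparse

STAGES = ("Training", "Testing", "Evaluation")


def str2bool(text):
    """Convert 'True'/'False' style strings to a bool (hypothetical helper)."""
    value = text.strip().lower()
    if value in ("true", "1", "yes"):
        return True
    if value in ("false", "0", "no"):
        return False
    raise argparse.ArgumentTypeError("expected True or False, got %r" % text)


def build_parser():
    """Build a parser with one boolean flag per stage."""
    parser = argparse.ArgumentParser(description="Stage selection sketch")
    for stage in STAGES:
        # Assumption: a stage that is not mentioned is skipped.
        parser.add_argument("--" + stage, type=str2bool, default=False)
    return parser


def selected_stages(argv):
    """Return the names of the stages switched on by the given arguments."""
    args = build_parser().parse_args(argv)
    return [stage for stage in STAGES if getattr(args, stage)]
```

A dedicated converter is used here because argparse with type=bool would treat the string "False" as true: any non-empty string converts to True.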
Testing on Our Pretrained RGB VST Model
cd RGB_VST
- Download our pretrained RGB_VST.pth [baidu pan fetch code: pe54 | Google drive] and put it into the checkpoint/ folder.
- Run the following command for testing and evaluation:
python train_test_eval.py --Testing True --Evaluation True
The predictions will be saved in the preds/ folder and the evaluation results in the result.txt file.
Our saliency maps can be downloaded from [baidu pan fetch code: 92t0 | Google drive].
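To sanity-check a downloaded or freshly predicted saliency map against its ground truth, a simple per-pixel error can be computed by hand. The sketch below implements the mean absolute error (MAE), a commonly reported SOD metric. It is not the bundled evaluation tool, and this README does not state which metrics that tool writes to result.txt; the function name and the nested-list input format are assumptions chosen to keep the example dependency-free.

```python
def mean_absolute_error(pred, gt):
    """MAE between two same-sized maps whose values lie in [0, 1].

    Both arguments are lists of rows of floats.
    """
    if len(pred) != len(gt) or any(len(p) != len(g) for p, g in zip(pred, gt)):
        raise ValueError("prediction and ground truth must have the same shape")
    count = sum(len(row) for row in pred)
    if count == 0:
        raise ValueError("maps must contain at least one pixel")
    total = sum(
        abs(p - g)
        for pred_row, gt_row in zip(pred, gt)
        for p, g in zip(pred_row, gt_row)
    )
    return total / count
```

In practice the maps would be loaded from image files and scaled to [0, 1] first; lower values mean the prediction is closer to the ground truth.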
SOTA Saliency Maps for Comparison
The saliency maps of the state-of-the-art methods in our paper can be downloaded from [baidu pan fetch code: de4k | Google drive].
RGB-D VST for RGB-D Salient Object Detection
Data Preparation
Training Set
We use 1,485 images from NJUD, 700 images from NLPR, and 800 images from DUTLF-Depth to train our VST for RGB-D SOD. We also follow EGNet to generate the corresponding contour maps for training. You can directly download the whole training set from [baidu pan fetch code: 7vsw | Google drive] and put it into the RGBD_VST/Data folder.
Testing Set
NJUD [baidu pan fetch code: 7mrn | Google drive]
NLPR [baidu pan fetch code: tqqm | Google drive]
DUTLF-Depth [baidu pan fetch code: 9jac | Google drive]
STERE [baidu pan fetch code: 93hl | Google drive]
LFSD [baidu pan fetch code: l2g4 | Google drive]
RGBD135 [baidu pan fetch code: apzb | Google drive]
SSD [baidu pan fetch code: j3v0 | Google drive]
SIP [baidu pan fetch code: q0j5 | Google drive]
ReDWeb-S
After downloading, put them into the RGBD_VST/Data folder.
Your RGBD_VST/Data folder should look like this:
-- Data
|-- NJUD
|   |-- trainset
|   |   |-- RGB
|   |   |-- depth
|   |   |-- GT
|   |   |-- contour
|   |-- testset
|   |   |-- RGB
|   |   |-- depth
|   |   |-- GT
|-- STERE
|   |-- RGB
|   |-- depth
|   |-- GT
...
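Each RGB-D sample needs an RGB image, a depth map, and a ground-truth mask, so an incomplete download usually shows up as a file present in one sub-folder but not the others. The sketch below reports such gaps for a single split. It assumes that matching files share the same name apart from the extension, which this README does not state; the function name unmatched_stems is hypothetical.

```python
from pathlib import Path


def unmatched_stems(split_dir, subfolders=("RGB", "depth", "GT")):
    """Map each sub-folder to the file stems it lacks relative to the others.

    Sub-folders with nothing missing are left out of the result, so an
    empty dict means every sample is complete.
    """
    root = Path(split_dir)
    stems = {}
    for name in subfolders:
        folder = root / name
        if folder.is_dir():
            stems[name] = {p.stem for p in folder.iterdir() if p.is_file()}
        else:
            stems[name] = set()  # a missing folder lacks every sample
    all_stems = set().union(*stems.values())
    report = {}
    for name, found in stems.items():
        lacking = sorted(all_stems - found)
        if lacking:
            report[name] = lacking
    return report
```

For example, unmatched_stems("RGBD_VST/Data/STERE") would check the STERE test set, and passing subfolders=("RGB", "depth", "GT", "contour") would extend the check to a training split.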
Training, Testing, and Evaluation
cd RGBD_VST
- Download the pretrained T2T-ViT_t-14 model [baidu pan fetch code: 2u34 | Google drive] and put it into the pretrained_model/ folder.
- Run the following command for training, testing, and evaluation:
python train_test_eval.py --Training True --Testing True --Evaluation True
The predictions will be saved in the preds/ folder and the evaluation results in the result.txt file.
Testing on Our Pretrained RGB-D VST Model
cd RGBD_VST
- Download our pretrained RGBD_VST.pth [baidu pan fetch code: zt0v | Google drive] and put it into the checkpoint/ folder.
- Run the following command for testing and evaluation:
python train_test_eval.py --Testing True --Evaluation True
The predictions will be saved in the preds/ folder and the evaluation results in the result.txt file.
Our saliency maps can be downloaded from [baidu pan fetch code: jovk | Google drive].
SOTA Saliency Maps for Comparison
The saliency maps of the state-of-the-art methods in our paper can be downloaded from [baidu pan fetch code: i1we | Google drive].
Acknowledgement
We thank the authors of EGNet for providing the code for generating contour maps. We also thank Zhao Zhang for providing the efficient evaluation tool.
Citation
If you find our work helpful, please cite:
@InProceedings{Liu_2021_ICCV,
author = {Liu, Nian and Zhang, Ni and Wan, Kaiyuan and Shao, Ling and Han, Junwei},
title = {Visual Saliency Transformer},
booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
month = {October},
year = {2021},
pages = {4722-4732}
}