Cross-Modal Fusion and Progressive Decoding Network for RGB-D Salient Object Detection
The paper was accepted by the International Journal of Computer Vision on January 11, 2024. Paper link: https://doi.org/10.1007/s11263-024-02020-y
CPNet
Most existing RGB-D salient object detection (SOD) methods pursue higher performance by integrating additional modules, such as feature enhancement and edge generation, but these modules inevitably introduce feature redundancy and performance degradation. To this end, we design a cross-modal fusion and progressive decoding network (CPNet) for the RGB-D SOD task. The network consists of only three indispensable parts: feature encoding, feature fusion, and feature decoding. Specifically, in the feature encoding part, we adopt a two-stream Swin Transformer encoder to extract multi-level, multi-scale features from RGB images and depth images respectively and to model global information. In the feature fusion part, we design a cross-modal attention fusion module that leverages the attention mechanism to fuse multi-modality and multi-level features. In the feature decoding part, we design a progressive decoder that gradually fuses low-level features and filters out noise to accurately predict salient objects. Extensive experiments on 6 benchmarks demonstrate that our network surpasses 12 state-of-the-art methods on four metrics. In addition, we verify that under this framework, adding a feature enhancement module or an edge generation module does not improve detection performance, which provides new insights into the salient object detection task. Our code is available at https://github.com/hu-xh/CPNet.
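To make the fusion part concrete, below is a minimal PyTorch sketch of a cross-modal attention fusion block. The class name `CrossModalAttentionFusion` and its internals (channel attention over the concatenated modalities followed by spatial attention) are illustrative assumptions, not the exact module implemented in this repository.

```python
import torch
import torch.nn as nn

class CrossModalAttentionFusion(nn.Module):
    """Illustrative sketch: fuse same-level RGB and depth features with
    channel and spatial attention. Not the exact CPNet module."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        # Channel attention over the concatenated modalities (assumed design).
        self.channel_att = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(2 * channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, 2 * channels, 1),
            nn.Sigmoid(),
        )
        # Spatial attention on the re-weighted features (assumed design).
        self.spatial_att = nn.Sequential(
            nn.Conv2d(2, 1, kernel_size=7, padding=3),
            nn.Sigmoid(),
        )
        self.reduce = nn.Conv2d(2 * channels, channels, 1)

    def forward(self, rgb_feat, depth_feat):
        x = torch.cat([rgb_feat, depth_feat], dim=1)   # (B, 2C, H, W)
        x = x * self.channel_att(x)                    # re-weight channels
        avg_map = x.mean(dim=1, keepdim=True)          # (B, 1, H, W)
        max_map = x.max(dim=1, keepdim=True).values    # (B, 1, H, W)
        x = x * self.spatial_att(torch.cat([avg_map, max_map], dim=1))
        return self.reduce(x)                          # back to C channels

# Quick shape check on a mid-level feature map (sizes are assumptions).
fused = CrossModalAttentionFusion(256)(torch.randn(1, 256, 28, 28),
                                       torch.randn(1, 256, 28, 28))
print(fused.shape)  # torch.Size([1, 256, 28, 28])
```

Fusing the two streams at each encoder level in this fashion keeps the pipeline to the three parts described above, with no extra enhancement or edge branches.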
Network Architecture
Results and Saliency Maps
We perform quantitative and qualitative comparisons with 12 state-of-the-art RGB-D SOD methods on six RGB-D datasets.
Prerequisites
- Python 3.6
- Pytorch 1.10.2
- Torchvision 0.11.3
- Numpy 1.19.2
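To confirm your environment matches the versions above, a minimal check:

```python
# Sanity-check the installed versions against the ones listed above.
import sys
import numpy
import torch
import torchvision

print("Python     :", sys.version.split()[0])   # expect 3.6.x
print("PyTorch    :", torch.__version__)        # expect 1.10.2
print("Torchvision:", torchvision.__version__)  # expect 0.11.3
print("NumPy      :", numpy.__version__)        # expect 1.19.2
```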
Pretrained Model
Download the following pth file and put it into the main folder:
- Swin-B with the fetch code: ja95.
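After downloading, inspecting the checkpoint typically looks like the sketch below. The file name is a placeholder for whatever the downloaded .pth is actually called, and the "model" key follows the convention of the official Swin Transformer checkpoints; how the weights are wired into the backbone depends on this repository's code.

```python
import torch

# Load the downloaded Swin-B checkpoint (file name is an assumption;
# use the actual name of the .pth you downloaded).
ckpt = torch.load("swin_base_patch4_window12_384_22k.pth",
                  map_location="cpu")
# Official Swin checkpoints keep the weights under a "model" key;
# fall back to the object itself if it is already a raw state_dict.
state_dict = ckpt.get("model", ckpt)
print(len(state_dict), "tensors in the checkpoint")
```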
Datasets
- Train Datasets with the fetch code: 1234.
- Test Datasets with the fetch code: 1234.
Results
You can download the predicted saliency maps at [Baidu Pan link](https://pan.baidu.com/s/1PlmqAvlAwSzsH2YGR4VzKQ) with the fetch code: dq2w.
- We fixed the code and uploaded a new trained parameter pth: [Baidu Pan link](https://pan.baidu.com/s/1Kfkvv80irU7kV6ojrcvujg?pwd=1234) with the fetch code: 1234.
Contact
Feel free to e-mail me at 1558239392@qq.com.
Relevant Literature
```bibtex
@article{DBLP:journals/ijcv/HuSSWL24,
  author  = {Xihang Hu and Fuming Sun and Jing Sun and Fasheng Wang and Haojie Li},
  title   = {Cross-Modal Fusion and Progressive Decoding Network for {RGB-D} Salient Object Detection},
  journal = {Int. J. Comput. Vis.},
  volume  = {132},
  number  = {8},
  pages   = {3067--3085},
  year    = {2024},
  url     = {https://doi.org/10.1007/s11263-024-02020-y},
  doi     = {10.1007/s11263-024-02020-y},
}
```