# UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection [TMM 2024]
Created by Ruohao Guo, Xianghua Ying*, Yanyu Qi, Liao Qu
This repository contains the PyTorch implementation of the paper "UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection".
In this paper, we develop a Unified TRansformer-based framework, namely UniTR, that tackles several saliency detection tasks (co-segmentation, co-salient object detection, video salient object detection, and RGB-D/RGB-T salient object detection) individually with a single unified architecture. Specifically, a transformer module (CoFormer) is introduced to learn the consistency of relevant objects across images or the complementarity between different modalities. To generate high-quality segmentation maps, we adopt a dual-stream decoding paradigm that allows the extracted consistent or complementary information to better guide mask prediction. Moreover, a feature fusion module (ZoomFormer) is designed to enhance backbone features and capture multi-granularity and multi-semantic information. Extensive experiments show that UniTR performs well on 17 benchmarks and surpasses existing state-of-the-art (SOTA) approaches.
<img src="co_object_saliency_detection/images/unitr_overview.jpg" alt="image" style="zoom:60%;"/>
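At the core of UniTR, CoFormer aggregates cues shared across a group of images (or across modalities). The snippet below is a minimal, hypothetical sketch of such group-wise attention for intuition only; the module name, dimensions, and wiring are our assumptions, not the authors' implementation (see the code in this repository for the actual CoFormer):

```python
# Minimal sketch of group-wise attention in the spirit of CoFormer.
# All names and shapes are illustrative assumptions, not the repo's code.
import torch
import torch.nn as nn

class ConsistencyAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        # tokens: (B, N, dim), where N concatenates patch tokens from a
        # group of related images (or from two modalities). Every token
        # attends to all others, so cues about the shared object are
        # aggregated across the whole group.
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + attended)

x = torch.randn(1, 2 * 14 * 14, 256)  # e.g. tokens from 2 related images
print(ConsistencyAttention()(x).shape)  # torch.Size([1, 392, 256])
```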
## Usage

### Installation

```bash
conda create -n unitr python=3.8 -y
conda activate unitr
pip install torch==1.11.0 torchvision==0.12.0
pip install timm opencv-python einops
pip install tensorboardX pycocotools imageio scipy moviepy thop
```
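A quick way to confirm the environment is set up correctly (the versions printed should match the ones pinned above; CUDA availability depends on your machine):

```python
# Sanity check for the installed environment.
import torch, torchvision, timm, cv2

print("torch:", torch.__version__)              # expected: 1.11.0
print("torchvision:", torchvision.__version__)  # expected: 0.12.0
print("CUDA available:", torch.cuda.is_available())
```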
## Co-object Saliency Detection

### Training

- Co-segmentation and co-salient object detection (training data: COCO2017):

  ```bash
  cd ./co_object_saliency_detection
  python main.py
  ```

- Video salient object detection (training data: DAVIS and FBMS):

  ```bash
  cd ./co_object_saliency_detection
  python finetune.py
  ```
### Inference

- Co-segmentation (checkpoint: unitr_cos_swin.pth, unitr_cos_vgg.pth):

  ```bash
  cd ./co_object_saliency_detection
  python generate_maps_cos.py
  ```

- Co-salient object detection (checkpoint: unitr_cosod_swin.pth, unitr_cosod_vgg.pth; a group-batching sketch follows this list):

  ```bash
  cd ./co_object_saliency_detection
  python generate_maps_cosod.py
  ```

- Video salient object detection (checkpoint: unitr_vsod_swin.pth):

  ```bash
  cd ./co_object_saliency_detection
  python generate_maps_vsod.py
  ```
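Unlike single-image saliency, co-segmentation and co-salient object detection take a group of related images in one forward pass. The sketch below shows one way to batch such a group; the directory layout, image size, and file extension are assumptions, and generate_maps_cosod.py is the actual entry point:

```python
# Hedged sketch: stack one group of related images into a single batch.
import glob

import cv2
import numpy as np
import torch

def load_group(group_dir, size=224):
    imgs = []
    for path in sorted(glob.glob(f"{group_dir}/*.jpg")):
        img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (size, size)).astype(np.float32) / 255.0
        imgs.append(torch.from_numpy(img).permute(2, 0, 1))  # HWC -> CHW
    return torch.stack(imgs)  # (N, 3, H, W): the whole group at once
```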
### Evaluation

- Co-segmentation (results: unitr_cos_swin):

  ```bash
  cd ./co_object_saliency_detection
  python generate_maps_cos.py
  ```

- Co-salient object detection (results: unitr_cosod_swin, unitr_cosod_vgg):

  ```bash
  cd ./co_object_saliency_detection/eval
  sh eval_cosod.sh
  ```

- Video salient object detection (results: unitr_vsod_swin):

  ```bash
  cd ./co_object_saliency_detection/eval
  sh eval_vsod.sh
  ```
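The evaluation scripts above compare predicted saliency maps against ground-truth masks. As an illustration of the kind of metric they report, here is a minimal sketch of MAE (mean absolute error), a standard saliency metric; the repository's own eval scripts are the authoritative implementation, and the paths here are placeholders:

```python
# Hedged sketch of the standard MAE saliency metric.
import cv2
import numpy as np

def mae(pred_path, gt_path):
    pred = cv2.imread(pred_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    gt = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    if pred.shape != gt.shape:  # maps may be saved at a different resolution
        pred = cv2.resize(pred, (gt.shape[1], gt.shape[0]))
    return float(np.abs(pred - gt).mean())  # lower is better
```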
## Multi-modal Saliency Detection

### Training

- RGB-T salient object detection (training data: VT5000):

  ```bash
  cd ./multi_modal_saliency_detection/train
  python train_rgbt.py
  ```

- RGB-D salient object detection (training data: NLPR_NJUD):

  ```bash
  cd ./multi_modal_saliency_detection/train
  python train_rgbd.py
  ```
### Inference

- RGB-T salient object detection (checkpoint: unitr_rgbt_swin.pth, unitr_rgbt_vgg.pth):

  ```bash
  cd ./multi_modal_saliency_detection/test
  python generate_maps_rgbt.py
  ```

- RGB-D salient object detection (checkpoint: unitr_rgbd_swin.pth, unitr_rgbd_res.pth; a paired-input sketch follows this list):

  ```bash
  cd ./multi_modal_saliency_detection/test
  python generate_maps_rgbd.py
  ```
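Multi-modal inference feeds two aligned streams, an RGB image plus a depth (or thermal) map, into the two branches of the network. The following is a hedged preprocessing sketch; the input size, normalization, and three-channel replication of the single-channel modality are common conventions, not necessarily what the repository's test scripts do:

```python
# Hedged sketch: load and pair an RGB image with its depth/thermal map.
import cv2
import numpy as np
import torch

def load_pair(rgb_path, aux_path, size=384):
    rgb = cv2.cvtColor(cv2.imread(rgb_path), cv2.COLOR_BGR2RGB)
    aux = cv2.imread(aux_path, cv2.IMREAD_GRAYSCALE)  # depth or thermal
    rgb = cv2.resize(rgb, (size, size)).astype(np.float32) / 255.0
    aux = cv2.resize(aux, (size, size)).astype(np.float32) / 255.0
    rgb_t = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)   # (1, 3, H, W)
    aux_t = torch.from_numpy(aux)[None, None].repeat(1, 3, 1, 1)  # (1, 3, H, W)
    return rgb_t, aux_t  # one tensor per branch of the two-stream model
```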
### Evaluation

- RGB-T salient object detection (results: unitr_rgbt_swin, unitr_rgbt_vgg):

  ```bash
  cd ./multi_modal_saliency_detection/eval
  python eval_rgbt.py
  ```

- RGB-D salient object detection (results: unitr_rgbd_swin, unitr_rgbd_res):

  ```bash
  cd ./multi_modal_saliency_detection/eval
  python eval_rgbd.py
  ```
## FAQ

If you have suggestions for improving usability or any other advice, please feel free to contact us directly (ruohguo@foxmail.com).
## Acknowledgement

Thanks to SSNM, Swin, UFO, and SwinNet for their contributions to the community!
## Citation

Please consider citing our paper in your publications if this project helps your research. The BibTeX reference is as follows.
```bibtex
@ARTICLE{10444934,
  author={Guo, Ruohao and Ying, Xianghua and Qi, Yanyu and Qu, Liao},
  journal={IEEE Transactions on Multimedia},
  title={UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection},
  year={2024},
  volume={26},
  pages={7622-7635},
  doi={10.1109/TMM.2024.3369922}}
```