UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection [TMM 2024]

Created by Ruohao Guo, Xianghua Ying*, Yanyu Qi, Liao Qu

This repository contains the PyTorch implementation of the paper "UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection".

In this paper, we develop a Unified TRansformer-based framework, namely UniTR, which tackles both co-object and multi-modal saliency detection with a single unified architecture. Specifically, a transformer module (CoFormer) is introduced to learn the consistency of relevant objects across a group of images, or the complementarity between different modalities. To generate high-quality segmentation maps, we adopt a dual-stream decoding paradigm that allows the extracted consistent or complementary information to better guide mask prediction. Moreover, a feature fusion module (ZoomFormer) is designed to enhance backbone features and capture multi-granularity and multi-semantic information. Extensive experiments show that UniTR performs well on 17 benchmarks and surpasses existing state-of-the-art (SOTA) approaches.


<img src="co_object_saliency_detection/images/unitr_overview.jpg" alt="image" style="zoom:60%;"/>
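
The repository contains the full model definitions; the snippet below is only a minimal, hypothetical sketch of the group-attention idea behind CoFormer (every module and parameter name here is illustrative, not the repository's API):

import torch
import torch.nn as nn

# Toy illustration of cross-image (or cross-modal) attention: all tokens from a
# group of images are concatenated so each image attends to the whole group,
# which is the kind of consistency/complementarity modeling CoFormer performs.
class ToyCoFormer(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
    def forward(self, feats):                      # feats: (N, L, C) tokens of N images
        n, l, c = feats.shape
        group = feats.reshape(1, n * l, c)         # merge the group into one sequence
        fused, _ = self.attn(group, group, group)  # every token attends to every image
        return self.norm(group + fused).reshape(n, l, c)

x = torch.randn(5, 196, 256)   # a group of 5 images, 14x14 tokens, 256 channels
print(ToyCoFormer()(x).shape)  # torch.Size([5, 196, 256])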

Usage

Installation

conda create -n unitr python=3.8 -y
conda activate unitr
pip install torch==1.11.0 torchvision==0.12.0
pip install timm opencv-python einops
pip install tensorboardX pycocotools imageio scipy moviepy thop
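
To check that the environment is set up correctly, a quick sanity check such as the following can be run (CUDA availability depends on your local GPU and driver):

import torch, torchvision, timm
print(torch.__version__)          # expect 1.11.0
print(torchvision.__version__)    # expect 0.12.0
print(timm.__version__)
print(torch.cuda.is_available())  # True if a compatible GPU is visible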

Co-object Saliency Detection

Training

cd ./co_object_saliency_detection
python main.py      # initial training
python finetune.py  # fine-tuning

Inference

cd ./co_object_saliency_detection
python generate_maps_cos.py    # co-segmentation (CoS)
python generate_maps_cosod.py  # co-salient object detection (CoSOD)
python generate_maps_vsod.py   # video salient object detection (VSOD)
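
Each script writes predicted saliency maps as image files. As a hypothetical example (the path below is illustrative; substitute the scripts' actual output directory), a map can be loaded and binarized with the dependencies installed above:

import cv2

pred = cv2.imread("outputs/cosod/example.png", cv2.IMREAD_GRAYSCALE)  # hypothetical path
prob = pred / 255.0                           # assuming maps are saved as 8-bit grayscale
mask = ((prob >= 0.5) * 255).astype("uint8")  # simple fixed-threshold binarization
cv2.imwrite("example_mask.png", mask)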

Evaluation

Generate the prediction maps first (see Inference above), then run:

cd ./co_object_saliency_detection/eval
sh eval_cosod.sh  # CoSOD benchmarks
sh eval_vsod.sh   # VSOD benchmarks

Multi-modal Saliency Detection

Training

cd ./multi_modal_saliency_detection/train
python train_rgbt.py  # RGB-T SOD
python train_rgbd.py  # RGB-D SOD

Inference

cd ./multi_modal_saliency_detection/test
python generate_maps_rgbt.py  # RGB-T SOD
python generate_maps_rgbd.py  # RGB-D SOD
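
Both settings consume paired inputs. As a rough, hypothetical sketch of what pairing an RGB image with its depth/thermal counterpart involves (the file names and the 384x384 input size are assumptions; the real preprocessing lives in the test scripts):

import cv2
import torch

rgb = cv2.cvtColor(cv2.imread("demo_rgb.jpg"), cv2.COLOR_BGR2RGB)  # hypothetical files
aux = cv2.imread("demo_depth.png", cv2.IMREAD_GRAYSCALE)           # depth or thermal map

def to_tensor(img):
    t = torch.from_numpy(img).float() / 255.0
    return t.permute(2, 0, 1) if t.ndim == 3 else t.unsqueeze(0)

rgb_t = to_tensor(cv2.resize(rgb, (384, 384))).unsqueeze(0)  # (1, 3, 384, 384)
aux_t = to_tensor(cv2.resize(aux, (384, 384))).unsqueeze(0)  # (1, 1, 384, 384)
# rgb_t and aux_t would feed the model's RGB and auxiliary streams respectively.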

Evaluation

cd ./multi_modal_saliency_detection/eval
python eval_rgbt.py  # RGB-T benchmarks
python eval_rgbd.py  # RGB-D benchmarks
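
The eval scripts report the standard saliency metrics. For reference, the simplest of these, mean absolute error (MAE), is just the average per-pixel difference between the normalized prediction and the ground truth; a minimal version (with hypothetical file paths) looks like:

import cv2
import numpy as np

pred = cv2.imread("pred.png", cv2.IMREAD_GRAYSCALE) / 255.0  # hypothetical paths
gt = cv2.imread("gt.png", cv2.IMREAD_GRAYSCALE) / 255.0
print("MAE:", np.abs(pred - gt).mean())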

FAQ

If you have suggestions for improving usability or any other advice, please feel free to contact us directly (ruohguo@foxmail.com).

Acknowledgement

Thanks to SSNM, Swin, UFO, and SwinNet for their contributions to the community!

Citation

Please consider citing our paper in your publications if this project helps your research. The BibTeX reference is as follows:

@ARTICLE{10444934,
  author={Guo, Ruohao and Ying, Xianghua and Qi, Yanyu and Qu, Liao},
  journal={IEEE Transactions on Multimedia}, 
  title={UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection}, 
  year={2024},
  volume={26},
  pages={7622-7635},
  doi={10.1109/TMM.2024.3369922}}