# UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection [TMM 2024]
Created by Ruohao Guo, Xianghua Ying*, Yanyu Qi, Liao Qu
This repository contains the PyTorch implementation of the paper "UniTR: A Unified TRansformer-based Framework for Co-object and Multi-modal Saliency Detection".
In this paper, we develop a Unified TRansformer-based framework, namely UniTR, that tackles several saliency detection tasks (co-segmentation, co-salient object detection, video salient object detection, and RGB-D/RGB-T salient object detection) individually with a single unified architecture. Specifically, a transformer module (CoFormer) is introduced to learn the consistency of relevant objects across images or the complementarity between different modalities. To generate high-quality segmentation maps, we adopt a dual-stream decoding paradigm that allows the extracted consistent or complementary information to better guide mask prediction. Moreover, a feature fusion module (ZoomFormer) is designed to enhance backbone features and capture multi-granularity and multi-semantic information. Extensive experiments show that UniTR performs well on 17 benchmarks and surpasses existing state-of-the-art (SOTA) approaches.
<img src="co_object_saliency_detection/images/unitr_overview.jpg" alt="image" style="zoom:60%;"/>
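At the core of UniTR, CoFormer aggregates cues shared across a group of images (or across modalities). The snippet below is a minimal, hypothetical sketch of such group-wise attention for intuition only; the module name, dimensions, and wiring are our assumptions, not the authors' implementation (see the code in this repository for the actual CoFormer):

```python
# Minimal sketch of group-wise attention in the spirit of CoFormer.
# All names and shapes are illustrative assumptions, not the repo's code.
import torch
import torch.nn as nn

class ConsistencyAttention(nn.Module):
    def __init__(self, dim=256, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, tokens):
        # tokens: (B, N, dim), where N concatenates patch tokens from a
        # group of related images (or from two modalities). Every token
        # attends to all others, so cues about the shared object are
        # aggregated across the whole group.
        attended, _ = self.attn(tokens, tokens, tokens)
        return self.norm(tokens + attended)

x = torch.randn(1, 2 * 14 * 14, 256)  # e.g. tokens from 2 related images
print(ConsistencyAttention()(x).shape)  # torch.Size([1, 392, 256])
```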
## Usage

### Installation

```bash
conda create -n unitr python=3.8 -y
conda activate unitr
pip install torch==1.11.0 torchvision==0.12.0
pip install timm opencv-python einops
pip install tensorboardX pycocotools imageio scipy moviepy thop
```
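A quick way to confirm the environment is set up correctly (the versions printed should match the ones pinned above; CUDA availability depends on your machine):

```python
# Sanity check for the installed environment.
import torch, torchvision, timm, cv2

print("torch:", torch.__version__)              # expected: 1.11.0
print("torchvision:", torchvision.__version__)  # expected: 0.12.0
print("CUDA available:", torch.cuda.is_available())
```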
## Co-object Saliency Detection

### Training

- Co-segmentation and co-salient object detection (training data: COCO2017):

  ```bash
  cd ./co_object_saliency_detection
  python main.py
  ```

- Video salient object detection (training data: DAVIS and FBMS):

  ```bash
  cd ./co_object_saliency_detection
  python finetune.py
  ```
### Inference

- Co-segmentation (checkpoint: unitr_cos_swin.pth, unitr_cos_vgg.pth):

  ```bash
  cd ./co_object_saliency_detection
  python generate_maps_cos.py
  ```

- Co-salient object detection (checkpoint: unitr_cosod_swin.pth, unitr_cosod_vgg.pth; a group-batching sketch follows this list):

  ```bash
  cd ./co_object_saliency_detection
  python generate_maps_cosod.py
  ```

- Video salient object detection (checkpoint: unitr_vsod_swin.pth):

  ```bash
  cd ./co_object_saliency_detection
  python generate_maps_vsod.py
  ```
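Unlike single-image saliency, co-segmentation and co-salient object detection take a group of related images in one forward pass. The sketch below shows one way to batch such a group; the directory layout, image size, and file extension are assumptions, and generate_maps_cosod.py is the actual entry point:

```python
# Hedged sketch: stack one group of related images into a single batch.
import glob

import cv2
import numpy as np
import torch

def load_group(group_dir, size=224):
    imgs = []
    for path in sorted(glob.glob(f"{group_dir}/*.jpg")):
        img = cv2.cvtColor(cv2.imread(path), cv2.COLOR_BGR2RGB)
        img = cv2.resize(img, (size, size)).astype(np.float32) / 255.0
        imgs.append(torch.from_numpy(img).permute(2, 0, 1))  # HWC -> CHW
    return torch.stack(imgs)  # (N, 3, H, W): the whole group at once
```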
### Evaluation

- Co-segmentation (results: unitr_cos_swin):

  ```bash
  cd ./co_object_saliency_detection
  python generate_maps_cos.py
  ```

- Co-salient object detection (results: unitr_cosod_swin, unitr_cosod_vgg):

  ```bash
  cd ./co_object_saliency_detection/eval
  sh eval_cosod.sh
  ```

- Video salient object detection (results: unitr_vsod_swin):

  ```bash
  cd ./co_object_saliency_detection/eval
  sh eval_vsod.sh
  ```
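The evaluation scripts above compare predicted saliency maps against ground-truth masks. As an illustration of the kind of metric they report, here is a minimal sketch of MAE (mean absolute error), a standard saliency metric; the repository's own eval scripts are the authoritative implementation, and the paths here are placeholders:

```python
# Hedged sketch of the standard MAE saliency metric.
import cv2
import numpy as np

def mae(pred_path, gt_path):
    pred = cv2.imread(pred_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    gt = cv2.imread(gt_path, cv2.IMREAD_GRAYSCALE).astype(np.float32) / 255.0
    if pred.shape != gt.shape:  # maps may be saved at a different resolution
        pred = cv2.resize(pred, (gt.shape[1], gt.shape[0]))
    return float(np.abs(pred - gt).mean())  # lower is better
```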
## Multi-modal Saliency Detection

### Training

- RGB-T salient object detection (training data: VT5000):

  ```bash
  cd ./multi_modal_saliency_detection/train
  python train_rgbt.py
  ```

- RGB-D salient object detection (training data: NLPR_NJUD):

  ```bash
  cd ./multi_modal_saliency_detection/train
  python train_rgbd.py
  ```
### Inference

- RGB-T salient object detection (checkpoint: unitr_rgbt_swin.pth, unitr_rgbt_vgg.pth):

  ```bash
  cd ./multi_modal_saliency_detection/test
  python generate_maps_rgbt.py
  ```

- RGB-D salient object detection (checkpoint: unitr_rgbd_swin.pth, unitr_rgbd_res.pth; a paired-input sketch follows this list):

  ```bash
  cd ./multi_modal_saliency_detection/test
  python generate_maps_rgbd.py
  ```
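Multi-modal inference feeds two aligned streams, an RGB image plus a depth (or thermal) map, into the two branches of the network. The following is a hedged preprocessing sketch; the input size, normalization, and three-channel replication of the single-channel modality are common conventions, not necessarily what the repository's test scripts do:

```python
# Hedged sketch: load and pair an RGB image with its depth/thermal map.
import cv2
import numpy as np
import torch

def load_pair(rgb_path, aux_path, size=384):
    rgb = cv2.cvtColor(cv2.imread(rgb_path), cv2.COLOR_BGR2RGB)
    aux = cv2.imread(aux_path, cv2.IMREAD_GRAYSCALE)  # depth or thermal
    rgb = cv2.resize(rgb, (size, size)).astype(np.float32) / 255.0
    aux = cv2.resize(aux, (size, size)).astype(np.float32) / 255.0
    rgb_t = torch.from_numpy(rgb).permute(2, 0, 1).unsqueeze(0)   # (1, 3, H, W)
    aux_t = torch.from_numpy(aux)[None, None].repeat(1, 3, 1, 1)  # (1, 3, H, W)
    return rgb_t, aux_t  # one tensor per branch of the two-stream model
```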
### Evaluation

- RGB-T salient object detection (results: unitr_rgbt_swin, unitr_rgbt_vgg):

  ```bash
  cd ./multi_modal_saliency_detection/eval
  python eval_rgbt.py
  ```

- RGB-D salient object detection (results: unitr_rgbd_swin, unitr_rgbd_res):

  ```bash
  cd ./multi_modal_saliency_detection/eval
  python eval_rgbd.py
  ```
## FAQ

If you have suggestions for improving usability or any other advice, please feel free to contact us directly (ruohguo@foxmail.com).
## Acknowledgement

Thanks to SSNM, Swin, UFO, and SwinNet for their contributions to the community!
## Citation

Please consider citing our paper in your publications if this project helps your research. The BibTeX reference is as follows.
```bibtex
@ARTICLE{10444934,
  author={Guo, Ruohao and Ying, Xianghua and Qi, Yanyu and Qu, Liao},
  journal={IEEE Transactions on Multimedia},
  title={UniTR: A Unified TRansformer-Based Framework for Co-Object and Multi-Modal Saliency Detection},
  year={2024},
  volume={26},
  pages={7622-7635},
  doi={10.1109/TMM.2024.3369922}}
```