# MGT-Net
Official code repository for the paper "Mutual-Guidance Transformer-Embedding Network for Video Salient Object Detection" (IEEE Signal Processing Letters, 2022).
<p align="center">
  <img src="./img/MGTNet.PNG" width="100%"/>
  <br />
  <em>
  Overall framework of the proposed MGT-Net. The two input streams (RGB and OF) use the same symmetric network structure.
  </em>
</p>

## Usage
Each dataset corresponds to a `.txt` path file in which each row lists `img_path`, `gt_path`, and `flow_path`.
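A minimal sketch of how such a path file can be parsed (the field order follows the convention above; the helper name and the whitespace-separated format are assumptions, not the repository's actual loader):

```python
# Illustrative parser for a dataset path file: one sample per row,
# with img_path, gt_path, and flow_path on each line.
def read_path_file(txt_file):
    samples = []
    with open(txt_file, 'r') as f:
        for line in f:
            parts = line.strip().split()  # assumes whitespace-separated fields
            if len(parts) != 3:
                continue  # skip malformed rows
            img_path, gt_path, flow_path = parts
            samples.append((img_path, gt_path, flow_path))
    return samples

# e.g. triples = read_path_file('DAVIS16_train.txt')
```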
### Training
- Download the training dataset (containing DAVIS16, DAVSOD, FBMS, and DUTS-TR) from Baidu Drive (PSW: wuqv).
- Download the pre-trained ResNet50 backbone and ViT-B_16 backbone (PSW: zouw) to your specified folder. The ViT-B_16 weights are used to initialize some parameters in MGTrans, such as the linear projection and FFN layers (a loading sketch follows this list).
- The entire model is trained on four NVIDIA TITAN X (Pascal) GPUs via the distributed launcher (see the setup sketch after this list):
- Run
  ```shell
  CUDA_VISIBLE_DEVICES=0,1,2,3 python -m torch.distributed.launch --nproc_per_node=4 train_distribute.py
  ```
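As a rough illustration of the partial initialization mentioned above, one can copy only the ViT-B_16 entries whose names and shapes match MGTrans parameters. The checkpoint path and key layout below are assumptions, not the repository's exact code:

```python
import torch

def init_from_vit(model, ckpt_path='ViT-B_16.pth'):
    """Copy ViT-B_16 weights into matching MGTrans parameters.

    Hypothetical sketch: the checkpoint format and key names are
    assumptions, not the repository's exact loading code.
    """
    vit_state = torch.load(ckpt_path, map_location='cpu')
    model_state = model.state_dict()
    # Keep only entries whose names and shapes line up (e.g., attention
    # projections and FFN weights); all other parameters keep their
    # default initialization.
    matched = {k: v for k, v in vit_state.items()
               if k in model_state and v.shape == model_state[k].shape}
    model_state.update(matched)
    model.load_state_dict(model_state)
    return len(matched)  # number of initialized tensors
```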
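Similarly, a bare-bones sketch of the per-process setup that `torch.distributed.launch` expects with `--nproc_per_node=4` (the real logic lives in `train_distribute.py`; the function and its defaults here are placeholders):

```python
import argparse
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler

def setup_distributed(model, dataset, batch_size=8):
    """Per-process setup expected by torch.distributed.launch.

    Placeholder sketch; the repository's actual setup is in
    train_distribute.py.
    """
    # The launcher passes --local_rank to each spawned process (one per GPU).
    parser = argparse.ArgumentParser()
    parser.add_argument('--local_rank', type=int, default=0)
    args, _ = parser.parse_known_args()

    dist.init_process_group(backend='nccl')  # env:// init via launcher env vars
    torch.cuda.set_device(args.local_rank)

    model = DDP(model.cuda(args.local_rank), device_ids=[args.local_rank])
    # DistributedSampler shards the data so each GPU sees a distinct subset.
    sampler = DistributedSampler(dataset)
    loader = DataLoader(dataset, batch_size=batch_size, sampler=sampler)
    return model, loader, sampler
```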
### Testing
- Download the test datasets (containing DAVIS16, DAVSOD, FBMS, SegTrack-V2, VOS, and ViSal) from Baidu Drive (PSW: wuqv).
- Download the final trained model from Baidu Drive (PSW: yl8s).
- Run
  ```shell
  python test.py
  ```
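Saliency maps are typically written out per frame along the lines of the sketch below; the `(rgb, flow, name)` batch format and the two-stream forward signature are assumptions rather than `test.py`'s exact interface:

```python
import os
import cv2
import torch

@torch.no_grad()
def save_saliency_maps(model, loader, out_dir):
    """Hypothetical inference loop: the loader's batch format and the
    (rgb, flow) forward signature are assumptions, not test.py's code."""
    model.eval()
    os.makedirs(out_dir, exist_ok=True)
    for rgb, flow, name in loader:
        pred = model(rgb.cuda(), flow.cuda())  # two-stream forward pass
        pred = torch.sigmoid(pred).squeeze().cpu().numpy()
        cv2.imwrite(os.path.join(out_dir, name[0] + '.png'),
                    (pred * 255).astype('uint8'))
```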
## Results
- The saliency maps can be downloaded from Baidu Drive (PSW: 9oqe).
- Evaluation toolbox: we use the standard evaluation toolbox from the DAVSOD benchmark.
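VSOD toolboxes such as this one commonly report MAE and (max) F-measure among other metrics; a toy NumPy sketch of those two measures (not the toolbox itself):

```python
import numpy as np

def mae(pred, gt):
    """Mean absolute error between a saliency map and ground truth,
    both scaled to [0, 1]."""
    return np.abs(pred - gt).mean()

def f_measure(pred, gt, beta2=0.3, thresh=0.5):
    """F-measure at a single threshold (beta^2 = 0.3, as is conventional
    in salient object detection); benchmark toolboxes typically sweep
    thresholds and report the maximum."""
    binary = pred >= thresh
    tp = np.logical_and(binary, gt > 0.5).sum()
    precision = tp / (binary.sum() + 1e-8)
    recall = tp / ((gt > 0.5).sum() + 1e-8)
    return (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8)
```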
## Citation
Please cite the following paper if you use this repository in your research:
```bibtex
@article{min2022mutual,
  title={Mutual-Guidance Transformer-Embedding Network for Video Salient Object Detection},
  author={Min, Dingyao and Zhang, Chao and Lu, Yukang and Fu, Keren and Zhao, Qijun},
  journal={IEEE Signal Processing Letters},
  volume={29},
  pages={1674--1678},
  year={2022},
  publisher={IEEE}
}
```