MGT-Net

Official code repository for the paper "Mutual-Guidance Transformer-Embedding Network for Video Salient Object Detection".

<p align="center"> <img src="./img/MGTNet.PNG" width="100%"/> <br /> <em> Overall framework of the proposed MGT-Net. The two input streams (RGB and OF) use the same symmetric network structure. </em> </p>

Usage

Each dataset corresponds to a txt path file, where each row lists img_path, gt_path, and flow_path.
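Such a path file can be read as follows (a minimal sketch; the function name, the example file names, and the whitespace-separated three-column layout are assumptions based on the description above, not code from this repository):

```python
def read_path_file(txt_path):
    """Parse a dataset txt path file into (img_path, gt_path, flow_path) tuples.

    Assumption: each row contains the three paths separated by whitespace.
    """
    samples = []
    with open(txt_path) as f:
        for line in f:
            parts = line.split()
            if len(parts) == 3:  # skip empty or malformed rows
                samples.append(tuple(parts))
    return samples
```

A data loader for either input stream (RGB or OF) can then index into the returned list to fetch the matching frame, ground-truth mask, and optical-flow map.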

Training

  1. Download the training dataset (containing DAVIS16, DAVSOD, FBMS, and DUTS-TR) from Baidu Drive (PSW: wuqv).
  2. Download the pre-trained ResNet50 backbone and ViT-B_16 backbone (PSW: zouw) to your specified folder. The ViT-B_16 backbone is used to initialize some parameters in MGTrans, such as the linear layers and FFN layers.
  3. Training of the entire model is performed on four NVIDIA TITAN X (Pascal) GPUs.

Testing

  1. Download the test dataset (containing DAVIS16, DAVSOD, FBMS, SegTrack-V2, VOS, and ViSal) from Baidu Drive (PSW: wuqv).
  2. Download the final trained model from Baidu Drive (PSW: yl8s).
  3. Run `python test.py`.

Result

  1. The saliency maps can be downloaded from Baidu Drive (PSW: 9oqe).
  2. Evaluation toolbox: we use the standard evaluation toolbox from the DAVSOD benchmark.
<p align="center"> <img src="./img/result.PNG" width="100%"/> <br /> <em> Quantitative comparison with SOTA methods on five public datasets in terms of three metrics. The top three results are highlighted in red, green, and blue, respectively. The OF column indicates whether optical flow (OF) is used as input. </em> </p>

Citation

Please cite the following paper if you use this repository in your research:

@article{min2022mutual,
  title={Mutual-Guidance Transformer-Embedding Network for Video Salient Object Detection},
  author={Min, Dingyao and Zhang, Chao and Lu, Yukang and Fu, Keren and Zhao, Qijun},
  journal={IEEE Signal Processing Letters},
  volume={29},
  pages={1674--1678},
  year={2022},
  publisher={IEEE}
}