<br/>
<p align="center">
  <a href="SOD.png"><img src="SOD.png" alt="Salient Object Detection"></a>
</p>
<p align="center">
  <a href="dog_running.gif"><img src="dog_running.gif" alt="SOD of a running dog" width="400"></a>
</p>
<h3 align="center">Salient Object Detection - examples created using the published model</h3>
<h4 align="center"><a href="https://medium.com/@taskswithcode/twc-9-7c960c921f69">Newsletter describing this paper</a></h4>
<h4 align="center"><a href="https://taskswithcode.com/salient_object_detection">App using the model described in this paper</a></h4>

<a href="https://taskswithcode.com/salient_object_detection"><img src="app_pic.png" width="500px" align="middle" hspace="10" vspace="20"/></a>


# Revisiting Image Pyramid Structure for High Resolution Salient Object Detection


PyTorch implementation of Revisiting Image Pyramid Structure for High Resolution Salient Object Detection (InSPyReNet)

To appear in the 16th Asian Conference on Computer Vision (ACCV2022)

[[arXiv](https://arxiv.org/abs/2209.09475)]

Abstract: Salient object detection (SOD) has been in the spotlight recently, yet has been studied less for high-resolution (HR) images. Unfortunately, HR images and their pixel-level annotations are far more labor-intensive and time-consuming to obtain than low-resolution (LR) ones. We therefore propose an image pyramid-based SOD framework, Inverse Saliency Pyramid Reconstruction Network (InSPyReNet), for HR prediction without any HR training datasets. InSPyReNet is designed to produce a strict image pyramid structure of the saliency map, which enables ensembling multiple results with pyramid-based image blending. For HR prediction, we design a pyramid blending method that synthesizes two image pyramids from an LR and an HR scale of the same image to overcome the effective receptive field (ERF) discrepancy. Our extensive evaluation on public LR and HR SOD benchmarks demonstrates that InSPyReNet surpasses State-of-the-Art (SotA) methods on various SOD metrics and in boundary accuracy.
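The core idea, building an image pyramid of the saliency map and blending pyramids derived from LR and HR scales, is closely related to classic Laplacian-pyramid blending. Below is a minimal sketch of that general technique in Python with OpenCV and NumPy; it is not the paper's InSPyReNet blending, and the function names and the fixed `alpha` weighting are illustrative assumptions.

```python
# Minimal Laplacian-pyramid blending sketch (NOT the paper's exact method).
# Assumes inputs are single-channel saliency maps with values in [0, 1].
import cv2
import numpy as np

def laplacian_pyramid(img, levels):
    """Decompose an image into `levels` band-pass levels plus a residual."""
    pyr, cur = [], img.astype(np.float32)
    for _ in range(levels):
        down = cv2.pyrDown(cur)
        up = cv2.pyrUp(down, dstsize=(cur.shape[1], cur.shape[0]))
        pyr.append(cur - up)  # detail band at this scale
        cur = down
    pyr.append(cur)           # low-frequency residual
    return pyr

def blend_pyramids(pyr_a, pyr_b, alpha=0.5):
    """Blend two pyramids level by level, then reconstruct the image."""
    blended = [alpha * a + (1 - alpha) * b for a, b in zip(pyr_a, pyr_b)]
    out = blended[-1]
    for band in reversed(blended[:-1]):
        out = cv2.pyrUp(out, dstsize=(band.shape[1], band.shape[0])) + band
    return np.clip(out, 0.0, 1.0)

# Usage sketch: blend an LR-scale and an HR-scale prediction of the same image,
# both resized to the HR resolution beforehand:
# hr_map = blend_pyramids(laplacian_pyramid(lr_pred, 4), laplacian_pyramid(hr_pred, 4))
```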

## Demo :rocket:

| Demo 1 | Demo 2 |
| :---: | :---: |
| <img src="./figures/fig_demo1.gif" height="350px" width="350px"> | <img src="./figures/fig_demo2.gif" height="350px" width="350px"> |

## Architecture

| InSPyReNet | Pyramid blending |
| :---: | :---: |
| <img src="./figures/fig_architecture.png" height="350px" width="350px"> | <img src="./figures/fig_pyramid_blending.png" height="350px" width="350px"> |

## 1. Create environment
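This page does not pin an environment; a typical PyTorch setup might look like the following (the Python version and package list are assumptions, not the authors' tested configuration):

```bash
# Assumed setup; adjust versions to the repo's actual requirements.
conda create -n inspyrenet python=3.8 -y
conda activate inspyrenet
pip install torch torchvision          # PyTorch, per the implementation
pip install opencv-python pyyaml tqdm  # common extras; check the repo for the full list
```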

## 2. Preparation

| Item | Destination Folder | OneDrive | GDrive |
| :--- | :--- | :---: | :---: |
| Train Datasets | `data/Train_Dataset/...` | Link | Link |
| Test Datasets | `data/Test_Dataset/...` | Link | Link |
| Res2Net50 checkpoint | `data/backbone_ckpt/*.pth` | Link | Link |
| SwinB checkpoint | `data/backbone_ckpt/*.pth` | Link | Link |
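Once everything is downloaded, the layout under `data/` should look roughly like this (a sketch inferred from the destination folders above):

```
data/
├── Train_Dataset/    # extracted training datasets, e.g. DUTS-TR/
├── Test_Dataset/     # extracted test datasets, e.g. DUTS-TE/
└── backbone_ckpt/    # Res2Net50 / SwinB backbone *.pth checkpoints
```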

## 3. Train & Evaluate

```bash
python run/Train.py --config configs/InSPyReNet_SwinB.yaml --verbose
```

To train with the extra HR training datasets (at LR scale), extend the `sets` entry under `Train.Dataset` in the config:

```yaml
Train:
  Dataset:
    type: "RGB_Dataset"
    root: "data/RGB_Dataset/Train_Dataset"
    sets: ['DUTS-TR']  # --> ['DUTS-TR', 'HRSOD-TR-LR', 'UHRSD-TR-LR']
```

Generate predictions for the benchmark test sets, then compute the evaluation metrics:

```bash
python run/Test.py --config configs/InSPyReNet_SwinB.yaml --verbose
python run/Eval.py --config configs/InSPyReNet_SwinB.yaml --verbose
```

## 4. Inference on your own data
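The run scripts share a common CLI pattern; a sketch of an inference call is below. The `run/Inference.py` flags here mirror `Train.py`/`Test.py` and are assumptions, so check the script's argument parser for the authoritative options.

```bash
# Assumed invocation; --source and --dest are hypothetical flag names.
python run/Inference.py --config configs/InSPyReNet_SwinB.yaml \
    --source path/to/your/images --dest path/to/output --verbose
```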

## 5. Checkpoints

Note: If you want to try one of the trained checkpoints below, make sure to place the `latest.pth` file in the directory specified by `Test.Checkpoint.checkpoint_dir`.
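For example, in the config (the directory value below is a placeholder, not a path shipped with the repo):

```yaml
Test:
  Checkpoint:
    checkpoint_dir: "snapshots/InSPyReNet_SwinB"  # placeholder; put latest.pth in this directory
```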

### Trained with LR dataset only (DUTS-TR, 384 x 384)

| Backbone | Train DB | Config | OneDrive | GDrive |
| :--- | :--- | :--- | :---: | :---: |
| Res2Net50 | DUTS-TR | InSPyReNet_Res2Net50.yaml | Link | Link |
| SwinB | DUTS-TR | InSPyReNet_SwinB.yaml | Link | Link |

### Trained with LR+HR datasets (with LR scale, 384 x 384)

| Backbone | Train DB | Config | OneDrive | GDrive |
| :--- | :--- | :--- | :---: | :---: |
| SwinB | DUTS-TR, HRSOD-TR-LR | InSPyReNet_SwinB.yaml | Link | Link |
| SwinB | HRSOD-TR-LR, UHRSD-TR-LR | InSPyReNet_SwinB.yaml | Link | Link |
| SwinB | DUTS-TR, HRSOD-TR-LR, UHRSD-TR-LR | InSPyReNet_SwinB.yaml | Link | Link |

### Trained with LR+HR datasets (with HR scale, 1024 x 1024)

| Backbone | Train DB | Config | OneDrive | GDrive |
| :--- | :--- | :--- | :---: | :---: |
| SwinB | DUTS-TR, HRSOD-TR | InSPyReNet_SwinB.yaml | Link | Link |
| SwinB | HRSOD-TR, UHRSD-TR | InSPyReNet_SwinB.yaml | Link | Link |

### Trained with Massive SOD Datasets (with LR scale, 384 x 384; not in the paper, just for fun!)

| Backbone | Train DB | Config | OneDrive | GDrive |
| :--- | :--- | :--- | :---: | :---: |
| SwinB | DUTS-TR, DUTS-TE, FSS-1000, MSRA-10K, ECSSD, HRSOD-TR-LR, UHRSD-TR-LR | InSPyReNet_SwinB.yaml | Link | Link |

## 6. Pre-Computed Saliency Maps

Note: Due to cloud storage limits, we only provide saliency maps for models trained on DUTS-TR alone. If you need results for the models trained with extra datasets, please generate them yourself.

| Backbone | DUTS-TE | DUT-OMRON | ECSSD | HKU-IS | PASCAL-S | DAVIS-S | HRSOD-TE | UHRSD-TE |
| :--- | :---: | :---: | :---: | :---: | :---: | :---: | :---: | :---: |
| Res2Net50 | Link | Link | Link | Link | Link | N/A | N/A | N/A |
| SwinB | Link | Link | Link | Link | Link | Link | Link | Link |

## 7. Results

| LR Benchmark | HR Benchmark | HR Benchmark (Trained with extra DB) |
| :---: | :---: | :---: |
| <img src="./figures/fig_quantitative.png" height="250px" width="250px"> | <img src="./figures/fig_quantitative2.png" height="250px" width="250px"> | <img src="./figures/fig_quantitative3.png" height="250px" width="250px"> |

| DAVIS-S & HRSOD | UHRSD | UHRSD (Trained with extra DB) |
| :---: | :---: | :---: |
| <img src="./figures/fig_qualitative.png" height="250px" width="250px"> | <img src="./figures/fig_qualitative2.png" height="250px" width="250px"> | <img src="./figures/fig_qualitative3.jpg" height="250px" width="250px"> |

## Citation

```bibtex
@article{kim2022revisiting,
  title={Revisiting Image Pyramid Structure for High Resolution Salient Object Detection},
  author={Kim, Taehun and Kim, Kunhee and Lee, Joonyeong and Cha, Dongmin and Lee, Jiho and Kim, Daijin},
  journal={arXiv preprint arXiv:2209.09475},
  year={2022}
}
```
