ICASSP 2024: Zero-Shot Co-salient Object Detection Framework

This repository is the official PyTorch implementation of our zero-shot CoSOD framework. [arXiv]

<div align=center><img width="550" height="190" src=assets/intro.PNG/></div>

Abstract

Co-salient Object Detection (CoSOD) endeavors to replicate the human visual system’s capacity to recognize common and salient objects within a collection of images. Despite recent advancements in deep learning models, these models still rely on training with well-annotated CoSOD datasets. The exploration of training-free zero-shot CoSOD frameworks has been limited. In this paper, taking inspiration from the zero-shot transfer capabilities of foundational computer vision models, we introduce the first zero-shot CoSOD framework that harnesses these models without any training process. To achieve this, we introduce two novel components in our proposed framework: the group prompt generation (GPG) module and the co-saliency map generation (CMP) module. We evaluate the framework’s performance on widely-used datasets and observe impressive results. Our approach surpasses existing unsupervised methods and even outperforms fully supervised methods developed before 2020, while remaining competitive with some fully supervised methods developed before 2022.

Framework Overview

<div align=center><img width="750" height="330" src=assets/framework.PNG/></div>

Results

<div align=center><img width="800" height="220" src=./assets/quantitative.PNG/></div>
<div align=center><img width="800" height="400" src=./assets/qualitative.PNG/></div>

Usage

  1. Environment

    conda create -n zscosod python=3.9
    conda activate zscosod 
    pip install -e .
    pip install -r requirements.txt
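
    After installation, a quick sanity check (assuming PyTorch is pulled in by requirements.txt, since this is a PyTorch repository) can confirm that torch and a CUDA device are visible:

        # Minimal environment check: prints the installed torch version and
        # whether a CUDA-capable GPU is visible to PyTorch.
        import torch
        print("torch", torch.__version__, "| CUDA available:", torch.cuda.is_available())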
    
  2. Dataset preparation

    Download all the test datasets from google-drive or BaiduYun (fetch code: qwt8). The file directory structure is as follows (a small layout sanity check is sketched after the tree):

    +-- zs-cosod
    |   +-- data 
    |       +-- CoSal2015 (Testing Dataset)
    |           +-- img (Image Groups)  
    |           +-- gt (Ground Truth Groups)
    |           +-- blip2-caption (Image Caption)
    |       +-- CoCA (Testing Dataset)  
    |       +-- CoSOD3k (Testing Dataset)   
    |   ... 
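
    The sketch below (the group names and file extensions are assumptions; adjust the paths to your download) checks that every image group has a matching ground-truth group:

        # Sanity check for the data layout shown above: for each testing dataset,
        # verify that every image group under img/ has a corresponding gt/ group
        # and that the file stems match.
        import os

        def check_dataset(root):
            img_root, gt_root = os.path.join(root, "img"), os.path.join(root, "gt")
            for group in sorted(os.listdir(img_root)):
                imgs = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(img_root, group))}
                gts = {os.path.splitext(f)[0] for f in os.listdir(os.path.join(gt_root, group))}
                missing = imgs - gts
                if missing:
                    print(f"{root}/{group}: {len(missing)} images without ground truth")

        if __name__ == "__main__":
            for name in ["CoSal2015", "CoCA", "CoSOD3k"]:
                check_dataset(os.path.join("data", name))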
    
  3. Test and evaluation

    Download the checkpoints of TSDN and SAM from google-drive | BaiduYun (fetch code: be34). Place the ckpt folder in the main directory. Below is a command example of testing our model on CoSal2015 with the ViT-base backbone; a minimal Python wrapper that chains the three steps is sketched after them.

    1. sh sd-dino/extract_feat.sh (Feature Extraction by StableDiffusion-1.5 and DINOv2-base)
    2. sh A2S-v2/inference_sod.sh (Saliency Map Generation by Unsupervised TSDN)
    3. sh inference_cosod.sh (CoSaliency Map Generation) 
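
    If you prefer a single entry point, the following sketch (only a convenience wrapper, not part of the repository) runs the three stages in order and stops if one of them fails:

        # Convenience wrapper around the three shell scripts listed above.
        import subprocess

        STAGES = [
            "sd-dino/extract_feat.sh",   # feature extraction (Stable Diffusion-1.5 + DINOv2-base)
            "A2S-v2/inference_sod.sh",   # saliency map generation (unsupervised TSDN)
            "inference_cosod.sh",        # co-saliency map generation
        ]

        for script in STAGES:
            subprocess.run(["sh", script], check=True)  # raise if a stage fails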
    

    Run the following command to evaluate your prediction results; the metrics include max F-measure, S-measure, and MAE. An illustrative re-implementation of MAE and max F-measure follows the command.

    CUDA_VISIBLE_DEVICES=0 python evaluate.py --pred_root results --datasets CoSal2015
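
    The sketch below shows how MAE and max F-measure can be computed for a single prediction/ground-truth pair; it is only meant to clarify the reported numbers, not to replace evaluate.py (the file paths are hypothetical):

        # Illustrative MAE and max F-measure for one prediction / ground-truth pair.
        import numpy as np
        from PIL import Image

        def load_map(path):
            # Load a grayscale map as a float array in [0, 1].
            return np.asarray(Image.open(path).convert("L"), dtype=np.float64) / 255.0

        def mae(pred, gt):
            # Mean absolute error against the binarized ground truth.
            return np.abs(pred - (gt > 0.5)).mean()

        def max_f_measure(pred, gt, beta2=0.3, steps=255):
            # Sweep thresholds over the prediction and keep the best F-measure.
            gt_bin, best = gt > 0.5, 0.0
            for t in np.linspace(0, 1, steps):
                pred_bin = pred >= t
                tp = np.logical_and(pred_bin, gt_bin).sum()
                precision = tp / (pred_bin.sum() + 1e-8)
                recall = tp / (gt_bin.sum() + 1e-8)
                best = max(best, (1 + beta2) * precision * recall / (beta2 * precision + recall + 1e-8))
            return best

        pred = load_map("results/CoSal2015/some_group/some_image.png")  # hypothetical paths
        gt = load_map("data/CoSal2015/gt/some_group/some_image.png")
        print("MAE:", mae(pred, gt), "| max F-measure:", max_f_measure(pred, gt))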
    

    For more metrics, the CoSOD evaluation toolbox eval-co-sod is strongly recommended.

Citation

 @article{DBLP:journals/corr/abs-2309-05499,
   author  = {Haoke Xiao and Lv Tang and Bo Li and Zhiming Luo and Shaozi Li},
   title   = {Zero-Shot Co-salient Object Detection Framework},
   journal = {CoRR},
   volume  = {abs/2309.05499},
   year    = {2023}
 }

Acknowledgement

Our code is largely based on the following open-source projects: ODISE, dino-vit-features (official implementation), dino-vit-features (Kamal Gupta's implementation), SAM, and TSDN. Our heartfelt gratitude goes to the developers of these resources!

Contact

Feel free to open an issue here or send me an e-mail (hk.xiao.me@gmail.com).