FCFI

The official implementation of the paper "Focused and Collaborative Feedback Integration for Interactive Image Segmentation", presented at CVPR 2023.

[arXiv] [Paper] [Poster] [Video]

Introduction

<div align=center><img src="figs/beginning.jpg" width="50%"/></div>

Abstract. Interactive image segmentation aims at obtaining a segmentation mask for an image using simple user annotations. During each round of interaction, the segmentation result from the previous round serves as feedback to guide the user's annotation and provides dense prior information for the segmentation model, effectively acting as a bridge between interactions. Existing methods overlook the importance of feedback or simply concatenate it with the original input, leading to underutilization of feedback and an increase in the number of required annotations. To address this, we propose an approach called Focused and Collaborative Feedback Integration (FCFI) to fully exploit the feedback for click-based interactive image segmentation. FCFI first focuses on a local area around the new click and corrects the feedback based on the similarities of high-level features. It then alternately and collaboratively updates the feedback and deep features to integrate the feedback into the features. The efficacy and efficiency of FCFI were validated on four benchmarks, namely GrabCut, Berkeley, SBD, and DAVIS. Experimental results show that FCFI achieved new state-of-the-art performance with less computational overhead than previous methods.
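The abstract names two components: a focused correction that fixes the previous-round feedback in a local window around the new click using high-level feature similarity, and a collaborative update that alternately refines the feedback and the deep features. The sketch below is only a conceptual illustration of those two ideas in PyTorch, not the authors' implementation; every function name, layer choice, window size, and threshold is an assumption made for clarity.

# Conceptual sketch only -- NOT the official FCFI code. All names, layers, and
# hyperparameters below are assumptions used to illustrate the two ideas from the abstract.
import torch
import torch.nn as nn
import torch.nn.functional as F


def focused_correction(feedback, features, click_yx, click_label, radius=32, tau=0.8):
    """Correct the previous-round feedback inside a local window around the new click.

    feedback:    (1, 1, H, W) probability map predicted in the previous round
    features:    (1, C, H, W) high-level features, assumed already resized to H x W
    click_yx:    (y, x) pixel coordinates of the new click
    click_label: 1.0 for a positive click, 0.0 for a negative click
    """
    _, _, H, W = feedback.shape
    y, x = click_yx
    y0, y1 = max(0, y - radius), min(H, y + radius)
    x0, x1 = max(0, x - radius), min(W, x + radius)

    # Cosine similarity between the clicked pixel's feature and the local window.
    click_feat = features[:, :, y, x].view(1, -1, 1, 1)                  # (1, C, 1, 1)
    local_feat = features[:, :, y0:y1, x0:x1]                            # (1, C, h, w)
    sim = F.cosine_similarity(local_feat, click_feat.expand_as(local_feat), dim=1)

    # Where the local features resemble the clicked pixel, trust the click's label.
    corrected = feedback.clone()
    trust = (sim > tau).unsqueeze(1).float()                             # (1, 1, h, w)
    corrected[:, :, y0:y1, x0:x1] = (
        trust * click_label + (1.0 - trust) * corrected[:, :, y0:y1, x0:x1]
    )
    return corrected


class CollaborativeUpdate(nn.Module):
    """Alternately refine the deep features and the feedback map (two rounds here)."""

    def __init__(self, channels, rounds=2):
        super().__init__()
        self.rounds = rounds
        self.fuse = nn.Conv2d(channels + 1, channels, kernel_size=3, padding=1)  # feedback -> features
        self.to_feedback = nn.Conv2d(channels, 1, kernel_size=1)                 # features -> feedback

    def forward(self, features, feedback):
        for _ in range(self.rounds):
            features = F.relu(self.fuse(torch.cat([features, feedback], dim=1)))
            feedback = torch.sigmoid(self.to_feedback(features))
        return features, feedback


if __name__ == "__main__":
    feats = torch.randn(1, 256, 128, 128)          # toy high-level features
    prev_mask = torch.rand(1, 1, 128, 128)         # toy previous-round feedback
    corrected = focused_correction(prev_mask, feats, click_yx=(64, 70), click_label=1.0)
    feats, refined = CollaborativeUpdate(channels=256)(feats, corrected)
    print(refined.shape)                           # torch.Size([1, 1, 128, 128])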

Setup

Requirements

This work was built using Python 3.8 and relies on PyTorch 1.10.0. The following command installs all necessary packages:

pip install -r requirements.txt
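If PyTorch 1.10.0 is not already available in your environment, you can install it explicitly first; the exact build depends on your CUDA setup, so treat the following as an example rather than a prescribed step:

# Example only: the PyTorch/torchvision versions this work was built with (default PyPI build)
pip install torch==1.10.0 torchvision==0.11.1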

Datasets

We trained our models on SBD for the ResNet-101 backbone and on COCO+LVIS for the HRNet18s and HRNet18 backbones.

Training Datasets

| Dataset   | Description | Download Link |
|-----------|-------------|---------------|
| SBD       | 8498 images with 20172 instances (train)<br>2857 images with 6671 instances (test) | Official site |
| COCO+LVIS | 99k images with 1.5M instances (train) | Original LVIS images +<br>combined annotations |

Evaluation Datasets

| Dataset  | Description | Download Link |
|----------|-------------|---------------|
| GrabCut  | 50 images with one object each (test) | GrabCut.zip (11 MB) |
| Berkeley | 96 images with 100 instances (test) | Berkeley.zip (7 MB) |
| DAVIS    | 345 images with one object each (test) | DAVIS.zip (43 MB) |

Pre-Trained Models

For Training

hrnetv2_w18_imagenet_pretrained.pth (ImageNet-pretrained HRNetV2-W18 weights used to initialize the HRNet backbones)

For Interactive Segmentation

<table>
  <thead align="center">
    <tr>
      <th rowspan="2">Model</th>
      <th rowspan="2">Training<br>Set</th>
      <th colspan="2">GrabCut</th>
      <th>Berkeley</th>
      <th colspan="2">SBD</th>
      <th colspan="2">DAVIS</th>
    </tr>
    <tr>
      <td>NoC<br>@85%</td>
      <td>NoC<br>@90%</td>
      <td>NoC<br>@90%</td>
      <td>NoC<br>@85%</td>
      <td>NoC<br>@90%</td>
      <td>NoC<br>@85%</td>
      <td>NoC<br>@90%</td>
    </tr>
  </thead>
  <tbody align="center">
    <tr>
      <td align="left"><a href="https://drive.google.com/u/0/uc?id=1bxfaDRZAv7qNfHZXtVeNken2ndpFJ7BF&export=download">ResNet-101<br>(224 MB)</a></td>
      <td rowspan="1">SBD</td>
      <td>1.64</td>
      <td>1.80</td>
      <td>2.83</td>
      <td>3.26</td>
      <td>5.36</td>
      <td>4.75</td>
      <td>6.46</td>
    </tr>
    <tr>
      <td align="left"><a href="https://drive.google.com/u/0/uc?id=1RzUKXWIydewf6qZy48HyLoU21arOVZg2&export=download">HRNet18s<br>(39.5 MB)</a></td>
      <td rowspan="1">COCO+<br>LVIS</td>
      <td>1.42</td>
      <td>1.68</td>
      <td>2.03</td>
      <td>3.64</td>
      <td>5.92</td>
      <td>3.89</td>
      <td>5.13</td>
    </tr>
  </tbody>
</table>
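If you prefer the command line, the checkpoints can also be fetched from Google Drive with the gdown package (the file ID below comes from the ResNet-101 link above; the output filename is just a placeholder):

# Example only: download the ResNet-101 checkpoint via gdown (pip install gdown)
gdown "https://drive.google.com/uc?id=1bxfaDRZAv7qNfHZXtVeNken2ndpFJ7BF" -O resnet101_sbd.pth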

Training

We provide the scripts for training our models.

For each experiment, a separate folder is created to store TensorBoard logs, text logs, visualizations, and checkpoints. You can specify another path in ./configs/train/config.yml (see the EXPS_PATH variable).

You can also change the paths to the training datasets in ./configs/train/config.yml; a rough sketch of the relevant entries is given below.
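Only EXPS_PATH is named in this README; the dataset keys and paths below are hypothetical placeholders that should be matched against the actual ./configs/train/config.yml.

# Sketch only: EXPS_PATH is referenced in this README, the remaining keys are hypothetical
EXPS_PATH: ./experiments        # where logs, visualizations, and checkpoints are saved
SBD_PATH: /path/to/SBD          # hypothetical key: location of the SBD dataset
LVIS_PATH: /path/to/COCO_LVIS   # hypothetical key: location of the COCO+LVIS dataset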

You can start training using the following commands:

# Train DeepLabV3+ with ResNet-101 as the backbone
sh scripts/train_r101.sh

# Train HRNet+OCR with HRNet-18s as the backbone
sh scripts/train_h18s.sh

Evaluation

We provide the scripts to evaluate our models on four benchmarks: GrabCut, Berkeley, SBD, and DAVIS.

To evaluate a model, download the corresponding checkpoint and set its path in the evaluation script (see the --resume-path argument). You can specify the paths of the evaluation datasets in the configuration files under ./configs/val.

# Evaluate DeepLabV3+ with ResNet-101 as the backbone
sh scripts/test_r101.sh

# Evaluate HRNet+OCR with HRNet-18s as the backbone
sh scripts/test_h18s.sh

To save the resulting segmentation masks, add --vis-preds to the script. To also draw the annotated clicks on the saved masks, add --vis-points.
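For reference, the command inside one of these scripts might be extended roughly as follows. The entry point and checkpoint path are hypothetical placeholders (the real command lives in scripts/test_*.sh); --resume-path, --vis-preds, and --vis-points are the options mentioned above.

# Hypothetical sketch: adapt the actual command found in scripts/test_r101.sh
python <evaluation_entry_point>.py \
    --resume-path /path/to/downloaded_checkpoint.pth \
    --vis-preds \
    --vis-points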

Interactive Demo

<div align=center><img src="figs/demo.png" width="100%"/></div> We provide an interactive demo. Feel free to explore it using the following command:
sh scripts/demo.sh

We would like to extend our special thanks to everyone who has contributed to this project.

Citation

If you find FCFI useful in your research, please cite our paper:

@InProceedings{Wei_2023_CVPR,
    author    = {Wei, Qiaoqiao and Zhang, Hui and Yong, Jun-Hai},
    title     = {Focused and Collaborative Feedback Integration for Interactive Image Segmentation},
    booktitle = {Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
    month     = {June},
    year      = {2023},
    pages     = {18643-18652}
}