InterFormer

This repository is the official implementation of the ICCV 2023 paper "InterFormer: Real-time Interactive Image Segmentation".

Introduction

InterFormer follows a new pipeline that addresses the low computational efficiency of existing interactive segmentation pipelines. It decouples the computationally expensive part, i.e. image processing, from the interaction loop and runs it once as preprocessing. Specifically, InterFormer employs a large vision transformer (ViT) on high-performance devices to preprocess images in parallel, and then uses a lightweight module called interactive multi-head self-attention (I-MSA) for interactive segmentation. As a result, InterFormer achieves real-time, high-quality interactive segmentation even on CPU-only devices.

Demo

The following GIF animations were created on CPU-only devices:

<img src="assets/cod2.gif" width="250"/> <img src="assets/apples.gif" width="250"/> <img src="assets/cod3.gif" width="250"/>

<img src="assets/cod1.gif" width="250"/> <img src="assets/cod5.gif" width="250"/> <img src="assets/crack.gif" width="250"/>

<img src="assets/parrot.gif" width="250"/> <img src="assets/sheep.gif" width="250"/> <img src="assets/swimmer.gif" width="250"/>

Usage

Install

Requirements

Install the following dependencies, in the order listed below, before using InterFormer:

Install PyTorch

To install PyTorch, please refer to the official guide: Installing Previous Versions of PyTorch.
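For example, a PyTorch 1.12.1 / CUDA 11.3 build (shown only as an illustration; pick the combination that matches your CUDA driver and the mmcv-full version below) can be installed with:

pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113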

Install mmcv-full

pip install -U openmim
mim install mmcv-full==1.6.0

Install mmsegmentation

cd mmsegmentation
pip install -e .

Install Additional Dependencies

pip install -r requirements.txt

Data Preparation

COCO Dataset

To download the COCO dataset, please refer to cocodataset. You will need to download the 2017 Train Images, 2017 Val Images, and 2017 Panoptic Train/Val annotations into the data/coco2017 directory.

Alternatively, you can use the following script:

cd data/coco2017
bash coco2017.sh
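If you prefer to fetch the archives manually, the commands below are a sketch using the official COCO URLs (verify them on cocodataset.org before use); note that the panoptic archive contains nested zip files that need a second unzip pass to produce the panoptic_train2017 and panoptic_val2017 folders shown below:

cd data/coco2017
wget http://images.cocodataset.org/zips/train2017.zip
wget http://images.cocodataset.org/zips/val2017.zip
wget http://images.cocodataset.org/annotations/panoptic_annotations_trainval2017.zip
unzip train2017.zip
unzip val2017.zip
unzip panoptic_annotations_trainval2017.zip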

The data is organized as follows:

data/coco2017/
├── annotations
│   ├── panoptic_train2017 [118287 entries exceeds filelimit, not opening dir]
│   ├── panoptic_train2017.json
│   ├── panoptic_val2017 [5000 entries exceeds filelimit, not opening dir]
│   └── panoptic_val2017.json
├── coco2017.sh
├── train2017 [118287 entries exceeds filelimit, not opening dir]
└── val2017 [5000 entries exceeds filelimit, not opening dir]

LVIS Dataset

To download the LVIS dataset images and annotations, please refer to lvisdataset.
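For example, the annotation archives can be fetched with the commands below (URLs as listed on the LVIS website; verify them before use). LVIS v1 reuses the COCO 2017 images, so train2017 and val2017 can be downloaded from cocodataset or symlinked from the COCO directory above:

cd data/lvis
wget https://dl.fbaipublicfiles.com/LVIS/lvis_v1_train.json.zip
wget https://dl.fbaipublicfiles.com/LVIS/lvis_v1_val.json.zip
unzip lvis_v1_train.json.zip
unzip lvis_v1_val.json.zip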

The data is organized as follows:

data/lvis/
├── lvis_v1_train.json
├── lvis_v1_train.json.zip
├── lvis_v1_val.json
├── lvis_v1_val.json.zip
├── train2017 [118287 entries exceeds filelimit, not opening dir]
├── train2017.zip
├── val2017 [5000 entries exceeds filelimit, not opening dir]
└── val2017.zip

SBD Dataset

To download the SBD dataset, please refer to SBD.
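A minimal download-and-extract sketch, assuming the benchmark.tgz archive linked from the SBD page (check the page for the current URL):

cd data/sbd
wget http://www.eecs.berkeley.edu/Research/Projects/CS/vision/grouping/semantic_contours/benchmark.tgz
tar -xzf benchmark.tgz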

The data is organized as follows:

data/sbd/
├── benchmark_RELEASE
│   ├── dataset
│   │   ├── cls [11355 entries exceeds filelimit, not opening dir]
│   │   ├── img [11355 entries exceeds filelimit, not opening dir]
│   │   ├── inst [11355 entries exceeds filelimit, not opening dir]
│   │   ├── train.txt
│   │   └── val.txt
└── benchmark.tgz

DAVIS & GrabCut & Berkeley Datasets

Please download the DAVIS, GrabCut, and Berkeley datasets from Reviving Iterative Training with Mask Guidance for Interactive Segmentation (RITM).
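Assuming the downloaded archives are named GrabCut.zip, Berkeley.zip, and DAVIS.zip (adjust to the actual file names), they can be unpacked to match the layout below with:

unzip GrabCut.zip -d data/grabcut
unzip Berkeley.zip -d data/berkeley
unzip DAVIS.zip -d data/davis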

The data is organized as follows:

data/
├── berkeley
│   └── Berkeley
│       ├── gt [100 entries exceeds filelimit, not opening dir]
│       ├── img [100 entries exceeds filelimit, not opening dir]
│       └── list
│           └── val.txt
├── davis
│   └── DAVIS
│       ├── gt [345 entries exceeds filelimit, not opening dir]
│       ├── img [345 entries exceeds filelimit, not opening dir]
│       └── list
│           ├── val_ctg.txt
│           └── val.txt
└── grabcut
    └── GrabCut
        ├── gt [50 entries exceeds filelimit, not opening dir]
        ├── img [50 entries exceeds filelimit, not opening dir]
        └── list
            └── val.txt

Training

MAE-Pretrained Weight

To download the MAE-pretrained weights and convert them into the mmseg format, please refer to MAE.

For example:

python tools/model_converters/beit2mmseg.py https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_base.pth pretrain/mae_pretrain_vit_base_mmcls.pth
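The large model can be converted in the same way; the source URL below follows the official MAE release naming, so verify it if the download fails:

python tools/model_converters/beit2mmseg.py https://dl.fbaipublicfiles.com/mae/pretrain/mae_pretrain_vit_large.pth pretrain/mae_pretrain_vit_large_mmcls.pth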

The converted weight files should be placed in the pretrain directory and organized as follows:

pretrain
├── mae_pretrain_vit_base_mmcls.pth
└── mae_pretrain_vit_large_mmcls.pth

Start Training

To start the training of InterFormer-Light, run the following script:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_train.sh configs/interformer_light_coco_lvis_320k.py 4 --seed 42 --no-validate

To train InterFormer-Tiny, use the following script:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_train.sh configs/interformer_tiny_coco_lvis_320k.py 4 --seed 42 --no-validate
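If only a single GPU is available, the same launcher can be used with one process, e.g.:

CUDA_VISIBLE_DEVICES=0 bash tools/dist_train.sh configs/interformer_tiny_coco_lvis_320k.py 1 --seed 42 --no-validate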

The trained weights are stored in work_dirs/interformer_light_coco_lvis_320k or work_dirs/interformer_tiny_coco_lvis_320k.

Evaluation

The trained weights are available at InterFormer.

To start the evaluation on the GrabCut, Berkeley, SBD, or DAVIS dataset, use the following script:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_clicktest.sh ${CHECKPOINT_FILE} ${GPU_NUM} [--dataset ${DATASET_NAME}] [--size_divisor ${SIZE_DIVISOR}]

where CHECKPOINT_FILE is the path to the trained weight file, GPU_NUM is the number of GPUs used for evaluation, DATASET_NAME is the name of the dataset to evaluate on, and SIZE_DIVISOR is the divisor used to pad the input images. The script looks for the configuration file (a .py file) in the same folder as CHECKPOINT_FILE.

For example, assume the data is organized as follows:

work_dirs/
└── interformer_tiny_coco_lvis_320k
    ├── interformer_tiny_coco_lvis_320k.py
    └── iter_320000.pth

To evaluate on SBD with InterFormer-Tiny, run:

CUDA_VISIBLE_DEVICES=0,1,2,3 bash tools/dist_clicktest.sh work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth 4 --dataset sbd --size_divisor 32

This command starts the evaluation with the trained weight file work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth, loading the configuration file interformer_tiny_coco_lvis_320k.py from the same folder.

The results are stored in work_dirs/interformer_tiny_coco_lvis_320k/clicktest_sbd_iter_320000_xxxx.json.

Running Demo

To run the demo directly with Python, use the following command in your terminal:

python demo/main.py path/to/checkpoint --device [cpu|cuda:0]

where path/to/checkpoint is the path to the trained weight file and --device selects the inference device (cpu or a CUDA device such as cuda:0).

Here is an example command that runs the demo on CPU with the InterFormer-Tiny checkpoint:

python demo/main.py work_dirs/interformer_tiny_coco_lvis_320k/iter_320000.pth --device cpu

License

This project is licensed under the MIT License - see the LICENSE file for details.