ZIM: Zero-Shot Image Matting for Anything

Beomyoung Kim, Chanyong Shin, Joonhyun Jeong, Hyungsik Jung, Se-Yun Lee, Sewhan Chun, Dong-Hyun Hwang, Joonsang Yu

NAVER Cloud, ImageVision

Paper Page · 🤗 Demo · 🤗 Dataset · 🤗 Models · 🤗 Collection

Introduction

The recent segmentation foundation model, Segment Anything Model (SAM), exhibits strong zero-shot segmentation capabilities, but it falls short in generating fine-grained precise masks. To address this limitation, we propose a novel zero-shot image matting model, called ZIM, with two key contributions: First, we develop a label converter that transforms segmentation labels into detailed matte labels, constructing the new SA1B-Matte dataset without costly manual annotations. Training SAM with this dataset enables it to generate precise matte masks while maintaining its zero-shot capability. Second, we design the zero-shot matting model equipped with a hierarchical pixel decoder to enhance mask representation, along with a prompt-aware masked attention mechanism to improve performance by enabling the model to focus on regions specified by visual prompts. We evaluate ZIM using the newly introduced MicroMat-3K test set, which contains high-quality micro-level matte labels. Experimental results show that ZIM outperforms existing methods in fine-grained mask generation and zero-shot generalization. Furthermore, we demonstrate the versatility of ZIM in various downstream tasks requiring precise masks, such as image inpainting and 3D NeRF. Our contributions provide a robust foundation for advancing zero-shot matting and its downstream applications across a wide range of computer vision tasks.

Model overview

Installation

Install the package with one of the commands below:

pip install zim_anything

or

git clone https://github.com/naver-ai/ZIM.git
cd ZIM; pip install -e .

To enable GPU acceleration, please install the compatible onnxruntime-gpu package based on your environment settings (CUDA and CuDNN versions), following the instructions in the onnxruntime installation docs.
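
For example, on a machine with a recent CUDA 12.x toolkit, the default GPU package is typically sufficient (illustrative only; pick the build matching your CUDA/CuDNN versions per the docs):

pip install onnxruntime-gpu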

Demo

We provide a Gradio demo in demo/gradio_demo.py. You can run the demo locally with:

python demo/gradio_demo.py

In addition, we provide a Gradio demo, demo/gradio_demo_comparison.py, for qualitatively comparing ZIM with SAM:

python demo/gradio_demo_comparison.py

Getting Started

After installation, you can use our model in just a few lines, as shown below. ZimPredictor is compatible with SamPredictor, exposing the same interface, e.g., set_image() and predict().

import torch

from zim_anything import zim_model_registry, ZimPredictor

backbone = "vit_l"
ckpt_p = "results/zim_vit_l_2092"

model = zim_model_registry[backbone](checkpoint=ckpt_p)
if torch.cuda.is_available():
    model.cuda()

predictor = ZimPredictor(model)
predictor.set_image(<image>)
masks, _, _ = predictor.predict(<input_prompts>)
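
Since ZimPredictor follows the SamPredictor interface, prompts use the same conventions. Below is a minimal sketch; the image path and coordinates are placeholders, and the keyword arguments assume SamPredictor-compatible predict():

import numpy as np
import cv2

# Load an image as an RGB uint8 array (path is a placeholder).
image = cv2.cvtColor(cv2.imread("example.png"), cv2.COLOR_BGR2RGB)
predictor.set_image(image)

# A single positive point prompt (label 1 = positive, 0 = negative).
masks, _, _ = predictor.predict(
    point_coords=np.array([[512, 512]]),
    point_labels=np.array([1]),
)

# Or a box prompt in [x_min, y_min, x_max, y_max] (XYXY) format.
masks, _, _ = predictor.predict(box=np.array([100, 100, 600, 600]))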

We also provide code for generating masks for an entire image and visualization:

import torch

from zim_anything import zim_model_registry, ZimAutomaticMaskGenerator
from zim_anything.utils import show_mat_anns

backbone = "vit_l"
ckpt_p = "results/zim_vit_l_2092"

model = zim_model_registry[backbone](checkpoint=ckpt_p)
if torch.cuda.is_available():
    model.cuda()

mask_generator = ZimAutomaticMaskGenerator(model)
masks = mask_generator.generate(<image>)  # Automatically generated masks
masks_vis = show_mat_anns(<image>, masks)  # Visualize masks
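
Assuming the generator returns SAM-style annotation dicts (each with keys such as 'segmentation' and 'area'; for ZIM the mask may be a soft matte rather than a binary map), the output can be inspected and saved roughly as follows:

import cv2

# Save the visualization (assuming show_mat_anns returns an RGB array).
cv2.imwrite("masks_vis.png", cv2.cvtColor(masks_vis, cv2.COLOR_RGB2BGR))

# Sort masks by area, largest first, and print a quick summary.
masks = sorted(masks, key=lambda m: m["area"], reverse=True)
print(f"Generated {len(masks)} masks; largest covers {masks[0]['area']} px")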

Additionally, masks can be generated for images from the command line:

bash script/run_amg.sh

We provide pretrained weights of ZIM:

Model Zoo

| Model | Link |
|---|---|
| zim_vit_b | download |
| zim_vit_l | download |

Dataset Preparation

1) MicroMat-3K Dataset

We introduce a new test set, MicroMat-3K, for evaluating zero-shot interactive matting models. It consists of 3,000 high-resolution images paired with micro-level matte labels, providing a comprehensive benchmark for testing matting models at different levels of detail.

The MicroMat-3K dataset can be downloaded here or from Hugging Face.

1-1) Dataset structure

Dataset structure should be as follows:

└── /path/to/dataset/MicroMat3K
    ├── img
    │   ├── 0001.png
    ├── matte
    │   ├── coarse
    │   │   ├── 0001.png
    │   └── fine
    │       ├── 0001.png
    ├── prompt
    │   ├── coarse
    │   │   ├── 0001.json
    │   └── fine
    │       ├── 0001.json
    └── seg
        ├── coarse
        │   ├── 0001_01.json
        └── fine
            ├── 0001_01.json

1-2) Prompt file configuration

Prompt file configuration should be as follows:

{
    "point": [[x1, y1, 1], [x2, y2, 0], ...],   # 1: Positive, 0: Negative prompt
    "bbox": [x1, y1, x2, y2]                    # [X, Y, X, Y] format
}
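
For reference, here is a minimal sketch of loading one of these prompt files and converting it into predictor inputs; the path is illustrative, and the keyword arguments assume the SamPredictor-style interface shown earlier:

import json
import numpy as np

with open("prompt/fine/0001.json") as f:  # illustrative path
    prompt = json.load(f)

# Split point prompts into coordinates and positive/negative labels.
points = np.array(prompt["point"], dtype=np.float32)
point_coords, point_labels = points[:, :2], points[:, 2].astype(np.int64)

# The box prompt is already in [x_min, y_min, x_max, y_max] format.
box = np.array(prompt["bbox"], dtype=np.float32)

masks, _, _ = predictor.predict(
    point_coords=point_coords,
    point_labels=point_labels,
)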

Evaluation

We provide an evaluation script, which includes a comparison with SAM, in script/run_eval.sh. Make sure the dataset is prepared with the structure described above.

First, modify data_root in script/run_eval.sh:

...
data_root="/path/to/dataset/"
...

Then, run the evaluation script:

bash script/run_eval.sh

Evaluation results on the MicroMat-3K dataset:

(Results table comparing ZIM with SAM on MicroMat-3K)

How To Cite

@article{kim2024zim,
  title={ZIM: Zero-Shot Image Matting for Anything},
  author={Kim, Beomyoung and Shin, Chanyong and Jeong, Joonhyun and Jung, Hyungsik and Lee, Se-Yun and Chun, Sewhan and Hwang, Dong-Hyun and Yu, Joonsang},
  journal={arXiv preprint arXiv:2411.00626},
  year={2024}
}

License

ZIM
Copyright (c) 2024-present NAVER Cloud Corp.
CC BY-NC 4.0 (https://creativecommons.org/licenses/by-nc/4.0/)