Segment Anything with Clip

[HuggingFace Space] | [COLAB] | [Demo Video]

Meta released the Segment Anything Model (SAM), a new foundation model for segmentation. It aims to solve downstream segmentation tasks through prompting, where a prompt can be foreground/background points, a bounding box, a mask, or free-form text. However, the text-prompt capability has not been released yet.

As a workaround, I took the following steps:

1. Get all object proposals generated by SAM (Segment Anything Model).
2. Crop the object regions by their bounding boxes.
3. Get the cropped images' features and a query feature from CLIP.
4. Calculate the similarity between the image features and the query feature.
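
The cropping step can be sketched as follows. This is a minimal illustration, not the project's code. It assumes each proposal is a dict with a `bbox` entry in `[x, y, width, height]` pixel format (the format returned by `SamAutomaticMaskGenerator.generate` in the `segment_anything` package) and that the image is an (H, W, 3) NumPy array. The helper name `crop_proposals` is hypothetical.

```python
import numpy as np


def crop_proposals(image: np.ndarray, proposals: list[dict]) -> list[np.ndarray]:
    """Cut one sub-image per proposal out of an (H, W, 3) image array."""
    height, width = image.shape[:2]
    crops = []
    for proposal in proposals:
        # Assumed bbox layout: [x, y, w, h] in pixels (XYWH).
        x, y, w, h = (int(round(v)) for v in proposal["bbox"])
        # Clamp to the image so a box touching the border stays valid.
        x0, y0 = max(x, 0), max(y, 0)
        x1, y1 = min(x + w, width), min(y + h, height)
        if x1 > x0 and y1 > y0:  # skip boxes that end up empty
            crops.append(image[y0:y1, x0:x1].copy())
    return crops


# Synthetic data standing in for a real image and real SAM output.
image = np.zeros((100, 200, 3), dtype=np.uint8)
proposals = [{"bbox": [10, 20, 50, 40]}, {"bbox": [180, 90, 50, 50]}]
crops = crop_proposals(image, proposals)
print([c.shape for c in crops])
```

Note that empty boxes are dropped, so if any are skipped the returned list no longer lines up index-by-index with the proposals; returning `(proposal, crop)` pairs would be one way to keep them together.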

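# How to get the similarity.
import clip  # OpenAI CLIP package

# Assumes `model, preprocess = clip.load(...)`, `crop` is a PIL image of one
# object region, and `texts` is a list of query strings.
preprocessed_img = preprocess(crop).unsqueeze(0)  # add a batch dimension
tokens = clip.tokenize(texts)  # one row of token ids per text
logits_per_image, _ = model(preprocessed_img, tokens)
# Softmax over the text axis: one probability per text for the crop.
similarity = logits_per_image.softmax(-1)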
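
For intuition, `logits_per_image` in CLIP is the cosine similarity between L2-normalized image and text features, multiplied by a learned logit scale. The NumPy sketch below reproduces that computation on random stand-in features rather than real CLIP embeddings. The scale of 100.0, the feature size of 512, and the function name `clip_style_similarity` are assumptions made for illustration.

```python
import numpy as np


def clip_style_similarity(image_feats, text_feats, logit_scale=100.0):
    """Softmax over texts of scaled cosine similarity, one row per image."""
    # L2-normalize so the dot product is a cosine similarity.
    image_feats = image_feats / np.linalg.norm(image_feats, axis=-1, keepdims=True)
    text_feats = text_feats / np.linalg.norm(text_feats, axis=-1, keepdims=True)
    logits = logit_scale * image_feats @ text_feats.T  # (n_images, n_texts)
    # Numerically stable softmax along the text axis.
    logits = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(logits)
    return exp / exp.sum(axis=-1, keepdims=True)


rng = np.random.default_rng(0)
image_feats = rng.normal(size=(3, 512))  # stand-in for 3 crop embeddings
text_feats = rng.normal(size=(2, 512))   # stand-in for 2 query embeddings
similarity = clip_style_similarity(image_feats, text_feats)
print(similarity.shape)  # (3, 2)
```

Because the softmax runs over the text axis, each row sums to 1. With a single text every entry is therefore 1.0, so the scores only become informative when several texts are compared or when the raw logits are used.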

How to run locally

Anaconda is required before starting the setup.

make env
conda activate segment-anything-with-clip
make setup

# This starts the Gradio server.
make run

Open http://localhost:7860/

Follow-up Works

Fast Segment Everything: A re-implementation of the Everything algorithm in an iterative manner that is better suited to CPU-only environments. It shows results comparable to the original Everything with about 1/5 the number of inferences (e.g. 1024 for the original vs. 200), and it takes under 10 seconds to search for masks on a CPU upgrade instance (8 vCPU, 32 GB RAM) of a Hugging Face Space.
Fast Segment Everything with Text Prompt: Based on Fast-Segment-Everything, this example takes a text prompt and generates an attention map for the area you want to focus on.
Fast Segment Everything with Image Prompt: Based on Fast-Segment-Everything, this example takes an image prompt and generates an attention map for the area you want to focus on.
Fast Segment Everything with Drawing Prompt: Based on Fast-Segment-Everything, this example takes a drawing prompt and generates an attention map for the area you want to focus on.
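
One way to picture an iterative mask search is sketched below. It is an assumption about the general idea, not the Fast Segment Everything implementation: instead of prompting on a fixed grid of points, it repeatedly picks a pixel that no mask covers yet, asks a predictor for a mask at that point, and stops when the image is covered or an inference budget is reached. The `predict_mask` callable, the function name `iterative_mask_search`, and the toy predictor are all hypothetical.

```python
from typing import Callable

import numpy as np


def iterative_mask_search(
    shape: tuple[int, int],
    predict_mask: Callable[[int, int], np.ndarray],
    max_inferences: int = 200,
    seed: int = 0,
) -> list[np.ndarray]:
    """Collect masks by prompting only at pixels that are not yet covered."""
    rng = np.random.default_rng(seed)
    covered = np.zeros(shape, dtype=bool)
    masks = []
    for _ in range(max_inferences):
        ys, xs = np.nonzero(~covered)
        if len(ys) == 0:  # everything is covered, stop early
            break
        i = rng.integers(len(ys))
        y, x = int(ys[i]), int(xs[i])
        mask = predict_mask(y, x)  # hypothetical: boolean (H, W) mask
        if mask.any():
            masks.append(mask)
        covered |= mask
        covered[y, x] = True  # never prompt the same pixel twice
    return masks


def toy_predictor(y: int, x: int) -> np.ndarray:
    """Stand-in for a segmentation model: the 16x16 tile containing (y, x)."""
    mask = np.zeros((64, 64), dtype=bool)
    y0, x0 = (y // 16) * 16, (x // 16) * 16
    mask[y0:y0 + 16, x0:x0 + 16] = True
    return mask


masks = iterative_mask_search((64, 64), toy_predictor)
print(len(masks))
```

In this toy setup every prediction covers a new tile, so the loop stops after 16 predictions instead of querying a dense grid. How many inferences a real model needs depends on the image and on the masks it returns.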
