Segment Anything with Clip

[HuggingFace Space] | [COLAB] | [Demo Video]

Meta released the Segment Anything Model (SAM), a new foundation model for segmentation. It aims to solve downstream segmentation tasks through prompt engineering, accepting prompts such as foreground/background points, bounding boxes, masks, and free-form text. However, the text-prompt capability has not been released yet.

As a workaround, I took the following steps:

  1. Get all object proposals generated by SAM (Segment Anything Model).
  2. Crop the object regions by bounding boxes.
  3. Get cropped images' features and a query feature from CLIP.
  4. Calculate the similarity between image features and the query feature.
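# How to get the similarity.
# Assumed setup: `model, preprocess = clip.load(...)`, `crop` is a PIL image
# of one object region, and `texts` is a list of query strings.
preprocessed_img = preprocess(crop).unsqueeze(0)
tokens = clip.tokenize(texts)
logits_per_image, _ = model(preprocessed_img, tokens)
# Softmax over the text dimension: one probability per query for this crop.
similarity = logits_per_image.softmax(-1)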
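
Steps 2 and 4 can be sketched without loading either model. The following is a minimal illustration, not the project's actual code: it assumes proposal boxes arrive as (x, y, width, height), which is how SAM's automatic mask generator reports them, and it uses hand-written toy vectors in place of real CLIP features. The helper names (`xywh_to_ltrb`, `query_scores`) and the `logit_scale` of 100 are assumptions chosen for illustration.

```python
import math


def xywh_to_ltrb(bbox):
    """Convert (x, y, width, height) to (left, top, right, bottom) for cropping."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(p * q for p, q in zip(a, b))
    norm_a = math.sqrt(sum(p * p for p in a))
    norm_b = math.sqrt(sum(q * q for q in b))
    return dot / (norm_a * norm_b)


def softmax(values):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp(v - peak) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def query_scores(image_feature, text_features, logit_scale=100.0):
    """Scaled cosine similarities, normalized across the text queries.

    logit_scale is an assumed temperature; a real CLIP model supplies its own.
    """
    logits = [logit_scale * cosine(image_feature, t) for t in text_features]
    return softmax(logits)


# Toy data standing in for one cropped region and two text queries.
crop_feature = [0.9, 0.1, 0.0]
text_features = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]

box = xywh_to_ltrb((10, 20, 30, 40))
scores = query_scores(crop_feature, text_features)
best_query = scores.index(max(scores))
```

Because the softmax runs across the queries, the scores for one crop always sum to 1, so a crop is only scored relative to the other queries it is compared against. The corner-format box is the form that an image library such as PIL expects when cropping.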

How to run locally

Anaconda must be installed before starting the setup.

make env
conda activate segment-anything-with-clip
make setup
# This starts the Gradio server.
make run

Then open http://localhost:7860/ in a browser.

Successive Works

References