<div align="center"> <h1>FGVP: Fine-Grained Visual Prompting</h1> </div>

Official code for Fine-Grained Visual Prompting (NeurIPS 2023).

Install

Our code is built upon ReCLIP. Installation and dataset preparation follow the instructions in the ReCLIP repository.

FGVP

<p align="center"> <img src="assets/fig_visual_prompt.png" width="100%"></p> A summary of visual prompts for the caption "elephant on the left". <br>
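
The FGVP paper highlights a Blur Reverse Mask prompt, which blurs the region outside the target mask so that the unblurred target stands out. Below is a minimal pure-Python sketch of that idea. It assumes a grayscale image stored as nested lists and substitutes a simple box blur for whatever blur the repository actually uses. `box_blur` and `blur_reverse_mask` are hypothetical names, not functions from this codebase.

```python
def box_blur(img, radius=1):
    """Mean-filter a 2D grayscale image given as a list of rows."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Clamp the averaging window at the image borders.
            ys = range(max(0, y - radius), min(h, y + radius + 1))
            xs = range(max(0, x - radius), min(w, x + radius + 1))
            vals = [img[j][i] for j in ys for i in xs]
            out[y][x] = sum(vals) / len(vals)
    return out


def blur_reverse_mask(img, mask, radius=1):
    """Keep pixels where mask is truthy; use blurred values everywhere else."""
    blurred = box_blur(img, radius)
    h, w = len(img), len(img[0])
    return [
        [img[y][x] if mask[y][x] else blurred[y][x] for x in range(w)]
        for y in range(h)
    ]


# Toy example: a single bright pixel that is also the masked "target".
image = [[0, 0, 0], [0, 9, 0], [0, 0, 0]]
mask = [[0, 0, 0], [0, 1, 0], [0, 0, 0]]
prompted = blur_reverse_mask(image, mask)
```

In a real pipeline the mask would come from a segmenter and the prompted image would then be scored against the caption by the VLM; both steps are outside this sketch.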

Results

| Method | VLM | Visual Prompt | Post Processing | Command | RefCOCO val | RefCOCO+ val | RefCOCOg val |
|---|---|---|---|---|---|---|---|
| CPT-adapted | ViT-B/32, RN50x16 | $B2$ | R | link | 41.3 | 41.3 | 51.3 |
| ReCLIP | ViT-B/32, RN50x16 | $P{\ \| \ }B4$ | R | link | 45.8 | 47.9 | 59.3 |
| RedCircle | ViT-B/32, RN50x16 | $P{\ \| \ }C1$ | R | link | 43.9 | 45.3 | 57.3 |
| FGVP (ours) | ViT-B/32, RN50x16 | $P{\ \| \ }D4$ | R | link | 52.0 | 53.3 | 62.1 |
| RedCircle (reported in paper) | ViT-L/14@336px, RN50x16 | $C1{\ \| \ }C3{\ \| \ }C4$ | S | -- | 49.8 | 55.3 | 59.4 |
| RedCircle | ViT-L/14@336px, RN50x16 | $C1{\ \| \ }C3{\ \| \ }C4$ | S | link | 51.4 | 56.3 | 58.3 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $D1{\ \| \ }D3{\ \| \ }D4$ | S | link | 52.9 | 57.4 | 58.1 |
| RedCircle | ViT-L/14@336px, RN50x16 | $P{\ \| \ }C1{\ \| \ }C3{\ \| \ }C4$ | S | link | 51.6 | 58.1 | 60.0 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $P{\ \| \ }D1{\ \| \ }D3{\ \| \ }D4$ | S | link | 53.9 | 59.3 | 61.0 |
| RedCircle | ViT-L/14@336px, RN50x16 | $P{\ \| \ }C1{\ \| \ }C3{\ \| \ }C4$ | RS | link | 56.8 | 58.6 | 62.2 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $P{\ \| \ }D1{\ \| \ }D3{\ \| \ }D4$ | RS | link | 59.6 | 60.0 | 63.3 |

Inference Single Image

We provide a simple inference script that runs on a single image, without post-processing.

# example 1
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp1/ori.png \
    --text 'apple on the left' 'apple in the middle' 'broccoli' 'raspberries' 'grossum' 'glass bowl' \
    --out_dir demo/exp1 \
    --sam_prompt grid

# example 2
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp2/ori.png \
    --out_dir demo/exp2 \
    --text 'photo on the wall' \
    --sam_prompt grid
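
To run the same script over several images, the command line can be assembled from Python. The sketch below only builds the argument list, using the flags shown in the examples above (`--img_dir`, `--text`, `--out_dir`, `--sam_prompt`). `build_command` is a hypothetical helper, and the default script path is copied from the examples and assumed to be relative to the repository root.

```python
import sys


def build_command(img_path, texts, out_dir, sam_prompt="grid",
                  script="fgvp-reclip/simple_inference.py"):
    """Return the argv list for one single-image inference run."""
    if not texts:
        raise ValueError("at least one text query is required")
    return [
        sys.executable, script,
        "--img_dir", str(img_path),
        "--text", *texts,          # one argument per referring expression
        "--out_dir", str(out_dir),
        "--sam_prompt", sam_prompt,
    ]


cmd = build_command("demo/exp2/ori.png", ["photo on the wall"], "demo/exp2")
# To execute it: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list avoids shell quoting problems with multi-word queries such as 'apple on the left'.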

You can provide proposal boxes derived from other detectors to achieve better localization. Save your bounding boxes in a JSON file and specify it with --candidate_boxes.

# example
python fgvp-reclip/simple_inference.py \
    --img_dir demo/exp1/ori.png \
    --text 'apple on the left' 'apple in the middle' 'broccoli' 'raspberries' 'grossum' 'glass bowl' \
    --out_dir demo/exp1 \
    --sam_prompt box \
    --candidate_boxes demo/exp1/candidate_boxes.json
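
A minimal sketch for writing such a file from Python follows. The schema used here, a flat JSON list of `[x1, y1, x2, y2]` pixel boxes, is an assumption and not confirmed by this README. Compare it with demo/exp1/candidate_boxes.json and adjust if the script expects a different layout. `save_candidate_boxes` is a hypothetical helper.

```python
import json
from pathlib import Path


def save_candidate_boxes(boxes, path):
    """Write boxes to `path` as a JSON list of [x1, y1, x2, y2] (assumed schema)."""
    cleaned = []
    for x1, y1, x2, y2 in boxes:
        # Reject degenerate boxes early so they do not reach the inference script.
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"invalid box: {(x1, y1, x2, y2)}")
        cleaned.append([float(x1), float(y1), float(x2), float(y2)])
    Path(path).write_text(json.dumps(cleaned, indent=2))
    return cleaned
```

For example, `save_candidate_boxes(detector_boxes, "demo/exp1/candidate_boxes.json")` would store the boxes from your own detector (here a placeholder variable) before running the command above.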