<div align="center"> <h1>FGVP: Fine-Grained Visual Prompting</h1> </div>

Official code for Fine-Grained Visual Prompting, NeurIPS 2023.
Install
Our code is built upon ReCLIP. Installation and dataset preparation follow the instructions in the ReCLIP repository.
FGVP
<p align="center"> <img src="assets/fig_visual_prompt.png" width="100%"></p>

A summary of visual prompts with the caption "elephant on the left".

Results
| Method | VLM | Visual Prompt | Post Processing | Command | RefCOCO val | RefCOCO+ val | RefCOCOg val |
|---|---|---|---|---|---|---|---|
| CPT-adapted | ViT-B/32, RN50x16 | $B2$ | R | link | 41.3 | 41.3 | 51.3 |
| ReCLIP | ViT-B/32, RN50x16 | $P{\ \vert\ }B4$ | R | link | 45.8 | 47.9 | 59.3 |
| RedCircle | ViT-B/32, RN50x16 | $P{\ \vert\ }C1$ | R | link | 43.9 | 45.3 | 57.3 |
| FGVP (ours) | ViT-B/32, RN50x16 | $P{\ \vert\ }D4$ | R | link | 52.0 | 53.3 | 62.1 |
| RedCircle (reported in paper) | ViT-L/14@336px, RN50x16 | $C1{\ \vert\ }C3{\ \vert\ }C4$ | S | -- | 49.8 | 55.3 | 59.4 |
| RedCircle | ViT-L/14@336px, RN50x16 | $C1{\ \vert\ }C3{\ \vert\ }C4$ | S | link | 51.4 | 56.3 | 58.3 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $D1{\ \vert\ }D3{\ \vert\ }D4$ | S | link | 52.9 | 57.4 | 58.1 |
| RedCircle | ViT-L/14@336px, RN50x16 | $P{\ \vert\ }C1{\ \vert\ }C3{\ \vert\ }C4$ | S | link | 51.6 | 58.1 | 60.0 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $P{\ \vert\ }D1{\ \vert\ }D3{\ \vert\ }D4$ | S | link | 53.9 | 59.3 | 61.0 |
| RedCircle | ViT-L/14@336px, RN50x16 | $P{\ \vert\ }C1{\ \vert\ }C3{\ \vert\ }C4$ | RS | link | 56.8 | 58.6 | 62.2 |
| FGVP (ours) | ViT-L/14@336px, RN50x16 | $P{\ \vert\ }D1{\ \vert\ }D3{\ \vert\ }D4$ | RS | link | 59.6 | 60.0 | 63.3 |
Inference Single Image
We provide a simple inference script for a single image. It does not apply any post-processing.
# example 1
python fgvp-reclip/simple_inference.py \
--img_dir demo/exp1/ori.png \
--text 'apple on the left' 'apple in the middle' 'broccoli' 'raspberries' 'grossum' 'glass bowl' \
--out_dir demo/exp1 \
--sam_prompt grid
# example 2
python fgvp-reclip/simple_inference.py \
--img_dir demo/exp2/ori.png \
--out_dir demo/exp2 \
--text 'photo on the wall' \
--sam_prompt grid
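
If you want to call the script from Python, for instance to loop over several images, the command can be assembled programmatically. The following is only a minimal sketch: `build_inference_command` is a hypothetical helper that is not part of this repository, and it uses only the flags shown in the examples above (`--img_dir`, `--text`, `--out_dir`, `--sam_prompt`).

```python
import shlex
from typing import List, Sequence


def build_inference_command(
    image_path: str,
    queries: Sequence[str],
    out_dir: str,
    sam_prompt: str = "grid",
    script: str = "fgvp-reclip/simple_inference.py",
) -> List[str]:
    """Assemble the argv list for one call to simple_inference.py.

    Hypothetical helper: it only mirrors the flags used in the README examples.
    """
    if not queries:
        raise ValueError("at least one text query is required")
    return [
        "python", script,
        "--img_dir", image_path,
        "--text", *queries,  # each query is passed as a separate argument
        "--out_dir", out_dir,
        "--sam_prompt", sam_prompt,
    ]


cmd = build_inference_command("demo/exp2/ori.png", ["photo on the wall"], "demo/exp2")
print(shlex.join(cmd))  # shell-quoted form, handy for logging

# To actually run it from the repository root:
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the command as a list avoids shell-quoting problems with queries that contain spaces.
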
You can provide proposal boxes derived from other detectors to achieve better localization. Save your bounding boxes in a JSON file and specify it with --candidate_boxes.
# example
python fgvp-reclip/simple_inference.py \
--img_dir demo/exp1/ori.png \
--text 'apple on the left' 'apple in the middle' 'broccoli' 'raspberries' 'grossum' 'glass bowl' \
--out_dir demo/exp1 \
--sam_prompt box \
--candidate_boxes demo/exp1/candidate_boxes.json
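
The JSON schema expected by `--candidate_boxes` is not described in this README, so check `demo/exp1/candidate_boxes.json` before writing your own file. As a rough sketch only, the snippet below assumes a plain list of `[x1, y1, x2, y2]` pixel boxes; both that layout and the `save_candidate_boxes` helper are assumptions made for illustration, not part of this repository.

```python
import json
from pathlib import Path
from typing import List, Sequence


def save_candidate_boxes(boxes: Sequence[Sequence[float]], path: str) -> List[List[float]]:
    """Validate detector boxes and write them to a JSON file.

    ASSUMPTION: each box is [x1, y1, x2, y2] in pixels and the file holds a
    flat list of such boxes. The format the script really expects may differ.
    """
    cleaned = []
    for box in boxes:
        if len(box) != 4:
            raise ValueError(f"expected 4 coordinates, got {len(box)}")
        x1, y1, x2, y2 = (float(v) for v in box)
        if x2 <= x1 or y2 <= y1:
            raise ValueError(f"degenerate box: {list(box)}")
        cleaned.append([x1, y1, x2, y2])
    out = Path(path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the folder if needed
    out.write_text(json.dumps(cleaned))
    return cleaned


# Example (hypothetical detector output and file name):
# save_candidate_boxes([[34, 50, 210, 300], [220, 48, 400, 310]], "my_boxes.json")
```
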