What is Where by Looking: Weakly-Supervised Open-World Phrase-Grounding without Text Inputs (NeurIPS 2022)

<p align="center"> <img src="pics/pic1.PNG" width="800"> </p> <p align="center"> <img src="pics/pic2.PNG" width="800"> </p>

Get Started

To train a model:

```
python train_grounding.py -bs 32 -nW 8 -nW_eval 1 -task vg_train -data_path /path_to/vg -val_path /path_to/flicker
python train_grounding.py -bs 32 -nW 8 -nW_eval 1 -task coco -data_path /path_to/coco -val_path /path_to/flicker
```

For grounding evaluation with our model (XX is the number of the results folder; e.g., for folder 'gpu22', XX == 22):

```
python inference_grounding.py -task grounding -dataset refit -val_path /path_to/RefIt -Isize 224 -clip_eval 0 -path_ae XX -nW 1
python inference_grounding.py -task grounding -dataset flicker -val_path /path_to/flicker -Isize 224 -clip_eval 0 -path_ae XX -nW 1
python inference_grounding.py -task grounding -dataset vg -val_path /path_to/VG -Isize 224 -clip_eval 0 -path_ae XX -nW 1
```

For grounding evaluation with the CLIP model:

```
python inference_grounding.py -task grounding -dataset refit -val_path /path_to/RefIt -Isize 224 -clip_eval 1 -nW 1
python inference_grounding.py -task grounding -dataset flicker -val_path /path_to/flicker -Isize 224 -clip_eval 1 -nW 1
python inference_grounding.py -task grounding -dataset vg -val_path /path_to/VG -Isize 224 -clip_eval 1 -nW 1
```
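CLIP-based grounding of this kind boils down to scoring each image patch against the text embedding and reading off a relevance map. A minimal sketch of that idea, assuming cosine similarity over patch embeddings (the function names are ours, and random vectors stand in for real CLIP features; this is not the repository's implementation):

```python
import numpy as np

def grounding_heatmap(patch_feats, text_feat):
    """Cosine similarity between each image-patch embedding and a text
    embedding, reshaped into a square relevance heatmap.

    patch_feats: (N, D) array of patch embeddings, N a perfect square.
    text_feat:   (D,) text embedding.
    """
    p = patch_feats / np.linalg.norm(patch_feats, axis=1, keepdims=True)
    t = text_feat / np.linalg.norm(text_feat)
    sims = p @ t                                  # (N,) similarity per patch
    side = int(np.sqrt(len(sims)))
    return sims.reshape(side, side)

def predicted_point(heatmap):
    """(row, col) of the heatmap maximum, i.e. the predicted point."""
    return np.unravel_index(np.argmax(heatmap), heatmap.shape)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    feats = rng.normal(size=(49, 512))            # 7x7 grid of stand-in patch features
    text = feats[10] + 0.1 * rng.normal(size=512)  # text closest to patch 10
    hm = grounding_heatmap(feats, text)
    print(predicted_point(hm))                    # patch 10 -> (1, 3)
```

In practice the heatmap is upsampled to the image resolution before taking the maximum, so the predicted point lives in pixel coordinates.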

For WWbL evaluation with our model (YY is the predictions folder produced by the inference step):

```
python inference_grounding.py -task app -dataset refit -val_path /path_to/RefIt -Isize 224 -clip_eval 0 -path_ae XX -nW 1 --start 0 --end 9983
python wwbl_algo1_point_metric.py -nW 1 -predictions_path YY -val_path /path_to/RefIt --dataset refit
```

```
python inference_grounding.py -task app -dataset flicker -val_path /path_to/flicker -Isize 224 -clip_eval 0 -path_ae XX -nW 1 -start 0 -end 1000
python wwbl_algo1_point_metric.py -nW 1 -predictions_path YY -val_path /path_to/flicker --dataset flicker
```

```
python inference_grounding.py -task app -dataset vg -val_path /path_to/VG -Isize 224 -clip_eval 0 -path_ae XX -nW 1 -start 0 -end 17478
python wwbl_algo1_point_metric.py -nW 1 -predictions_path YY -val_path /path_to/VG --dataset VG
```
<p align="center"> <img src="pics/pic3.PNG" width="800"> </p>

Phrase Grounding Results - Point Accuracy Metric

COCO weights

VG weights

| Method | Backbone | VG (VG-trained/COCO) | Flickr (VG-trained/COCO) | ReferIt (VG-trained/COCO) |
|---|---|---|---|---|
| Baseline | Random | 11.15 | 27.24 | 24.30 |
| Baseline | Center | 20.55 | 47.40 | 30.30 |
| GAE | CLIP | 54.72 | 72.47 | 56.76 |
| FCVC | VGG | -/14.03 | -/29.03 | -/33.52 |
| VGLS | VGG | 24.40/- | -/- | -/- |
| TD | Inception-2 | 19.31/- | 42.40/- | 31.97/- |
| SSS | VGG | 30.03/- | 49.10/- | 39.98/- |
| MG | BiLSTM+VGG | 50.18/46.99 | 57.91/53.29 | 62.76/47.89 |
| MG | ELMo+VGG | 48.76/47.94 | 60.08/61.66 | 60.01/47.52 |
| GbS | VGG | 53.40/52.00 | 70.48/72.60 | 59.44/56.10 |
| ours | CLIP+VGG | 62.31/59.09 | 75.63/75.43 | 65.95/61.03 |
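The point accuracy metric reported above counts a prediction as correct when the maximal point of the output heatmap falls inside the ground-truth box. A minimal sketch of this computation (function name and the `[x_min, y_min, x_max, y_max]` box format are our assumptions, not the repository's API):

```python
import numpy as np

def point_accuracy(heatmaps, gt_boxes):
    """Pointing-game accuracy: the fraction of samples whose heatmap argmax
    lies inside the ground-truth box [x_min, y_min, x_max, y_max]."""
    hits = 0
    for hm, (x0, y0, x1, y1) in zip(heatmaps, gt_boxes):
        y, x = np.unravel_index(np.argmax(hm), hm.shape)  # row, col of peak
        hits += int(x0 <= x <= x1 and y0 <= y <= y1)
    return hits / len(gt_boxes)

if __name__ == "__main__":
    hm = np.zeros((224, 224))
    hm[100, 50] = 1.0                                 # peak at row 100, col 50
    print(point_accuracy([hm, hm], [[40, 90, 60, 110],   # inside -> hit
                                    [0, 0, 30, 30]]))    # outside -> miss; 0.5
```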
<p align="center"> <img src="pics/pic4.PNG" width="800"> </p>