Awesome

Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding

Haoxuan You, Rui Sun*, Zhecan Wang*, Kai-Wei Chang, Shih-Fu Chang

[*: equal contribution]

Paper

fig

Data:

Please download annotation data from train/validation/test.

Please also prepare the VCR image data/metadata because our annotations reuse them.

Here is a detailed explanation of different items in each data sample.

annot_id: Annotation id of the dataset
objects: Annotated objects (persons only)
boxes: box location of objects (x1,x2,y1,y2,s)  
img_fn: Image filename in VCR's raw data.
metadata_fn: Metadata filename in VCR's raw data.
statement: Commonsense description for the persons. If its element is a list of a number, it refers to a person, and the number in list is the index in objects and boxes. 
original_vcr_annot_id: Original annotation id in VCR