Find Someone Who: Visual Commonsense Understanding in Human-Centric Grounding
Haoxuan You, Rui Sun*, Zhecan Wang*, Kai-Wei Chang, Shih-Fu Chang
[*: equal contribution]
Data:
Please download the annotation data for the train/validation/test splits.
Please also prepare the VCR image data/metadata because our annotations reuse them.
Here is a detailed explanation of different items in each data sample.
annot_id: Annotation id of the dataset
objects: Annotated objects (persons only)
boxes: Box locations of the objects (x1,x2,y1,y2,s)
img_fn: Image filename in VCR's raw data.
metadata_fn: Metadata filename in VCR's raw data.
statement: Commonsense description of the persons. An element that is a list containing a number refers to a person; that number is the person's index into objects and boxes.
original_vcr_annot_id: Original annotation id in VCR
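
The person references in statement can be resolved against objects and boxes roughly as follows. This is a minimal sketch, not the authors' loader: the helper name `resolve_statement` and the sample values are made up for illustration, and it assumes each sample is already loaded as a Python dict with the fields described above. Boxes are passed through unchanged, without interpreting their coordinates.

```python
from typing import Any, Dict, List, Tuple


def resolve_statement(
    sample: Dict[str, Any],
) -> Tuple[str, List[Tuple[int, List[float]]]]:
    """Render the statement as text and collect the boxes of referenced persons.

    Hypothetical helper: assumes `statement` mixes plain strings with
    lists of integer indices into `objects` and `boxes`.
    """
    words: List[str] = []
    referenced: List[Tuple[int, List[float]]] = []
    for element in sample["statement"]:
        if isinstance(element, list):
            # A list element refers to persons by index into objects/boxes.
            for idx in element:
                words.append(f"[{sample['objects'][idx]}{idx}]")
                referenced.append((idx, sample["boxes"][idx]))
        else:
            words.append(str(element))
    return " ".join(words), referenced


# Made-up sample for illustration only; the values are not from the dataset.
sample = {
    "annot_id": "example-0",
    "objects": ["person", "person"],
    "boxes": [
        [10.0, 20.0, 110.0, 220.0, 0.99],
        [150.0, 30.0, 260.0, 240.0, 0.95],
    ],
    "statement": [[1], "is", "talking", "to", [0], "."],
}

text, referenced = resolve_statement(sample)
print(text)  # [person1] is talking to [person0] .
```

Returning the index alongside each box keeps the link back to objects, which is useful when the same person is mentioned more than once in a statement.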