Compositional Substitutivity of Visual Reasoning for Visual Question Answering

Implementation for the ECCV 2024 paper "Compositional Substitutivity of Visual Reasoning for Visual Question Answering" [paper link]

Example Image

Dataset Download

Example Image

GQA-SPS Dataset

Download link: [Google Drive] [Baidu NetDisk (password: DSPS)]

Format:

  1. "gqa-sps-balanced-X-val-Y.json" is the question JSON file for the val-Y split of X SPS, where X $\in$ {word, visual entity, referent} and Y $\in$ {A, B}.
  2. "images_for_visual_enity_sps.zip" contains the images for "gqa-sps-balanced-visual-entity-val-A&B.json". For each image, "image_id.jpg" is used as the model input, and "image_id_hl.jpg" highlights the substituted objects.
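
The naming scheme above can be enumerated programmatically. The following is a minimal sketch, not part of the released code: the helper names `gqa_sps_question_files` and `image_pair` are hypothetical, and the hyphenated spelling of the SPS types in file names is an assumption based on the "visual-entity" file name quoted above. It also assumes that "image_id" stands for the actual image identifier.

```python
from itertools import product

# Assumption: SPS type names are hyphenated in file names, as in the
# "visual-entity" file name quoted in the format list; the spellings
# "word" and "referent" are assumed to be used unchanged.
SPS_TYPES = ["word", "visual-entity", "referent"]
SPLITS = ["A", "B"]


def gqa_sps_question_files():
    """Return the expected GQA-SPS question file names for every (X, Y) pair."""
    return [
        f"gqa-sps-balanced-{sps_type}-val-{split}.json"
        for sps_type, split in product(SPS_TYPES, SPLITS)
    ]


def image_pair(image_id):
    """Return (model-input image, highlighted image) file names for one image id."""
    return f"{image_id}.jpg", f"{image_id}_hl.jpg"


if __name__ == "__main__":
    for name in gqa_sps_question_files():
        print(name)
```

Only the first file name of each pair returned by `image_pair` is meant to be fed to a model; the `_hl` variant is for inspecting which objects were substituted.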

VQA-SPS v2 Dataset

Download link: [Google Drive] [Baidu NetDisk (password: DSPS)]

Format:

  1. "vqav2-sps-questions-X-val-Y.json" is the question JSON file for the val-Y split of X SPS, where X $\in$ {word, visual entity, referent} and Y $\in$ {A, B}.
  2. "vqav2-sps-annotations-X-val-Y.json" is the annotation JSON file for the val-Y split of X SPS, where X $\in$ {word, visual entity, referent} and Y $\in$ {A, B}.
  3. "images_for_visual_enity_sps.zip" contains the images for "vqav2-sps-questions-visual-entity-val-A&B.json". For each image, "image_id.jpg" is used as the model input, and "image_id_hl.jpg" highlights the substituted objects.
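
Because VQA-SPS v2 splits each subset into a question file and an annotation file, it can help to load them as a pair. The sketch below is illustrative only: `vqav2_sps_pair` and `load_pair` are hypothetical helpers, the hyphenated SPS type spelling in file names is an assumption, and nothing is assumed about the JSON schema, so the parsed objects are returned as-is.

```python
import json
from pathlib import Path


def vqav2_sps_pair(sps_type, split):
    """Return (question file, annotation file) names for one SPS type and split.

    Assumption: `sps_type` is one of "word", "visual-entity", "referent"
    and `split` is "A" or "B".
    """
    return (
        f"vqav2-sps-questions-{sps_type}-val-{split}.json",
        f"vqav2-sps-annotations-{sps_type}-val-{split}.json",
    )


def load_pair(root, sps_type, split):
    """Load the question and annotation JSON files found under `root`.

    No schema is assumed; the parsed JSON objects are returned unchanged.
    """
    question_name, annotation_name = vqav2_sps_pair(sps_type, split)
    root = Path(root)
    with open(root / question_name, encoding="utf-8") as f:
        questions = json.load(f)
    with open(root / annotation_name, encoding="utf-8") as f:
        annotations = json.load(f)
    return questions, annotations
```

For example, `load_pair("data/vqav2-sps", "word", "A")` would read both val-A files of the word SPS, where `data/vqav2-sps` is a placeholder for wherever the download was extracted.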