

Separating Skills and Concepts for Novel VQA

This repository contains the PyTorch code for the CVPR 2021 paper: Separating Skills and Concepts for Novel Visual Question Answering.



If you find this repository useful in your research, please consider citing:

  author = {Whitehead, Spencer and Wu, Hui and Ji, Heng and Feris, Rogerio and Saenko, Kate},
  title = {Separating Skills and Concepts for Novel Visual Question Answering},
  booktitle = {Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
  pages = {5632--5641},
  year = {2021}



Data Download and Organization

To set up the visual features, question files, and annotation files, please refer to the 'Setup' portion of the MCAN repository (under Prerequisites). Follow that procedure exactly, until the datasets directory has the structure shown in their repository.

Concept and Reference Set Preprocessing

The scripts for running the concept discovery and reference set preprocessing yourself will be added to this repository. For the time being, we provide preprocessed files that contain concepts, skill labels (if applicable), and reference sets for each question:

You should decompress the zip file and place the JSON files in the datasets/vqa directory:

|-- datasets
    |-- coco_extract
    |   |-- ...
    |-- vqa
    |   |-- train2014_scr_questions.json
    |   |-- train2014_scr_annotations.json
    |   |-- val2014_sc_questions.json
    |   |-- val2014_sc_annotations.json
    |   |-- ...
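
Before training, it can help to confirm that the four preprocessed files are where the layout above expects them. The following is a minimal sketch, not part of the repository: the function name missing_preprocessed_files is hypothetical, and the default path assumes the script is run from the repository root.

```python
from pathlib import Path

# File names taken from the directory listing above.
EXPECTED_FILES = [
    "train2014_scr_questions.json",
    "train2014_scr_annotations.json",
    "val2014_sc_questions.json",
    "val2014_sc_annotations.json",
]


def missing_preprocessed_files(vqa_dir="datasets/vqa"):
    """Return the expected JSON files that are not present in vqa_dir.

    Hypothetical helper; the default directory assumes the current
    working directory is the repository root.
    """
    root = Path(vqa_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]


if __name__ == "__main__":
    missing = missing_preprocessed_files()
    if missing:
        print("Missing files in datasets/vqa:", ", ".join(missing))
    else:
        print("All preprocessed files found.")
```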


The base command for training is:

python run.py --RUN train ...

Some pertinent arguments to add are:

During training, the latest model checkpoint is saved to ckpts/ckpt_<VERSION>/last_epoch.pkl and the training logs are saved to results/log/log_run_<VERSION>.txt. Validation predictions are saved to the results/cache/ directory after every epoch. Accuracies on novel compositions (or novel concepts) are also evaluated after each epoch.
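
To locate these outputs from a script, the path patterns above can be expanded for a given version string. This is a small sketch under that assumption only: run_artifacts is a hypothetical helper, not a function provided by this repository, and the paths are relative to the repository root.

```python
from pathlib import Path


def run_artifacts(version):
    """Build the output paths described above for a given <VERSION> string.

    Hypothetical helper: it only expands the documented path patterns and
    does not check that the files exist.
    """
    return {
        "checkpoint": Path("ckpts") / f"ckpt_{version}" / "last_epoch.pkl",
        "log": Path("results") / "log" / f"log_run_{version}.txt",
        "val_cache_dir": Path("results") / "cache",
    }


if __name__ == "__main__":
    # "my_run" is a placeholder version string used for illustration.
    for name, path in run_artifacts("my_run").items():
        print(f"{name}: {path} (exists: {path.exists()})")
```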

Evaluating Novel Compositions/Concepts

While performance on novel compositions/concepts is evaluated after every epoch, it can also be evaluated separately.

Given a file containing the model predictions on the val2014 data (in the VQA v2 evaluation format), run the following to get results on the novel compositions/concepts:


where --CONCEPT and --SKILL should be the same as the held-out compositions/concepts from training (i.e., the exact same arguments). If both --CONCEPT and --SKILL are supplied, that novel skill-concept composition is evaluated. If only --CONCEPT is supplied, that novel concept is evaluated.

To obtain a file with model predictions, run:

python run.py --RUN val --CKPT_PATH <PATH_TO_MODEL_CKPT>
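
The predictions file is expected to be in the VQA v2 evaluation format, which is a JSON list of objects that each carry a question_id (integer) and an answer (string). The sketch below checks a file against that shape before it is passed to the evaluation. The helper check_vqa_results is hypothetical and is not part of this repository.

```python
import json


def check_vqa_results(path):
    """Check that a predictions file looks like VQA v2 results.

    Expected shape: a JSON list of {"question_id": int, "answer": str}.
    Returns the number of entries; raises ValueError on a malformed file.
    Hypothetical helper, written for illustration only.
    """
    with open(path, "r", encoding="utf-8") as f:
        results = json.load(f)
    if not isinstance(results, list):
        raise ValueError("top-level JSON value must be a list")
    for i, entry in enumerate(results):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not a JSON object")
        if not isinstance(entry.get("question_id"), int):
            raise ValueError(f"entry {i} has no integer question_id")
        if not isinstance(entry.get("answer"), str):
            raise ValueError(f"entry {i} has no string answer")
    return len(results)
```

Calling check_vqa_results on a predictions file returns the number of predictions it contains, or raises ValueError that names the first malformed entry.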


This repository is adapted from the MCAN repository. We thank the authors for providing their code.