VCTree-Visual-Question-Answering

Code for the VQA part of the CVPR 2019 oral paper "Learning to Compose Dynamic Tree Structures for Visual Contexts". For the Scene Graph Generation part of this paper, please refer to KaihuaTang/VCTree-Scene-Graph-Generation.

UGLY CODE WARNING! UGLY CODE WARNING! UGLY CODE WARNING!

The code is directly modified from the project Cyanogenoid/vqa-counting. We mainly modified model.py, train.py and config.py, and added several files for our VCTree model, such as all tree_*.py files and gen_tree_net.py. Before we got our final model, we tried lots of different tree structures, so you may find some strange code such as config.gen_tree_mode and the corresponding choices in tree_feature.py. Just ignore them. (I'm too lazy to purge the code, sorry about that.)

Dependencies

This code was confirmed to run with the following environment:

Prepare data

Please follow the Instruction to prepare the data.

python preprocess-images.py
python preprocess-vocab.py

This creates an h5py database (95 GiB) containing the object proposal features and a vocabulary for questions and answers at the locations specified in config.py.

Train your model

Note that the proposed hybrid learning strategy requires manually and iteratively switching config.use_rl between False and True, and using -resume to load the model from the previous stage (which is quite stupid). So you can simply start with config.use_rl = False.
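The alternation described above can be written down as a simple plan. The following is only a sketch for keeping track of the stages: the `Stage` tuple, the `hybrid_schedule` helper and the number of stages are hypothetical and not part of this repository. It does not launch training; you still edit config.use_rl by hand and resume from the previous stage's model yourself.

```python
from typing import List, NamedTuple, Optional


class Stage(NamedTuple):
    index: int
    use_rl: bool                # value to set for config.use_rl before this stage
    resume_from: Optional[int]  # stage whose saved model is loaded, None for the first stage


def hybrid_schedule(num_stages: int) -> List[Stage]:
    """Alternate config.use_rl between False and True, starting with False.

    num_stages is a hypothetical parameter; choose it yourself.
    """
    stages = []
    for i in range(num_stages):
        stages.append(
            Stage(
                index=i,
                use_rl=(i % 2 == 1),
                resume_from=None if i == 0 else i - 1,
            )
        )
    return stages


if __name__ == "__main__":
    for s in hybrid_schedule(4):
        if s.resume_from is None:
            source = "train from scratch"
        else:
            source = f"resume from the model saved in stage {s.resume_from}"
        print(f"stage {s.index}: set config.use_rl = {s.use_rl}, {source}")
```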

The remaining instructions are similar to those of the original project Cyanogenoid/vqa-counting.

python train.py [optional-name]

This will alternate between one epoch of training on the train split and one epoch of validation on the validation split while printing the current training progress to stdout and saving logs in the logs directory. The logs contain the name of the model, training statistics, contents of config.py, model weights, evaluation information (per-question answer and accuracy), and question and answer vocabularies.
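If you want a quick look at what a saved log contains, a minimal sketch like the one below may help. It assumes the .pth log is a dictionary saved with PyTorch; the helper names `load_log` and `summarize_log` are made up for this example and do not exist in the repository.

```python
from typing import Any, Dict


def load_log(path: str) -> Dict[str, Any]:
    """Load a saved log onto the CPU (assumes it was written with torch.save)."""
    import torch  # imported here so summarize_log can be used without PyTorch

    return torch.load(path, map_location="cpu")


def summarize_log(log: Dict[str, Any]) -> Dict[str, str]:
    """Map each top-level key of the log to the type name of its value."""
    return {key: type(value).__name__ for key, value in log.items()}
```

For example, `print(summarize_log(load_log(path)))` with `path` pointing at one of your own logs lists the top-level entries without printing the full model weights.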

python view-log.py <path to .pth log>
python eval-acc.py <path to .pth log> [<more paths to .pth logs> ...]

If you pass in multiple paths as arguments, this gives you standard deviations as well. To customise what categories are shown, you can modify the "accept conditions" for categories in eval-acc.py.
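As an illustration of what reporting a standard deviation over several runs means, here is a small sketch. It assumes you already have one accuracy value per log, and it uses the sample standard deviation, which may differ from what eval-acc.py actually computes. `mean_and_std` is a hypothetical helper, not a function from this repository.

```python
from statistics import mean, stdev
from typing import Sequence, Tuple


def mean_and_std(accuracies: Sequence[float]) -> Tuple[float, float]:
    """Return the mean and sample standard deviation of per-run accuracies."""
    if len(accuracies) < 2:
        # A single run has no spread to report.
        return mean(accuracies), 0.0
    return mean(accuracies), stdev(accuracies)
```

With a single log there is nothing to average over, which is why the standard deviation only appears when several paths are passed.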

Something You Need To Know

If this paper/project inspires your work, please cite our work:

@inproceedings{tang2018learning,
  title={Learning to Compose Dynamic Tree Structures for Visual Contexts},
  author={Tang, Kaihua and Zhang, Hanwang and Wu, Baoyuan and Luo, Wenhan and Liu, Wei},
  booktitle={Conference on Computer Vision and Pattern Recognition},
  year={2019}
}