Invertible Question Answering Network (iQAN)

This is the PyTorch implementation of our Invertible Question Answering Network (iQAN) proposed in *Visual Question Generation as Dual Task of Visual Question Answering*. Please follow the guidelines below to run the code.

P.S. This is just an initial version. More details will be available soon.

Introduction

Both the Visual Question Generation (VQG) and Visual Question Answering (VQA) tasks train models in an end-to-end fashion on a multimodal dataset made of (image, question, answer) triplets:

<p align="center"> <img src="./doc/vqa_vqg.png" width="400"/> </p>

As you can see in the illustration below, the problem-solving schemes of VQA (top) and VQG (bottom) both follow the encoder-fusion-decoder pipeline, with Q and A in inverse order:

<p align="center"> <img src="./doc/cvpr2018_iqan_pipeline.jpg" width="1960"/> </p>

One of our claims is that VQA and VQG are two complementary tasks with isomorphic settings, where each can be viewed as the inverse form of the other. Thus, we formulate them as dual tasks and propose a dual training scheme, the Invertible Question Answering Network (iQAN), to jointly train the model on VQA and VQG. Our experimental results show that the proposed dual training scheme makes better use of the annotated data by simultaneously training the model on the two dual tasks.
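To make the duality concrete, here is a minimal, purely illustrative PyTorch-style sketch of the shared encoder-fusion-decoder pipeline run in both directions. The layer and method names (`img_proj`, `fuse`, `vqa`, `vqg`, the vocabulary sizes, etc.) are placeholders for this sketch and are not the actual classes in this repository:

```python
import torch
import torch.nn as nn

class DualPipelineSketch(nn.Module):
    """Illustration only: VQA and VQG reuse the same components with Q and A swapped."""

    def __init__(self, q_vocab, a_vocab, dim=512):
        super().__init__()
        self.img_proj = nn.Linear(2048, dim)      # stands in for the CNN feature encoder
        self.q_enc = nn.Embedding(q_vocab, dim)   # question encoder (an RNN in the real model)
        self.a_enc = nn.Embedding(a_vocab, dim)   # answer encoder
        self.a_dec = nn.Linear(dim, a_vocab)      # answer decoder (classifier)
        self.q_dec = nn.Linear(dim, q_vocab)      # question decoder (an RNN decoder in the real model)

    def fuse(self, v, t):
        # Placeholder for the MUTAN / MLB bilinear fusion.
        return v * t

    def vqa(self, img_feat, question_ids):
        v = self.img_proj(img_feat)
        q = self.q_enc(question_ids).mean(dim=1)
        return self.a_dec(self.fuse(v, q))        # (image, question) -> answer

    def vqg(self, img_feat, answer_ids):
        v = self.img_proj(img_feat)
        a = self.a_enc(answer_ids).mean(dim=1)
        return self.q_dec(self.fuse(v, a))        # (image, answer) -> question


model = DualPipelineSketch(q_vocab=10000, a_vocab=2000)
img = torch.randn(4, 2048)                        # pre-extracted image features
q = torch.randint(0, 10000, (4, 17))              # tokenized questions
a = torch.randint(0, 2000, (4, 1))                # answer indices
answer_logits = model.vqa(img, q)                 # VQA direction
question_logits = model.vqg(img, a)               # VQG direction
```

During dual training, the losses of the two directions are optimized jointly, so each annotated Q/A pair supervises the model twice.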

Summary:

- Introduction
- Installation
- Reproducing results on VQA 2.0 and CLEVR
- VQG as a way to augment Q/A pairs
- Results
- Citation
- Acknowledgment

Installation

Requirements

First install Python 3 (we do not provide support for Python 2). We advise you to install Python 3 and PyTorch with Anaconda:

conda create --name vqa python=3
source activate vqa
conda install pytorch torchvision cuda80 -c soumith

Then clone the repo (with the --recursive flag for submodules) and install the complementary requirements:

cd $HOME
git clone --recursive https://github.com/yikang-li/iQAN.git
cd iQAN
pip install -r requirements.txt

Submodules

Our code has two external dependencies:

Data

The VQA 2.0 data will be automatically downloaded and preprocessed when needed.

For CLEVR dataset:

wget https://s3-us-west-1.amazonaws.com/clevr/CLEVR_v1.0.zip
unzip CLEVR_v1.0.zip
mkdir -p data/clevr/annotations
cd data/clevr/annotations
ln -s /path/to/CLEVR_v1.0/questions/CLEVR_train_questions.json train.json
ln -s /path/to/CLEVR_v1.0/questions/CLEVR_val_questions.json val.json
cd ../
ln -s /path/to/CLEVR_v1.0/images raw

Reproducing results on VQA 2.0 and CLEVR

Prepare Features

From COCO

The needed images will be automatically downloaded to dir_data, and the features will be extracted with a resnet152 by default.

There are three options for mode:

Beware, you will need some space on your SSD:

python extract.py -h
python extract.py --dir_data data/coco --data_split train
python extract.py --dir_data data/coco --data_split val
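For intuition, the feature extraction boils down to running each image through an ImageNet-pretrained ResNet-152 and saving the activations. The following is a rough, self-contained sketch (not extract.py itself; the file name and the 448x448 resize are assumptions):

```python
import torch
import torchvision.models as models
import torchvision.transforms as transforms
from PIL import Image

# Load an ImageNet-pretrained ResNet-152 and drop its final classification layer.
resnet = models.resnet152(pretrained=True)
extractor = torch.nn.Sequential(*list(resnet.children())[:-1]).eval()

preprocess = transforms.Compose([
    transforms.Resize((448, 448)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("example.jpg").convert("RGB")).unsqueeze(0)
with torch.no_grad():
    feat = extractor(img)   # (1, 2048, 1, 1): pooled image feature
    feat = feat.flatten(1)  # (1, 2048), ready to be saved to disk
```

In the actual script, the features of every image in a split are saved to disk under dir_data.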

Note: By default our code will share computations over all available GPUs. If you want to select only one or a few, use the following prefix:

CUDA_VISIBLE_DEVICES=0 python extract.py
CUDA_VISIBLE_DEVICES=1,2 python extract.py
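Under the hood, sharing the computation over the visible GPUs typically means wrapping the network with data parallelism, roughly as in this generic sketch (not the exact code in this repo):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(2048, 512), nn.ReLU(), nn.Linear(512, 2048))
if torch.cuda.is_available():
    # Replicates the model on every GPU visible through CUDA_VISIBLE_DEVICES
    # and splits each batch across them.
    model = nn.DataParallel(model).cuda()

x = torch.randn(8, 2048)
if torch.cuda.is_available():
    x = x.cuda()
y = model(x)
```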

From CLEVR

python extract.py -h
python extract.py --dir_data data/clevr --data_split train --dataset clevr
python extract.py --dir_data data/clevr --data_split val --dataset clevr

Dual Training

We apply our proposed Dual Training scheme to three models: MUTAN, MLB, and iBOWIMG.

All the corresponding model options are listed in options/dual_model/. The model options for the CLEVR dataset are in the subfolder CLEVR.

Options

We have several options to enable/disable different settings; the ones used throughout this README are:

- --dual_training: turn on the dual (VQA + VQG) training scheme.
- --share_embeddings: share the embeddings between the VQA and VQG components.
- --partial: train on only a fraction of the training set (see "Training with partial data" below).

Train models on VQA 2.0

Training the Mutan VQA model with the Dual Training scheme:

CUDA_VISIBLE_DEVICES=0,1 python train_dual_model.py --path_opt options/dual_model_MUTAN_skipthought.yaml --dual_training --share_embeddings

You can set share_modules to False (or simply leave out the dual training flags, as below) to train a baseline Mutan VQA model:

CUDA_VISIBLE_DEVICES=0,1 python train_dual_model.py --path_opt options/dual_model_MUTAN_skipthought.yaml

Training the MLB VQA model with the Dual Training scheme:

CUDA_VISIBLE_DEVICES=0,1 python train_dual_model.py --path_opt options/dual_model_MLB.yaml --dual_training --share_embeddings

Training the iBOWIMG VQA model with the Dual Training scheme:

CUDA_VISIBLE_DEVICES=0,1 python train_dual_model.py --path_opt options/dual_model_iBOWIMG.yaml --dual_training --share_embeddings

Train models on CLEVR

Training the Mutan VQA model with the Dual Training scheme:

CUDA_VISIBLE_DEVICES=0,1 python train_dual_model.py --path_opt options/CLEVR/dual_model_MUTAN_skipthought.yaml --dual_training --share_embeddings

Restart training

Restart the model from the last checkpoint.

python train.py --path_opt options/dual_model_MUTAN_skipthought.yaml --dir_logs logs/dual_model/iQAN_Mutan_skipthought --resume ckpt

Restart the model from the best checkpoint.

python train.py --path_opt options/dual_model_MUTAN_skipthought.yaml --dir_logs logs/dual_model/iQAN_Mutan_skipthought --resume best

Evaluate models on VQA

Evaluate the model from the best checkpoint.

python train.py --path_opt options/dual_model_MUTAN_skipthought.yaml --dir_logs logs/dual_model/iQAN_Mutan_skipthought --resume best -e

Qualitative Results:

You can visualize the evaluation results with visualize_results.ipynb and visualize_results_CLEVR.ipynb for the VQA 2.0 and CLEVR datasets, respectively.

<p align="center"> <img src="./doc/cvpr2018_iqan_results.jpg" width="1856"/> </p>

VQG as a way to augment Q/A pairs

Training with partial data

Here, we provide an additional option --partial to train the model with only part of the original dataset. If partial is set to a value in the range (0, 1), the corresponding fraction of the original training data is used, while the full validation set is still used for evaluation.

CUDA_VISIBLE_DEVICES=0,1 python train_dual_model.py --path_opt options/dual_model_MUTAN_skipthought.yaml --dual_training --share_embeddings --partial 0.5 --dir_logs logs/dual_model/iQAN_Mutan_skipthought_partial_0_5
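For illustration, --partial 0.5 keeps only half of the training annotations while leaving the validation split untouched. A tiny sketch of the assumed behaviour (the real implementation lives in the data loading code):

```python
# Hypothetical sketch of what --partial 0.5 amounts to.
partial = 0.5
train_annotations = list(range(100000))        # stands in for the loaded training Q/A pairs
num_kept = int(partial * len(train_annotations))
train_subset = train_annotations[:num_kept]    # only this fraction is used for training
# The full validation set is still used for evaluation.
```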

Generate questions based on answers

Here, a small code modification is needed: change the evaluation data loader to the training loader, i.e. replace engine.evaluate(test_loader, model, exp_logger, args.print_freq) with engine.evaluate(train_loader, model, exp_logger, args.print_freq). Then run the evaluation command:

python train.py --path_opt options/dual_model_MUTAN_skipthought.yaml --dir_logs logs/dual_model/iQAN_Mutan_skipthought_partial_0_5 --resume best -e

Re-package an augmented dataset

We provide the script augment_data.ipynb to generate a new dataset in which the latter part of the annotations is replaced by the augmented ones. start_index should correspond to the --partial ratio used during training.
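For example, assuming a training split with N annotations and a model trained with --partial 0.5, start_index would be chosen so that the generated questions overwrite the unused half (a sketch of the assumed convention; check the notebook for the exact usage):

```python
# Hypothetical: align start_index with the --partial ratio used during training.
partial = 0.5
num_train = 100000                      # stands in for the size of the training split
start_index = int(partial * num_train)  # annotations[start_index:] get replaced by
                                        # the generated (augmented) Q/A pairs
```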

Then replace the trainset.pickle at ./data/vqa2/processed/nans,2000_maxlength,17_minwcount,10_nlp,nltk_pad,right_trainsplit,train_filter_questions/ with the newly generated augmented_trainset.pickle.

Pretrain

We pretrain the model on the combination of the annotated half and the augmented half.

CUDA_VISIBLE_DEVICES=0,1 python train_dual_model.py --path_opt options/dual_model_MUTAN_skipthought.yaml --dual_training --share_embeddings --dir_logs logs/dual_model/iQAN_Mutan_skipthought_augmented_pretrain

Finetuning

Then fine-tune the model on the clean (annotated) part. Don't forget to increase the number of epochs so there is enough training time:

CUDA_VISIBLE_DEVICES=0,1 python train_dual_model.py --path_opt options/dual_model_MUTAN_skipthought.yaml --dual_training --share_embeddings --partial 0.5 --dir_logs logs/dual_model/iQAN_Mutan_skipthought_augmented_pretrain --resume best

Results

This is a copy of the results table from the paper.

Results on the cleansed dataset:

| Model | Training Set | Acc@1 | CIDEr | Training Set | Acc@1 | CIDEr |
| --- | --- | --- | --- | --- | --- | --- |
| Baseline | 0.1 Q,A | 33.60 | 1.332 | 0.5 Q,A | 46.68 | 1.930 |
| DT | 0.1 Q,A | 35.23 | 1.540 | 0.5 Q,A | 47.63 | 2.101 |
| VQG+DT | 0.1 Q,A + 0.9 A | 38.87 | 1.528 | 0.5 Q,A + 0.5 A | 47.99 | 2.072 |
| VQG+DT+FT | 0.1 Q,A + 0.9 A | 39.95 | 1.739 | 0.5 Q,A + 0.5 A | 48.48 | 2.281 |

Citation

Please cite the following paper if you use iQAN in your work:

@inproceedings{li2018iqan,
  author={Li, Yikang and
    Duan, Nan and
    Zhou, Bolei and
    Chu, Xiao and
    Ouyang, Wanli and
    Wang, Xiaogang and
    Zhou, Ming},
  title={Visual Question Generation as Dual Task of Visual Question Answering},
  booktitle={CVPR},
  year = {2018},
  url = {http://cvboy.com/publication/cvpr2018_iqan/}
}

Acknowledgment

Special thanks to the authors of MUTAN for providing some of their PyTorch code, and to the Multimedia Lab of CUHK as well as Microsoft Research Asia for the computing resources and the great working atmosphere.