VizWiz Challenge: Visual Question Answering Implementation in PyTorch

A PyTorch VQA implementation that achieved top performance in the (ECCV 2018) VizWiz Grand Challenge: Answering Visual Questions from Blind People. The code can easily be adapted for training on VQA 1.0/2.0 or any other dataset.

The implemented architecture is a variant of the VQA model described in Kazemi et al. (2017), Show, Ask, Attend, and Answer: A Strong Baseline for Visual Question Answering. Visual features are extracted with a ResNet-152 pretrained on ImageNet. Input questions are tokenized, embedded, and encoded with an LSTM. The image features and the encoded question are combined and used to compute multiple attention maps over the image features. The attended image features and the encoded question are then concatenated and fed to a 2-layer classifier that outputs probabilities over the answer classes.
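For concreteness, here is a minimal PyTorch sketch of such an architecture. The class name, layer sizes, and glimpse count are illustrative assumptions, not the repository's exact configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ShowAskAttendAnswer(nn.Module):
    """Minimal sketch: question LSTM + attention over image features + classifier."""

    def __init__(self, vocab_size, num_answers, emb_dim=300, q_dim=1024,
                 feat_dim=2048, glimpses=2):
        super().__init__()
        # Question encoder: token embedding followed by an LSTM
        self.embed = nn.Embedding(vocab_size, emb_dim, padding_idx=0)
        self.lstm = nn.LSTM(emb_dim, q_dim, batch_first=True)
        # Attention: fuse image and question features, predict `glimpses` maps
        self.att_fuse = nn.Conv2d(feat_dim + q_dim, 512, kernel_size=1)
        self.att_maps = nn.Conv2d(512, glimpses, kernel_size=1)
        # 2-layer classifier over [attended image features ; question encoding]
        self.classifier = nn.Sequential(
            nn.Linear(glimpses * feat_dim + q_dim, 1024),
            nn.ReLU(inplace=True),
            nn.Linear(1024, num_answers),
        )

    def forward(self, img_feats, question_tokens):
        # img_feats: (B, feat_dim, H, W) ResNet-152 feature maps
        # question_tokens: (B, T) padded token ids
        _, (h_n, _) = self.lstm(self.embed(question_tokens))
        q = h_n[-1]                                   # (B, q_dim)
        B, C, H, W = img_feats.shape
        # Tile the question encoding over the spatial grid and fuse with the image
        q_tiled = q[:, :, None, None].expand(B, q.size(1), H, W)
        att = self.att_maps(F.relu(self.att_fuse(torch.cat([img_feats, q_tiled], dim=1))))
        att = F.softmax(att.view(B, -1, H * W), dim=2)     # (B, glimpses, H*W)
        v = img_feats.view(B, C, H * W)                    # (B, feat_dim, H*W)
        # Weighted sum of image features under each attention map
        attended = torch.einsum('bgn,bcn->bgc', att, v).reshape(B, -1)
        return self.classifier(torch.cat([attended, q], dim=1))  # answer logits
```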

More information about the attention module can be found in Yang et al. (2015), Stacked Attention Networks for Image Question Answering.

In order to consider all 10 answers given by the annotators, we use a soft cross-entropy loss: a weighted average of the negative log-probabilities of each unique ground-truth answer, where each answer is weighted by how often the annotators gave it. This loss aligns better with the VQA accuracy metric, min(#annotators who gave the answer / 3, 1), used to score challenge submissions.

For a question with unique ground-truth answers $A$, the soft cross-entropy loss is

$$\mathcal{L} = -\sum_{a \in A} w_a \log p_\theta(a \mid v, q),$$

where $w_a$ is the fraction of the 10 annotators who gave answer $a$, and $p_\theta(a \mid v, q)$ is the predicted probability of answer $a$ given image $v$ and question $q$.
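A minimal PyTorch sketch of this loss (the function name is illustrative, and it assumes the per-answer weights in each row sum to 1):

```python
import torch
import torch.nn.functional as F

def soft_cross_entropy(logits, answer_weights):
    """Soft cross-entropy: weighted average of negative log-probabilities.

    logits:         (batch, num_answers) raw classifier scores
    answer_weights: (batch, num_answers) weight of each answer, e.g. the
                    fraction of the 10 annotators who gave it (rows sum to 1)
    """
    log_probs = F.log_softmax(logits, dim=1)
    return -(answer_weights * log_probs).sum(dim=1).mean()
```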

Experimental Results

| Method       | Accuracy |
|--------------|----------|
| VizWiz paper | 0.475    |
| Ours         | 0.516    |

Training and Evaluation

```bash
# Create and activate a conda environment, then install the dependencies
conda create --name viz_env python=3.6
source activate viz_env
pip install -r requirements.txt

# Download and unpack the VizWiz dataset
wget https://ivc.ischool.utexas.edu/VizWiz/data/VizWiz_data_ver1.tar.gz
tar -xzf VizWiz_data_ver1.tar.gz
```

After unpacking the dataset, the Image folder will contain macOS metadata files prefixed with ._VizWiz. These files should be removed before extracting the image features:

```bash
# Remove the metadata files (run from inside the Image folder)
rm ._*

# Extract ResNet-152 features for every image
python ./preprocessing/image_features_extraction.py
```
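Under the hood, feature extraction amounts to running each image through a ResNet-152 truncated before its average-pooling layer. A minimal sketch, in which the input resolution and file name are assumptions rather than the script's actual settings:

```python
import torch
import torchvision.models as models
import torchvision.transforms as T
from PIL import Image

# ResNet-152 pretrained on ImageNet, truncated before the average pool so it
# outputs a (2048, H, W) spatial feature map instead of class logits.
resnet = models.resnet152(pretrained=True)
extractor = torch.nn.Sequential(*list(resnet.children())[:-2]).eval()

preprocess = T.Compose([
    T.Resize((448, 448)),                     # assumed input resolution
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406],   # standard ImageNet statistics
                std=[0.229, 0.224, 0.225]),
])

img = Image.open("VizWiz_example.jpg").convert("RGB")  # hypothetical file
with torch.no_grad():
    feats = extractor(preprocess(img).unsqueeze(0))    # (1, 2048, 14, 14)
```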
```bash
# Build the question and answer vocabularies
python ./preprocessing/create_vocabs.py

# Train the model
python train.py
```

During training, the model with the highest validation accuracy and the model with the lowest validation loss are saved. The log directory path is specified in the YAML configuration file config/default.yaml.
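The checkpointing described above boils down to a comparison in the validation loop. A rough sketch, where the function, file names, and the `best` dict are illustrative rather than train.py's actual code:

```python
import os
import torch

def save_best(model, log_dir, val_acc, val_loss, best):
    """Keep separate checkpoints for best validation accuracy and loss.

    `best` is a dict {"acc": float, "loss": float} tracking the best values
    seen so far (hypothetical names, not the repository's own).
    """
    if val_acc > best["acc"]:
        best["acc"] = val_acc
        torch.save(model.state_dict(), os.path.join(log_dir, "best_val_acc.pt"))
    if val_loss < best["loss"]:
        best["loss"] = val_loss
        torch.save(model.state_dict(), os.path.join(log_dir, "best_val_loss.pt"))
```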

```bash
# Generate predictions with a trained model
python predict.py
```

Acknowledgment