Multimodal Explanations: Justifying Decisions and Pointing to the Evidence

This repository contains the code for the following paper:

DH. Park, LA. Hendricks, Z. Akata, A. Rohrbach, B. Schiele, T. Darrell, M. Rohrbach, Multimodal Explanations: Justifying Decisions and Pointing to the Evidence. In CVPR, 2018. (PDF)

Installation

Install Python 3.
Install Caffe.

Compile the feature/20160617_cb_softattention branch of our fork of Caffe. This branch contains Yang Gao’s Compact Bilinear layer, Signed SquareRoot layer, and L2 Normalization Layer (dedicated repo, paper) released under the BDD license, and Ronghang Hu’s Soft Attention layers (paper) released under BSD 2-clause.

Download this repository or clone with Git, and then enter the root directory of the repository: git clone https://github.com/Seth-Park/MultimodalExplanations.git && cd MultimodalExplanations

Datasets Download & Preprocess

VQA-X

Download and set up the VQA v2.0 dataset by following the instructions and directory structure described here.
Download the VQA-X dataset from this Google Drive link.
After unzipping the datasets, symlink the VQA directory set up in step 1 as MultimodalExplanations/PJ-X-VQA/VQA-X and place the data so that the file structure looks like the following:

MultimodalExplanations/PJ-X-VQA/VQA-X/
    Annotations/
        train_exp_anno.json
        val_exp_anno.json
        test_exp_anno.json
        v2_mscoco_train2014_annotations.json
        v2_mscoco_val2014_annotations.json
        visual/
            val/
            test/
    Images/
        train2014
        val2014
    Questions/
        v2_OpenEnded_mscoco_train2014_questions.json
        v2_OpenEnded_mscoco_val2014_questions.json
    Features/  # this will be the directory to which we extract visual features!
    ...

We use the ResNet-152 model to extract the visual features. Download the ResNet-152 caffemodel from here.
Modify the config.py file in the preprocess directory to specify the source path and the destination path (the destination should be PJ-X-VQA/VQA-X/Features). Then start extracting the visual features.

cd preprocess
# fix config.py file
python extract_resnet.py
cd ..
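Feature extraction reads from the layout shown in the directory tree above, so a quick check before running it can save a failed run. The following is a minimal sketch, not part of the repository: the helper name missing_paths and the list EXPECTED_VQA_X are hypothetical, and the entries are simply copied from the tree. Features/ is left out on the assumption that it may not exist before extraction.

```python
from pathlib import Path

# Entries copied from the VQA-X directory tree, relative to PJ-X-VQA/VQA-X.
# Features/ is omitted because it is the extraction target (an assumption).
EXPECTED_VQA_X = [
    "Annotations/train_exp_anno.json",
    "Annotations/val_exp_anno.json",
    "Annotations/test_exp_anno.json",
    "Annotations/v2_mscoco_train2014_annotations.json",
    "Annotations/v2_mscoco_val2014_annotations.json",
    "Annotations/visual/val",
    "Annotations/visual/test",
    "Images/train2014",
    "Images/val2014",
    "Questions/v2_OpenEnded_mscoco_train2014_questions.json",
    "Questions/v2_OpenEnded_mscoco_val2014_questions.json",
]


def missing_paths(root, expected=EXPECTED_VQA_X):
    """Return the expected entries that do not exist under `root`."""
    root = Path(root)
    # Path.exists() follows symlinks, so a dangling symlink counts as missing.
    return [rel for rel in expected if not (root / rel).exists()]


if __name__ == "__main__":
    # Run from the repository root; prints nothing if the layout is complete.
    for rel in missing_paths("PJ-X-VQA/VQA-X"):
        print("missing:", rel)
```

The same function can be pointed at PJ-X-ACT/ACT-X by passing a different list of expected entries.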

ACT-X

Download images for MPII Human Pose Dataset, Version 1.0 from here.
Download the ACT-X dataset from this Google Drive link.
After unzipping the datasets, symlink the ACT-X directory to PJ-X-ACT/ACT-X and place the data so that the file structure looks like the following:

MultimodalExplanations/PJ-X-ACT/ACT-X
    textual/
        exp_train_split.json
        exp_val_split.json
        exp_test_split.json
    visual/
        val/
        test/
    Features/  # this will be the directory to which we extract visual features!
    ...

We use the ResNet-152 model to extract the visual features. Download the ResNet-152 caffemodel from here.
Modify the config.py file in the preprocess directory to specify the source path and the destination path (the destination should be PJ-X-ACT/ACT-X/Features). Then start extracting the visual features.

cd preprocess
# fix config.py file
python extract_resnet.py
cd ..

Training

VQA-X

We use a pretrained VQA model (trained on the VQA training set) for the explanation task. Download the pretrained VQA caffemodel from here.
In the PJ-X-VQA/model directory, you will see vdict.json and adict.json. These are JSON files for the vocabulary and answer candidates used in our pretrained VQA model. Loading the JSON files gives you Python dictionaries that map a word or answer to its index. It is important to use these key-value mappings when using our pretrained VQA model. If you train from scratch or use a different model, you will have to provide your own vdict.json and adict.json.
Modify the config.py file in the PJ-X-VQA directory (i.e. set the path to the pretrained VQA caffemodel) and then start training:

cd PJ-X-VQA
# fix config.py file
python train.py
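To illustrate the word-to-index mappings in vdict.json and adict.json, here is a minimal sketch of loading them and encoding a tokenized question. The function names are hypothetical and not taken from the repository, and the unk_index fallback is an assumption: the source does not say how the pretrained model treats out-of-vocabulary words, so by default the sketch raises an error rather than guessing.

```python
import json


def load_index_map(path):
    """Load a JSON file such as vdict.json or adict.json into a dict."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def encode_tokens(tokens, vdict, unk_index=None):
    """Map tokens to vocabulary indices.

    unk_index is a hypothetical fallback for out-of-vocabulary tokens;
    when it is None, an unknown token raises KeyError instead.
    """
    if unk_index is None:
        return [vdict[t] for t in tokens]
    return [vdict.get(t, unk_index) for t in tokens]


def invert_index_map(index_map):
    """Build the reverse mapping (index -> word/answer), e.g. to decode predictions."""
    return {index: key for key, index in index_map.items()}
```

For example, after vdict = load_index_map("PJ-X-VQA/model/vdict.json"), a call such as encode_tokens(["what", "is", "this"], vdict) returns the indices for those words, provided they are in the vocabulary. How questions should be tokenized before lookup is determined by the repository's own data provider, not by this sketch.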

ACT-X

For activity classification we do not use a pretrained network, so vdict.json and adict.json will be generated automatically in the PJ-X-ACT/model directory.
Modify the config.py file in the PJ-X-ACT directory and then start training:

cd PJ-X-ACT
# fix config.py file
python train.py

Generating Explanations

Pretrained Models

The pretrained PJ-X models and the explanations generated by them can be downloaded here. When using the pretrained models, make sure to place all the files in the correct locations (i.e. PJ-X-VQA/model or PJ-X-ACT/model).

VQA-X

The model prototxt, answer dictionary, vocab dictionary, and explanation dictionary will all be stored in the PJ-X-VQA/model directory after training.
Provide this directory as input to the following command:

cd PJ-X-VQA/generate_vqa_exp
python generate_explanation.py \
    --ques_file ../VQA-X/Questions/v2_OpenEnded_mscoco_val2014_questions.json \
    --ann_file ../VQA-X/Annotations/v2_mscoco_val2014_annotations.json \
    --exp_file ../VQA-X/Annotations/val_exp_anno.json \
    --gpu 0 \
    --out_dir ../VQA-X/results \
    --folder ../model/ \
    --model_path $PATH_TO_CAFFEMODEL \
    --use_gt \
    --save_att_map

The command saves the generated textual and visual explanations in the directory given by --out_dir.
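When scripting several runs, it can be convenient to assemble the same call in Python. The sketch below only rearranges the flags and paths from the command above. The helper name build_generate_cmd is hypothetical, and the model path in the example is a placeholder rather than a file shipped with the repository.

```python
import shlex


def build_generate_cmd(model_path, gpu=0, out_dir="../VQA-X/results"):
    """Return the VQA-X generate_explanation.py call as an argument list."""
    return [
        "python", "generate_explanation.py",
        "--ques_file", "../VQA-X/Questions/v2_OpenEnded_mscoco_val2014_questions.json",
        "--ann_file", "../VQA-X/Annotations/v2_mscoco_val2014_annotations.json",
        "--exp_file", "../VQA-X/Annotations/val_exp_anno.json",
        "--gpu", str(gpu),
        "--out_dir", out_dir,
        "--folder", "../model/",
        "--model_path", model_path,
        "--use_gt",
        "--save_att_map",
    ]


if __name__ == "__main__":
    # Placeholder path; substitute the caffemodel you trained or downloaded.
    cmd = build_generate_cmd("path/to/model.caffemodel")
    # shlex.join (Python 3.8+) renders the list as a copy-pasteable shell line.
    print(shlex.join(cmd))
```

Because the paths are relative, the list would need to be run from PJ-X-VQA/generate_vqa_exp, for instance with subprocess.run(cmd, cwd="PJ-X-VQA/generate_vqa_exp", check=True) from the repository root.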

ACT-X

Similar to VQA-X, run this command to start generating explanations:

cd PJ-X-ACT/generate_act_exp
python generate_explanation.py \
    --ann_file ../ACT-X/textual/exp_val_split.json \
    --gpu 0 \
    --out_dir ../ACT-X/results \
    --folder ../model \
    --model_path $PATH_TO_CAFFEMODEL \
    --use_gt \
    --save_att_map

The command saves the generated textual and visual explanations in the directory given by --out_dir.