Awesome

Meta-Module-Network

Code for WACV 2021 Paper "Meta Module Network for Compositional Visual Reasoning"

Data Downloading

Download all the question files and scene graph files and bottom-up features from the web server, it can take up to 300G disk space.

  bash get_data.sh

This script will download questions/ folder, and the "trainval_all_programs.json" is used for bootstrapping and "trainval_unbiased_programs.json" is used for finetunning in the paper. The "trainval_unbiased_programs.json" and "testdev_pred_programs.json" are both generated by the program generator model.

Meta Module Network Implementation

To understand more detailed implementation of MMN, please refer to README.

Description of different files

sceneGraphs/trainval_bounding_box.json: the scene graph provided by the original GQA dataset

  {
    imageId:
    {
      bouding_box_id:
      {
        x: number,
        y: number,
        w: number,
        h: number,
        relations: [{object: "bounding_box_id", name: "relation_name"} ... ],
        name: object_class,
        attributes: [attr1, attr2, ... ]
      },
      bouding_box_id:
      {
        ...
      },
    }
  }

questions: the questions-program pairs and their associated images.

[
  [
    "ImageId",
    "Question",
    "Programs": [f1, f2, ..., fn],
    "QuestionId",
    "Answer"
  ]
]

Data Preprocessing [Optional]:

If you want to know how the programs and training data are generated, please follow the following steps:

Preprocessing Question-Program Pairs:

Download the questions from the original GQA website and then put it in the parent folder '../gqa-questions/', the following steps are aimed to convert "questions" into program format as follows:

preprocess the trainval_all_question into trainval_all_programs.json

  python preprocess.py trainval_all

preprocess the "balanced" programs into different forms:

  python preprocess.py create_balanced_programs

create the programs into the "input" forms for trainval_all_programs.json:

  python preprocess.py create_all_inputs

create the programs into the "input" forms for *balanced.json:

  python preprocess.py create_inputs

Using NL2Program Model to Predict Test-Dev Programs from input questions:

Train the sequence-2-sequence model:

  python generate_program.py --do_preprocess

Evaluate the NL2Program

  python generate_program.py --do_testdev

Prepare the generated programs for the modular transformer

  python generate_program.py --do_trainval_unbiased

Meta Module Network Training and Evaluation

Prepare the inputs for the modular transformer:
```
  python preprocess.py create_pred_inputs
```
Start the bootstrap training of the modular transoformer or you can download the pre-trained models directly from Google Drive. This bootstrap process could take quite a long time, please be patient if you are training on your own:
```
 python run_experiments.py --do_train_all --model TreeSparsePostv2 --id TreeSparsePost2Full --stacking 2 --batch_size 1024
```

Start the finetunning on the balanced split:

  python run_experiments.py --do_finetune --id FinetuneTreeSparseStack2RemovalFullValSeed6999 --model TreeSparsePostv2 --load_from models/TreeSparsePost2Full --seed 6999 --stacking 2

Test the model on the testdev split:

  python run_experiments.py --do_testdev_pred --id FinetuneTreeSparseStack2RemovalValSeed6777 --load_from [MODEL_NAME]  --model TreeSparsePostv2 --stacking 2

Citation

If you find this paper useful, please add the following reference to your paper.

  @article{chen2019meta,
  title={Meta module network for compositional visual reasoning},
  author={Chen, Wenhu and Gan, Zhe and Li, Linjie and Cheng, Yu and Wang, William and Liu, Jingjing},
  journal={Proceedings of WACV},
  year={2021}
}