Language-Conditioned Graph Networks for Relational Reasoning

This repository contains the code for the following paper:

@inproceedings{hu2019language,
  title={Language-Conditioned Graph Networks for Relational Reasoning},
  author={Hu, Ronghang and Rohrbach, Anna and Darrell, Trevor and Saenko, Kate},
  booktitle={Proceedings of the IEEE International Conference on Computer Vision (ICCV)},
  year={2019}
}

Project Page: http://ronghanghu.com/lcgn

This is the (original) TensorFlow implementation of LCGN. A PyTorch implementation is available in the PyTorch branch.

Installation

  1. Install Python 3 (Anaconda recommended: https://www.continuum.io/downloads).
  2. Install TensorFlow (we used TensorFlow 1.12.0 in our experiments):
    pip install tensorflow-gpu (or pip install tensorflow-gpu==1.12.0 to install TensorFlow 1.12.0)
  3. Install PyTorch (needed only in CLEVR and CLEVR-Ref+ experiments, and only for feature extraction):
    pip install torch torchvision
  4. Install a few other dependency packages (NumPy, HDF5, YAML):
    pip install numpy h5py pyyaml
  5. Download this repository or clone with Git, and then enter the root directory of the repository:
    git clone https://github.com/ronghanghu/lcgn.git && cd lcgn
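
As an optional sanity check (a minimal sketch, not part of the repository), you can confirm that the packages installed above are importable and can see a GPU; the second line applies only if you installed PyTorch for feature extraction:

python -c "import tensorflow as tf; print(tf.__version__); print(tf.test.is_gpu_available())"
python -c "import torch; print(torch.__version__); print(torch.cuda.is_available())"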

Train and evaluate on the CLEVR dataset and the CLEVR-Ref+ dataset

The LCGN model is applied on top of the 14 x 14 x 1024 ResNet-101-C4 features extracted from the CLEVR and CLEVR-Ref+ images.

On CLEVR, we train on the training split and evaluate on the validation and test splits. The model achieves the following performance on the validation (val) and test (test) splits of CLEVR:

Accuracy on val    Accuracy on test    Pre-trained model
97.90%             97.88%              download

On CLEVR-Ref+, following the IEP-Ref code, we cross-validate on the training set and evaluate on the validation set. The model achieves the following performance on the validation split (locplus_val in our code) of CLEVR-Ref+:

Accuracy on locplus_val    Pre-trained model
74.82%                     download

Download and preprocess the data

  1. Download the CLEVR dataset from http://cs.stanford.edu/people/jcjohns/clevr/, and symlink it to exp_clevr/clevr_dataset (an example symlink command is given after the listing below). After this step, the file structure should look like
exp_clevr/clevr_dataset/
  images/
    train/
      CLEVR_train_000000.png
      ...
    val/
    test/
  questions/
    CLEVR_train_questions.json
    CLEVR_val_questions.json
    CLEVR_test_questions.json
  ...
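
For example, if you downloaded and unpacked CLEVR to /path/to/CLEVR_v1.0 (an illustrative path -- replace it with your actual download location), the symlink can be created with:

ln -s /path/to/CLEVR_v1.0 exp_clevr/clevr_dataset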

If you want to run any experiments on the CLEVR-Ref+ dataset for the referential expression comprehension task, you can download it from here, and symlink it to exp_clevr/clevr_locplus_dataset (an example symlink command is given after the listing below). After this step, the file structure should look like

exp_clevr/clevr_locplus_dataset/
  images/
    train/
      CLEVR_train_000000.png
      ...
    val/
    test/
  refexps/
    clevr_ref+_train_refexps.json
    clevr_ref+_val_refexps.json
  scenes/
    clevr_ref+_train_scenes.json
    clevr_ref+_val_scenes.json
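
As above, the CLEVR-Ref+ symlink can be created along these lines (the source path is illustrative; point it at wherever you unpacked the dataset):

ln -s /path/to/clevr_locplus exp_clevr/clevr_locplus_dataset
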
  2. Build imdbs for the datasets
cd exp_clevr/data/
# build imdbs
python build_clevr_imdb.py
python build_clevr_locplus_imdb.py   # only needed if you want to run on CLEVR-Ref+
cd ../../
  3. Extract ResNet-101-C4 features for the CLEVR images

Here, we use Justin Johnson's CLEVR feature extraction script, which requires PyTorch.

mkdir -p exp_clevr/data/features/spatial/
python exp_clevr/data/extract_features.py \
  --input_image_dir exp_clevr/clevr_dataset/images/train \
  --output_h5_file exp_clevr/data/features/spatial/train.h5
python exp_clevr/data/extract_features.py \
  --input_image_dir exp_clevr/clevr_dataset/images/val \
  --output_h5_file exp_clevr/data/features/spatial/val.h5
python exp_clevr/data/extract_features.py \
  --input_image_dir exp_clevr/clevr_dataset/images/test \
  --output_h5_file exp_clevr/data/features/spatial/test.h5
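
Feature extraction takes a while. To verify the output afterwards, a quick check along the following lines should print a feature array covering a 14 x 14 grid with 1024 channels (possibly in channel-first order). The dataset key 'features' follows the upstream CLEVR-IEP extraction script and is an assumption here -- adjust it if your file uses a different key:

python -c "import h5py; f = h5py.File('exp_clevr/data/features/spatial/val.h5', 'r'); print(list(f.keys())); print(f['features'].shape)"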

Training on CLEVR and CLEVR-Ref+

Note:

Pretrained models:

Training steps:

  1. Add the root of this repository to PYTHONPATH: export PYTHONPATH=.:$PYTHONPATH
  2. Train on CLEVR (for VQA):
    python exp_clevr/main.py --cfg exp_clevr/cfgs/lcgn.yaml train True
  3. Train on CLEVR-Ref+ (for REF):
    python exp_clevr/main_ref.py --cfg exp_clevr/cfgs_ref/lcgn_ref.yaml train True

Testing on CLEVR and CLEVR-Ref+

Note:

Testing steps:

  1. Add the root of this repository to PYTHONPATH: export PYTHONPATH=.:$PYTHONPATH
  2. Test on CLEVR (for VQA):
    • test locally on the val split:
      python exp_clevr/main.py --cfg exp_clevr/cfgs/lcgn.yaml
    • test locally on the test split (the displayed accuracy will be zero, since the test answers are not publicly available; use the prediction outputs in exp_clevr/results/lcgn/):
      python exp_clevr/main.py --cfg exp_clevr/cfgs/lcgn.yaml TEST.SPLIT_VQA test TEST.DUMP_PRED True
  3. Test on CLEVR-Ref+ (for REF):
    • test locally on the val split:
      python exp_clevr/main_ref.py --cfg exp_clevr/cfgs_ref/lcgn_ref.yaml

Train and evaluate on the GQA dataset

The LCGN model is applicable to three types of visual features from GQA: spatial features (convolutional grid features), objects features (from object detection), and "perfect-sight" object names and attributes (one-hot embeddings).

It achieves the following performance on the validation (val_balanced), test-dev (testdev_balanced) and test (test_balanced) splits of the GQA dataset:

Visual Feature Type                              Accuracy on val_balanced    Accuracy on testdev_balanced    Accuracy on test_balanced (EvalAI Phase: test2019)    Pre-trained model
spatial features                                 55.29%                      49.47%                          49.21%                                                download
objects features                                 63.87%                      55.84%                          56.09%                                                download
"perfect-sight" object names and attributes      90.23%                      n/a*                            n/a*                                                  download

*This setting requires using the GQA ground-truth scene graphs at both training and test time (only the object names and attributes are used; their relations are not used). Hence, it is not applicable to the test or the challenge setting.

Note: we also release our simple but well-performing "single-hop" baseline for the GQA dataset in a standalone repo. This "single-hop" model can serve as a basis for developing more complicated models.

Download the GQA dataset

Download the GQA dataset from https://cs.stanford.edu/people/dorarad/gqa/, and symlink it to exp_gqa/gqa_dataset (an example symlink command is given after the listing below). After this step, the file structure should look like

exp_gqa/gqa_dataset
    questions/
        train_all_questions/
            train_all_questions_0.json
            ...
            train_all_questions_9.json
        train_balanced_questions.json
        val_all_questions.json
        val_balanced_questions.json
        submission_all_questions.json
        test_all_questions.json
        test_balanced_questions.json
    spatial/
        gqa_spatial_info.json
        gqa_spatial_0.h5
        ...
        gqa_spatial_15.h5
    objects/
        gqa_objects_info.json
        gqa_objects_0.h5
        ...
        gqa_objects_15.h5
    sceneGraphs/
        train_sceneGraphs.json
        val_sceneGraphs.json
    images/
        ...

Note that the GQA images themselves are not needed for training or evaluation -- only the questions, features, and scene graphs (the latter only if you would like to run on the "perfect-sight" object names and attributes) are needed.
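
For example, if the GQA download lives at /path/to/gqa (an illustrative path -- replace it with your actual download location), the symlink can be created with:

ln -s /path/to/gqa exp_gqa/gqa_dataset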

Training on GQA

Note:

Pretrained models:

Training steps:

  1. Add the root of this repository to PYTHONPATH: export PYTHONPATH=.:$PYTHONPATH
  2. Train with spatial features (convolutional grid features):
    python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_spatial.yaml train True
  3. Train with objects features (from detection):
    python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_objects.yaml train True
  4. Train with "perfect-sight" object names and attributes (one-hot embeddings):
    python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_scene_graph.yaml train True

Testing on GQA

Note:

Testing steps:

  1. Add the root of this repository to PYTHONPATH: export PYTHONPATH=.:$PYTHONPATH
  2. Test with spatial features:
    • test locally on the val_balanced split:
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_spatial.yaml TEST.SPLIT_VQA val_balanced
    • test locally on the testdev_balanced split:
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_spatial.yaml TEST.SPLIT_VQA testdev_balanced
    • generate the submission file on submission_all for EvalAI (the displayed accuracy will be zero, since the answers for this split are not available locally; this takes a long time):
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_spatial.yaml TEST.SPLIT_VQA submission_all TEST.DUMP_PRED True
  3. Test with objects features:
    • test locally on the val_balanced split:
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_objects.yaml TEST.SPLIT_VQA val_balanced
    • test locally on the testdev_balanced split:
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_objects.yaml TEST.SPLIT_VQA testdev_balanced
    • generate the submission file on submission_all for EvalAI (the displayed accuracy will be zero, since the answers for this split are not available locally; this takes a long time):
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_objects.yaml TEST.SPLIT_VQA submission_all TEST.DUMP_PRED True
  4. Test with "perfect-sight" object names and attributes (one-hot embeddings):
    • test locally on the val_balanced split:
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_scene_graph.yaml TEST.SPLIT_VQA val_balanced
    • test locally on the testdev_balanced split (This won't work unless you have a file testdev_sceneGraphs.json under exp_gqa/gqa_dataset/sceneGraphs/ that contains scene graphs for test-dev images, which we don't):
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_scene_graph.yaml TEST.SPLIT_VQA testdev_balanced
    • generate the submission file on submission_all for EvalAI (This won't work unless you have a file submission_sceneGraphs.json under exp_gqa/gqa_dataset/sceneGraphs/ that contains scene graphs for all images, which we don't):
      python exp_gqa/main.py --cfg exp_gqa/cfgs/lcgn_scene_graph.yaml TEST.SPLIT_VQA submission_all TEST.DUMP_PRED True

Acknowledgements

Part of the CLEVR and GQA dataset preprocessing code, many TensorFlow operations (such as models_clevr/ops.py) and multi-GPU training code are obtained from the MAC codebase.

The outline of the configuration code (such as models_clevr/config.py) is obtained from the Detectron codebase.

The ResNet feature extraction script (exp_clevr/data/extract_features.py) is obtained from the CLEVR-IEP codebase.