Learning to Assemble Neural Module Tree Networks for Visual Grounding

This repository contains the code for the following paper:

Daqing Liu, Hanwang Zhang, Zheng-Jun Zha, and Feng Wu. Learning to Assemble Neural Module Tree Networks for Visual Grounding. In ICCV, 2019.

Installation

  1. Install Python 3 (Anaconda recommended)
  2. Install PyTorch v1.0 or higher:
pip3 install torch torchvision
  3. Clone the repository with Git, then enter the root directory:
git clone --recursive https://github.com/daqingliu/NMTree.git && cd NMTree
  4. Prepare the data:
    • Follow data/README.md to prepare the images and refcoco/refcoco+/refcocog annotations, or simply run:
    # this may take some time, depending on your network
    bash data/prepare_data.sh
    
    • Our visual features are extracted by MAttNet; please follow its instructions. Or just download and uncompress the RefCOCOg visual features into data/feats/refcocog_umd for testing this repo (a quick load check is sketched after this list).
    • Preprocess the vocabulary:
    python misc/parser.py --dataset refcocog --split_by umd
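
Optionally, you can sanity-check the downloaded features before training. The snippet below is a minimal sketch, assuming matt_res_gt_feats.pth (the file later passed via --visual_feat_file) is a standard torch-serialized object; its internal layout is an assumption, not something this README documents:

import torch

# Hypothetical sanity check, not part of the repo: confirm the feature
# file deserializes and peek at its structure.
feats = torch.load("data/feats/refcocog_umd/matt_res_gt_feats.pth",
                   map_location="cpu")
if isinstance(feats, dict):
    print(list(feats.keys())[:5])  # a few keys; the exact layout is unknown
elif torch.is_tensor(feats):
    print(feats.shape)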
    

Training

python tools/train.py \
    --id det_nmtree_01 \
    --dataset refcocog \
    --split_by umd \
    --grounding_model NMTree \
    --data_file data_dep \
    --batch_size 128 \
    --glove glove.840B.300d_dep \
    --visual_feat_file matt_res_gt_feats.pth
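
Assembling a tree of neural modules involves discrete choices, and the Acknowledgments below credit a gumbel-softmax implementation, which suggests those choices are trained with the Gumbel-Softmax relaxation. The following is a minimal, standalone sketch of that trick using PyTorch's built-in torch.nn.functional.gumbel_softmax; the shapes and module count are hypothetical, and this is not the repository's actual selection code:

import torch
import torch.nn.functional as F

# Illustration only: pick one of three hypothetical modules for a node
# with a differentiable, (approximately) one-hot sample.
logits = torch.randn(1, 3, requires_grad=True)  # unnormalized module scores

# hard=True returns a one-hot sample in the forward pass, while the
# backward pass uses the soft relaxation (straight-through estimator).
selection = F.gumbel_softmax(logits, tau=1.0, hard=True)
print(selection)  # e.g. tensor([[0., 1., 0.]], grad_fn=...)

# The node output can then be a combination of candidate module outputs,
# keeping the discrete-looking choice differentiable end to end.
module_outputs = torch.randn(3, 300)  # hypothetical per-module outputs
node_output = selection @ module_outputs  # shape (1, 300)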

Evaluation

python tools/eval_gt.py \
    --log_path log/refcocog_umd_nmtree_01 \
    --dataset refcocog \
    --split_by umd

python tools/eval_det.py \
    --log_path log/refcocog_umd_nmtree_01 \
    --dataset refcocog \
    --split_by umd
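
Judging by the script names, eval_gt.py presumably evaluates grounding among annotated ground-truth regions, while eval_det.py uses detected proposals. The standard criterion in this setting counts a prediction as correct when its box overlaps the ground-truth box with IoU above 0.5; the snippet below is a self-contained illustration of that metric, not the repo's evaluation code:

def iou(box_a, box_b):
    """IoU of two boxes in (x1, y1, x2, y2) format."""
    x1, y1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    x2, y2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

def grounding_accuracy(preds, gts, thresh=0.5):
    """Fraction of predicted boxes whose IoU with ground truth exceeds thresh."""
    hits = sum(iou(p, g) > thresh for p, g in zip(preds, gts))
    return hits / len(gts)

# Toy example: IoU is 0.81, above the 0.5 threshold, so accuracy is 1.0.
print(grounding_accuracy([(0, 0, 10, 10)], [(1, 1, 10, 10)]))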

Citation

@inproceedings{liu2019learning,
  title={Learning to Assemble Neural Module Tree Networks for Visual Grounding},
  author={Liu, Daqing and Zhang, Hanwang and Zha, Zheng-Jun and Wu, Feng},
  booktitle={The IEEE International Conference on Computer Vision (ICCV)},
  year={2019}
}

Acknowledgments

Parts of the code are adapted from Refer, MAttNet, and gumbel-softmax.

This project is maintained by Daqing Liu. Issues and PRs are welcome.