Home


Introduction

This repository implements DBNet, proposed in the following paper:

Yuting Zhang, Luyao Yuan, Yijie Guo, Zhiyuan He, I-An Huang, Honglak Lee, “Discriminative Bimodal Networks for Natural Language Visual Localization and Detection”, IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017. (Spotlight)


Compilation

This code is based on Caffe and MATLAB, and includes C++ components that must be compiled.

Compile Caffe

The custom version of Caffe is at ./caffe. Please compile it yourself (refer to the official installation guide: http://caffe.berkeleyvision.org/installation.html). A C++11-compatible compiler is required. You will need to run the following commands successfully:

make
make matcaffe
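On a typical Linux setup, the full build sequence usually looks like the sketch below. The Makefile.config edits, MATLAB location, and job count are assumptions; adjust them to your machine.

```shell
# Sketch of a typical Caffe build; paths and flags are assumptions.
cd ./caffe
cp Makefile.config.example Makefile.config   # then edit, e.g. set MATLAB_DIR
make -j4          # build the Caffe libraries and tools
make matcaffe     # build the MATLAB interface (needs MATLAB_DIR set)
```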

Compile MEX (C++ in MATLAB)

Start MATLAB in the root folder of the code. Run the following command:

compile_mex
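If you prefer to work from a terminal, the same step can be run headlessly, as sketched below. This assumes `matlab` is on your PATH and that you start from the root folder of the code.

```shell
# Launch MATLAB without a GUI from the repo root and build the MEX files.
# Assumes `matlab` is on PATH; run from the root folder of the code.
matlab -nodisplay -r "compile_mex; exit"
```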

Setting up data

Please download and set up all data as follows.

Visual Genome annotations

The following link contains Visual Genome annotations in MATLAB format. Please download and extract it somewhere (you can specify the path later in the config file).

The annotations have been cleaned up as described in the paper. Extra annotations are provided for text similarity.

Please download the Visual Genome images from the official site.

After extracting the annotations and images, please do the following:

  1. Start MATLAB at the root folder of the code
  2. Run global_settings;
  3. Exit MATLAB
  4. Update the Visual Genome paths in ./system/global_settings.m

Networks and pre-trained models

The following link contains network definitions, pre-trained models, and test results for the VGGNet-16-based DBNet. Please download and extract it at the root folder of the code.

Precomputed region proposal

The following link contains precomputed region proposals for all Visual Genome images, generated with EdgeBoxes. Please download and extract it somewhere (you can specify the path later in the config file).

Run experiments

The experiment has 4 phases: param_phase? (?=1,2,3,4) defines the parameters and input/output directories for each phase, and run_phase?('GPU_ID',0) runs the corresponding phase, e.g., on GPU 0.
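The phases can also be run back-to-back from a terminal, as in the sketch below. It assumes `matlab` is on your PATH, you start from the repository root, and GPU 0 is available.

```shell
# Run all four phases in sequence on GPU 0 (sketch; environment-dependent).
matlab -nodisplay -r "run_phase1('GPU_ID',0); run_phase2('GPU_ID',0); run_phase3('GPU_ID',0); run_phase4('GPU_ID',0); exit"
```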

The 4 phases are as follows:

Run a phase from scratch

Note that the downloaded data include caches for all phases. When you start an experiment, it resumes from the existing cache. In particular,

To run a phase from scratch, please remove the files in the corresponding output folders (as previously specified).
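For example, restarting the training phase from scratch would look like the sketch below. The path is an assumption modeled on the ./exp/vgg16/cache/Test layout mentioned later; check the output directory reported by the corresponding param_phase? script.

```shell
# Clear a phase's cached outputs so the next run starts from scratch.
# "Train" is an assumed folder name; verify it against param_phase?.
PHASE_CACHE=./exp/vgg16/cache/Train
rm -rf "${PHASE_CACHE:?}"/*    # :? guards against an empty variable
```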

Snapshot and resume training

The training script automatically takes a snapshot every 20,000 iterations. This interval is defined at the beginning of ./pipeline/pipTrain.m

The training will not terminate automatically (the default maximum number of iterations is Inf). You can press Ctrl+C in MATLAB to pause the training. Then,

Interrupt and resume testing

Testing can also be interrupted (by Ctrl+C) and resumed (by running the test phase again) at any time.

Note that an index file, ./exp/vgg16/cache/Test/_index.lock, is used to track the testing progress. Any image that has already been tested will not be tested again. You can remove the index file to make the script test all images again.
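For example, to force a full re-test, delete the index file before re-running the test phase:

```shell
# Remove the testing progress index so every image is tested again.
rm -f ./exp/vgg16/cache/Test/_index.lock
```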

Override default parameters

To run a phase using run_phase?(___), you can override any default parameter (without changing the source code) by passing name-value pairs to the call, for example:

run_phase1('GPU_ID',0,'Train_SnapshotFrequency',5000,'Train_MaxIteration',3e5,'BaseModel_BatchSize',1,'ConvFeatNet_BatchSize',1)

TODOs (will be available soon)