DDPN
This project is the implementation of the paper Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding. The network architecture of our visual grounding model with DDPN is illustrated in Figure 1.
<img src="https://github.com/XiangChenchao/DDPN/raw/master/images/DDPN.jpg" alt="Figure 1: The model architecture for our visual grounding model." width="60%"/> <center>Figure 1: The model network architecture for our visual grounding model.</center>

Requirements
- Python version 2.7
- easydict
- cv2
- PyTorch 0.3 (optional but recommended; used to speed up multi-threaded data loading)
Pretrained Models
We release the trained models on four datasets. They achieve slightly better results than those reported in the paper.
Datasets | Flickr30k-Entities | Referit | Refcoco | Refcoco+ |
---|---|---|---|---|
val | 72.78% | 63.77% | 76.61% | 64.34% |
test | 73.45% | 63.27% | 76.23% | 64.01% |
testA | - | - | 79.99% | 71.24% |
testB | - | - | 72.11% | 55.55% |
- Download the pretrained models from BaiduYun.
- Unzip the model files into the directory './pretrained_model'.
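
The percentages in the table above are grounding accuracies. The README does not define the metric, so the sketch below assumes the convention common in the visual grounding literature: a prediction counts as correct when its intersection over union (IoU) with the ground-truth box is greater than 0.5. The function names and the `(x1, y1, x2, y2)` box layout are illustrative and not taken from this repository.

```python
def iou(box_a, box_b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Clamp at zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return float(inter) / union if union > 0 else 0.0


def grounding_accuracy(predicted, ground_truth, threshold=0.5):
    """Fraction of queries whose predicted box has IoU > threshold (assumed metric)."""
    if not predicted:
        return 0.0
    hits = sum(1 for p, g in zip(predicted, ground_truth) if iou(p, g) > threshold)
    return float(hits) / len(predicted)
```

For example, `grounding_accuracy([(0, 0, 10, 10)], [(0, 0, 10, 10)])` counts the single query as a hit because the boxes coincide.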
Preprocess
- Caffe: build Caffe and its Python interface (pycaffe).
      cd ./caffe
      make all -j32
      make pycaffe
- Download the images (images only).
  - flickr30k-entities
    - Download the Flickr30k-Entities images.
    - Move the Flickr30k-Entities images to the directory './data/flickr30k/flickr30k-images/'.
  - referit: download the Referit images.
        wget -O ./data/referit/ImageCLEF/referitdata.tar.gz http://www.eecs.berkeley.edu/~ronghang/projects/cvpr16_text_obj_retrieval/referitdata.tar.gz
        tar -xzvf ./data/referit/ImageCLEF/referitdata.tar.gz -C ./data/referit/ImageCLEF/
  - refcoco/refcoco+: download the MSCOCO train2014 images.
    - mscoco train2014.
    - Move the MSCOCO train2014 images to the directory './data/mscoco/image2014/train2014/'.
- Extract the DDPN image features. For a 3 x h x w image, we extract a 2048-D visual feature and a 4-D spatial feature (post-processed to 5-D) as the input features for our model. The script we use is shown below. Note that we pass --num_bbox 100,100 to extract a fixed number of proposals (K=100) for each image.
      ./tools/extract_feat.py --gpu 0,1,2,3 --cfg experiments/cfgs/faster_rcnn_end2end_resnet_vg.yml --def models/vg/ResNet-101/faster_rcnn_end2end/test.prototxt --net /path/to/caffemodel --img_dir /path/to/images/ --out_dir /path/to/outfeat/ --num_bbox 100,100 --feat_name pool5_flat
  - For flickr30k and referit, the image features are written to 'data/[flickr30k, referit]/features/bottom-up-feats/' by default. For refcoco/refcoco+, they are written to 'data/mscoco/features/bottom-up-feats/train2014'.
- Download the annotation files. We preprocessed the annotations of flickr30k-entities, referit, refcoco, and refcoco+ so that all datasets share the same format. Download our processed annotations from BaiduYun, then unzip the zip files into the directory './data'. We will release the annotation preprocessing code in the directory './preprocess'.
- Modify the paths in the config files to match your environment: set the number of data loader threads, the image feature directory, and the image directory in the YAML config files in the directory './config/experiments/'.
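
The feature extraction step above mentions a 4-D spatial feature that is post-processed to 5-D, but the README does not say how. A common choice for region features is to normalize the box corners by the image size and append the box area relative to the image area. The sketch below shows that choice only as an assumption; the function name, argument order, and the formula itself are illustrative and may differ from what this repository does.

```python
def spatial_feature(box, img_w, img_h):
    """Turn a 4-D box (x1, y1, x2, y2) into an assumed 5-D spatial feature.

    Returns [x1/w, y1/h, x2/w, y2/h, box_area/img_area].
    """
    x1, y1, x2, y2 = [float(v) for v in box]
    w, h = float(img_w), float(img_h)
    # Fifth dimension: fraction of the image covered by the proposal.
    area_ratio = ((x2 - x1) * (y2 - y1)) / (w * h)
    return [x1 / w, y1 / h, x2 / w, y2 / h, area_ratio]
```

With K=100 proposals per image, applying this to every box would give a 100 x 5 spatial input alongside the 100 x 2048 visual features.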
Training
- flickr30k-entities
python train_net.py --gpu_id 0 --train_split train --val_split val --cfg config/experiments/flickr30k-kld-bbox_reg.yaml
- referit
python train_net.py --gpu_id 0 --train_split train --val_split val --cfg config/experiments/referit-kld-bbox_reg.yaml
- refcoco
python train_net.py --gpu_id 0 --train_split train --val_split val --cfg config/experiments/refcoco-kld-bbox_reg.yaml
- refcoco+
python train_net.py --gpu_id 0 --train_split train --val_split val --cfg config/experiments/refcoco+-kld-bbox_reg.yaml
- The output models are saved in the directory './models'.
- The validation log is written to the directory './log'.
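
The four training commands above differ only in the config file name. As a convenience, they can be generated from the dataset name. This is a minimal sketch that reuses the flags and paths listed above; `DATASETS` and `train_command` are hypothetical helpers, not part of the repository.

```python
# Dataset names as they appear in the config file names above.
DATASETS = ["flickr30k", "referit", "refcoco", "refcoco+"]


def train_command(dataset, gpu_id=0):
    """Build the train_net.py argument list for one dataset."""
    cfg = "config/experiments/{}-kld-bbox_reg.yaml".format(dataset)
    return [
        "python", "train_net.py",
        "--gpu_id", str(gpu_id),
        "--train_split", "train",
        "--val_split", "val",
        "--cfg", cfg,
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so they can be inspected first.
    for name in DATASETS:
        print(" ".join(train_command(name)))
```

Each returned list can be passed to `subprocess.check_call` to launch the runs one after another.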
Testing
- flickr30k-entities
python test_net.py --gpu_id 0 --test_split test --batchsize 64 --test_net pretrained_model/flickr30k/test.prototxt --pretrained_model pretrained_model/flickr30k/final.caffemodel --cfg config/experiments/flickr30k-kld-bbox_reg.yaml
- referit
python test_net.py --gpu_id 0 --test_split test --batchsize 64 --test_net pretrained_model/referit/test.prototxt --pretrained_model pretrained_model/referit/final.caffemodel --cfg config/experiments/referit-kld-bbox_reg.yaml
- refcoco
python test_net.py --gpu_id 0 --test_split test --batchsize 64 --test_net pretrained_model/refcoco/test.prototxt --pretrained_model pretrained_model/refcoco/final.caffemodel --cfg config/experiments/refcoco-kld-bbox_reg.yaml
- refcoco+
python test_net.py --gpu_id 0 --test_split test --batchsize 64 --test_net pretrained_model/refcoco+/test.prototxt --pretrained_model pretrained_model/refcoco+/final.caffemodel --cfg config/experiments/refcoco+-kld-bbox_reg.yaml
Citation
If this code is helpful for your research, please cite:
@article{yu2018rethining,
title={Rethinking Diversified and Discriminative Proposal Generation for Visual Grounding},
author={Yu, Zhou and Yu, Jun and Xiang, Chenchao and Zhao, Zhou and Tian, Qi and Tao, Dacheng},
journal={International Joint Conference on Artificial Intelligence (IJCAI)},
year={2018}
}