# LCMCG.Pytorch
This repo is the official implementation of "Learning Cross-Modal Context Graph for Visual Grounding" (AAAI 2020).
## Installation
Check INSTALL.md for installation instructions.
## Prerequisites
- Download the Flickr30k dataset from this link.
- Pre-computed bounding boxes are extracted with Faster R-CNN. We use the config "e2e_faster_rcnn_R_50_C4_1x.yaml" to train the object detector on the MSCOCO dataset and extract the feature map at the C4 layer.
- Language graphs are extracted with SceneGraphParser. We have uploaded sg_anno.json to Google Drive; you can download it from there.
- Some pre-processed data, such as sentence annotations and box annotations.
- You need to create the './flickr_datasets' folder and put all annotations in it. We highly recommend checking every data path used in this project; see "maskrcnn_benchmark/config/paths_catalog.py" and "maskrcnn_benchmark/data/flickr.py" for details.
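The annotation folder above can be created up front; a minimal sketch (only the './flickr_datasets' path is taken from this README, everything else is illustrative):

```shell
# Create the annotation folder the data loaders expect.
# './flickr_datasets' is the path named above; nothing else is assumed.
mkdir -p ./flickr_datasets
ls -d ./flickr_datasets
```

After creating the folder, copy the downloaded annotation files (e.g. sg_anno.json) into it before touching the config files.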
The pretrained object detector weights and annotations can be found at Baidu disk (link: https://pan.baidu.com/s/1bYbGUsHcZJQHele87MzcMg, password: 5ie6) or Google Drive.
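Before training, it can save time to verify that every annotation file actually landed in './flickr_datasets'. A minimal sketch, assuming only sg_anno.json (the one file this README names); extend `EXPECTED` to match whatever paths_catalog.py in your checkout references:

```python
import os

# Sanity-check the annotation folder before training.
# Only 'sg_anno.json' is named in the README; add the other annotation
# files your paths_catalog.py expects to EXPECTED.
DATA_DIR = "./flickr_datasets"
EXPECTED = ["sg_anno.json"]

def missing_files(data_dir=DATA_DIR, expected=EXPECTED):
    """Return the subset of expected annotation files not found on disk."""
    return [f for f in expected if not os.path.isfile(os.path.join(data_dir, f))]

if __name__ == "__main__":
    gaps = missing_files()
    if gaps:
        print("Missing annotations:", gaps)
    else:
        print("All annotations in place.")
```

Running this once before editing the config files catches a misplaced download early instead of deep inside the data loader.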
## Training
- You can train our model by running the script:

      sh scripts/train.sh
## Citation
If you find our paper useful, please cite it:
    @inproceedings{liu2019learning,
      title={Learning Cross-modal Context Graph for Visual Grounding},
      author={Liu, Yongfei and Wan, Bo and Zhu, Xiaodan and He, Xuming},
      booktitle={Proceedings of the AAAI Conference on Artificial Intelligence},
      year={2020}
    }