# Dual Attention Networks for Visual Dialog
PyTorch implementation for the paper:
Dual Attention Networks for Visual Reference Resolution in Visual Dialog <br> Gi-Cheon Kang, Jaeseo Lim, and Byoung-Tak Zhang <br> In EMNLP 2019
<!--![Overview of Dual Attention Networks](dan_overview.jpg)-->
<img src="dan_overview.jpg" width="90%" align="middle">

If you use this code in your published research, please consider citing:
```bibtex
@inproceedings{kang2019dual,
  title={Dual Attention Networks for Visual Reference Resolution in Visual Dialog},
  author={Kang, Gi-Cheon and Lim, Jaeseo and Zhang, Byoung-Tak},
  booktitle={Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing},
  pages={2024--2033},
  year={2019}
}
```
## Setup and Dependencies
This starter code is implemented using PyTorch v0.3.1 with CUDA 8 and cuDNN 7. <br> We recommend setting up the environment with Anaconda or Miniconda. <br>
- Install the Anaconda or Miniconda distribution based on Python 3.6+ from their downloads site.
- Clone this repository and create an environment:
```sh
git clone https://github.com/gicheonkang/DAN-VisDial
conda create -n dan_visdial python=3.6

# activate the environment and install all dependencies
conda activate dan_visdial
cd DAN-VisDial/
pip install -r requirements.txt
```
## Download Features
- We used a Faster R-CNN pre-trained on Visual Genome to extract the image features. Download the image features below and put each file under the `$PROJECT_ROOT/data/{SPLIT_NAME}_feature` directory. The `image_id`-to-bounding-box-index file (`{SPLIT_NAME}_imgid2idx.pkl`) is needed because the number of bounding boxes per image is not fixed (ranging from 10 to 100); a loading sketch follows this list.
  - `train_btmup_f.hdf5`: Bottom-up features of 10 to 100 proposals from images of the `train` split (32GB).
  - `train_imgid2idx.pkl`: `image_id` to bbox index file for the `train` split.
  - `val_btmup_f.hdf5`: Bottom-up features of 10 to 100 proposals from images of the `validation` split (0.5GB).
  - `val_imgid2idx.pkl`: `image_id` to bbox index file for the `val` split.
  - `test_btmup_f.hdf5`: Bottom-up features of 10 to 100 proposals from images of the `test` split (2GB).
  - `test_imgid2idx.pkl`: `image_id` to bbox index file for the `test` split.
- Download the GloVe pretrained word vectors from here, and keep `glove.6B.300d.txt` under the `$PROJECT_ROOT/data/glove` directory.
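
For reference, the snippet below is a minimal sketch of how a split's features could be read together with its `image_id`-to-index mapping. The HDF5 dataset name (`image_features`), the example `image_id`, and the assumption that the pickle maps each `image_id` to an index into the feature file are illustrative guesses; check the preprocessing code for the actual layout.

```python
import pickle
import h5py

split = "val"

# image_id -> index into the HDF5 feature file for this split
with open(f"data/{split}_feature/{split}_imgid2idx.pkl", "rb") as f:
    imgid2idx = pickle.load(f)

with h5py.File(f"data/{split}_feature/{split}_btmup_f.hdf5", "r") as h5:
    feats = h5["image_features"]   # assumed dataset name
    idx = imgid2idx[185565]        # hypothetical VisDial image_id
    img_feat = feats[idx]          # bottom-up features for that image
                                   # (exact indexing of the variable box counts
                                   # depends on how the features were packed)
    print(img_feat.shape)
```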
## Data preprocessing & Word embedding initialization
```sh
# data preprocessing
cd DAN-VisDial/data/
python prepro.py

# word embedding vector initialization (GloVe)
cd ../utils
python utils.py
```
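
As a rough illustration of what the GloVe initialization step does (the actual logic lives in `utils.py`; the vocabulary handling below is a simplified assumption), an embedding matrix can be built like this:

```python
import numpy as np

def load_glove(path="data/glove/glove.6B.300d.txt"):
    """Read the GloVe text file into a word -> vector dict."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

def build_embedding(vocab, glove, dim=300):
    """Rows for words missing from GloVe stay randomly initialized."""
    weights = np.random.normal(scale=0.1, size=(len(vocab), dim)).astype(np.float32)
    for i, word in enumerate(vocab):
        if word in glove:
            weights[i] = glove[word]
    return weights
```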
## Training
### Simple run
```sh
python train.py
```
### Saving model checkpoints
By default, the model saves a checkpoint at every epoch. You can change this interval with the `-save_step` option.
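
The checkpointing behaviour is roughly equivalent to the sketch below; the model, paths, and loop are placeholders, not the repository's actual training script.

```python
import os
import torch
import torch.nn as nn

save_step = 1                      # default: save every epoch
num_epochs = 3
model = nn.Linear(10, 10)          # placeholder for the DAN model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

os.makedirs("checkpoints", exist_ok=True)
for epoch in range(1, num_epochs + 1):
    # ... one epoch of training would run here ...
    if epoch % save_step == 0:
        torch.save(
            {"epoch": epoch,
             "model": model.state_dict(),
             "optimizer": optimizer.state_dict()},
            "checkpoints/checkpoint_epoch_%d.pth" % epoch,
        )
```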
### Logging
Logging data in `checkpoints/start/time/log.txt` shows the epoch, loss, and learning rate.
## Evaluation
A trained model checkpoint can be evaluated as follows:
```sh
python evaluate.py -load_path /path/to/.pth -split val
```
Validation scores can be checked in an offline setting, but to check the test split score you have to submit a JSON file to the online evaluation server. You can generate that file with the `-save_ranks=True` option; a sketch of its contents is shown below.
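
For reference, the ranks file is a JSON document along the lines of the following sketch. The exact field names expected by the evaluation server are an assumption here and should be checked against the VisDial challenge documentation.

```python
import json

# One entry per (image, round): the rank assigned to each of the
# 100 candidate answers (field names are assumed, not verified).
ranks = [
    {
        "image_id": 185565,              # hypothetical test image
        "round_id": 10,                  # dialog round being evaluated
        "ranks": list(range(1, 101)),    # rank of each candidate answer
    }
]

with open("ranks.json", "w") as f:
    json.dump(ranks, f)
```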
## Pre-trained model & Results
We provide the pre-trained model reported as the best single model in the paper. <br> To reproduce the results reported in the paper, please run the command below and submit the JSON file to the online evaluation server.
```sh
python evaluate.py -load_path /path/to/dan_disc_epoch_12.pth -split test -use_gt False -save_ranks True
```
Performance on `v1.0 test-std` (trained on `v1.0 train`):
Model | NDCG | MRR | R@1 | R@5 | R@10 | Mean |
---|---|---|---|---|---|---|
DAN | 0.5759 | 0.6320 | 49.63 | 79.75 | 89.35 | 4.30 |
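
For context, the retrieval metrics in the table (MRR, R@k, and mean rank) are computed from the rank of the ground-truth answer in each dialog round, roughly as in the sketch below; NDCG additionally requires the dense relevance annotations released with VisDial v1.0.

```python
import numpy as np

def retrieval_metrics(gt_ranks):
    """gt_ranks: 1-based rank of the ground-truth answer in each round."""
    ranks = np.asarray(gt_ranks, dtype=np.float64)
    return {
        "MRR": float(np.mean(1.0 / ranks)),
        "R@1": float(np.mean(ranks <= 1) * 100),
        "R@5": float(np.mean(ranks <= 5) * 100),
        "R@10": float(np.mean(ranks <= 10) * 100),
        "Mean": float(np.mean(ranks)),
    }

print(retrieval_metrics([1, 3, 7, 20, 2]))
```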
## License
MIT License
## Acknowledgements
This work was partly supported by the Korea government (2015-0-00310-SW.StarLab, 2017-0-01772-VTT, 2018-0-00622-RMI, 2019-0-01367-BabyMind, 10060086-RISF, P0006720-GENKO), and the ICT at Seoul National University.