PyTorch implementation of the Cross-modal Memory Network from "Vision-Dialog Navigation by Exploring Cross-modal Memory", CVPR 2020.
<!-- ![Model Illustration](teaser/model-1.png) -->

## Requirements

- Ubuntu 16.04
- CUDA 9.0 or 10.0
- Docker
- nvidia-docker 2.0

We recommend using the provided `mattersim` Dockerfile to build the simulator (see the Installation section below).

## Dataset Download

Download the `train`, `val_seen`, `val_unseen`, and `test` splits of the CVDN and NDH datasets by executing:
```bash
sh tasks/CVDN/data/download.sh
sh tasks/NDH/data/download.sh
```
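
After the scripts finish, you can sanity-check the downloads with a short Python snippet. This is only an illustrative sketch, assuming each split is stored as a JSON list under each task's `data/` directory; adjust the paths if your layout differs.

```python
import json
from pathlib import Path

# Illustrative sanity check (not part of the codebase): list every JSON split
# under each task's data directory and report how many entries it contains.
for task in ("CVDN", "NDH"):
    data_dir = Path("tasks") / task / "data"
    for split_file in sorted(data_dir.glob("*.json")):
        with open(split_file) as f:
            episodes = json.load(f)  # assumed to be a list of episode dicts
        print(f"{split_file}: {len(episodes)} entries")
```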

## Installation

Build the Docker image:

```bash
docker build -t mattersim .
```

Run the Docker container, mounting your project path:

```bash
nvidia-docker run -it --shm-size 64G -v /User/home/Path_To_Project/:/Workspace/ mattersim
```

Compile the codebase:

```bash
mkdir build && cd build
cmake -DEGL_RENDERING=ON ..
make
```

Install the Python dependencies:

```bash
pip install -r tasks/NDH/requirements.txt
```
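
To check that the simulator compiled correctly, try loading the Python bindings. Below is a minimal smoke test, assuming the bindings were built into `build/` as in the standard Matterport3DSimulator setup; episode-level APIs differ between simulator versions, so this only verifies that the module loads and accepts basic configuration.

```python
import math
import sys

# The compiled MatterSim bindings normally land in the build directory;
# this path is an assumption about your build layout.
sys.path.append("build")

import MatterSim

# Construct and configure a simulator instance using the public
# Matterport3DSimulator camera-configuration calls.
sim = MatterSim.Simulator()
sim.setCameraResolution(640, 480)
sim.setCameraVFOV(math.radians(60))
print("MatterSim bindings loaded and configured")
```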

## Train and Evaluate

To train and evaluate with trusted supervision, sample feedback, and all dialog history:
```bash
python tasks/NDH/train.py \
    --path_type=trusted_path \
    --history=all \
    --feedback=sample \
    --eval_type=val \
    --prefix=v1
```

To train and test with trusted supervision, sample feedback, and all dialog history:
```bash
python tasks/NDH/train.py \
    --path_type=trusted_path \
    --history=all \
    --feedback=sample \
    --eval_type=test \
    --prefix=v1
```
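
The two commands above differ only in `--eval_type`. To run the validation and test settings back to back, a small wrapper along these lines could be used; this is a convenience sketch, not part of the repository:

```python
import subprocess

# Flags shared by both runs, copied from the commands above.
COMMON_ARGS = [
    "--path_type=trusted_path",
    "--history=all",
    "--feedback=sample",
    "--prefix=v1",
]

# Launch the val run, then the test run; check=True aborts if a run fails.
for eval_type in ("val", "test"):
    subprocess.run(
        ["python", "tasks/NDH/train.py", f"--eval_type={eval_type}", *COMMON_ARGS],
        check=True,
    )
```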

To generate a summary of the agent's performance:

```bash
python tasks/NDH/summarize_perf.py
```

## Citation

If you use the code in your research, please cite:
```bibtex
@inproceedings{zhu2020vision,
  title={Vision-Dialog Navigation by Exploring Cross-modal Memory},
  author={Zhu, Yi and Zhu, Fengda and Zhan, Zhaohuan and Lin, Bingqian and Jiao, Jianbin and Chang, Xiaojun and Liang, Xiaodan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  pages={10730--10739},
  year={2020}
}
```

## Acknowledgements

This repository is built upon the Matterport3DSimulator, CVDN, and DAN-VisDial codebases.