# The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation
This is the repository of ORIST (ICCV 2021).
<p align="center"> <img src="orist.png" width="100%"> </p>

Some code in this repo is copied/modified from open-source implementations made available by PyTorch, HuggingFace, OpenNMT, Nvidia, and UNITER. The object features are extracted using BUTD, with expanded object bounding boxes from REVERIE.
## Features of the Code
- Distributed data parallel training implemented with PyTorch (a launch sketch follows this list).
- Code optimizations for faster training.
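
A minimal sketch of how such a distributed run is typically launched with PyTorch's launcher; `train.py` and the process count are assumptions for illustration, not the repo's documented entry point (the provided `run_scripts/*.sh` wrap the actual invocation):

```bash
# Launch one worker process per GPU with PyTorch's distributed launcher.
# --nproc_per_node should match the number of visible GPUs;
# train.py is a hypothetical stand-in for the repo's training entry point.
python -m torch.distributed.launch --nproc_per_node=4 train.py
```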
## Requirements
- Install Docker with GPU support (e.g., via the NVIDIA Container Toolkit; many tutorials are available online).
- Pull the docker image:

  ```bash
  docker pull qykshr/ubuntu:orist
  ```
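
Once the image is pulled, a container can be started with GPU access roughly as follows; the `/workspace` mount point is an assumption for illustration, not a path mandated by the image:

```bash
# Start an interactive container with all GPUs visible and the
# current checkout mounted at /workspace (hypothetical mount point).
docker run --gpus all -v $(pwd):/workspace -w /workspace -it qykshr/ubuntu:orist bash
```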
## Quick Start
- Download the processed data and pretrained models:
  - Processed data:
    - For evaluation only:
    - For training:
- Build the Matterport3D simulator.

  Build the OSMesa version using CMake:

  ```bash
  mkdir build && cd build
  cmake -DOSMESA_RENDERING=ON ..
  make
  cd ../
  ```

  Instructions for building other versions can be found in the Matterport3D simulator repository.
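
  After the build completes, the compiled MatterSim bindings need to be importable from Python. A common approach, assuming the standard Matterport3D simulator layout, is:

  ```bash
  # Make the freshly built MatterSim module visible to Python.
  # The build/ path is an assumption based on the CMake steps above.
  export PYTHONPATH=$PYTHONPATH:$(pwd)/build
  ```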
- Run inference:

  ```bash
  sh eval_scripts/xxx.sh
  ```
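
  GPU selection is not documented here; a hedged usage example, assuming the scripts respect `CUDA_VISIBLE_DEVICES`:

  ```bash
  # Hypothetical usage: restrict evaluation to a single GPU.
  # Replace xxx with the actual script name in eval_scripts/.
  CUDA_VISIBLE_DEVICES=0 sh eval_scripts/xxx.sh
  ```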
- Run training:

  ```bash
  sh run_scripts/xxx.sh
  ```
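
  Since training uses distributed data parallel (see Features above), exposing several GPUs is the expected mode; the device list below is an illustrative assumption:

  ```bash
  # Hypothetical usage: run distributed training across four GPUs.
  # Replace xxx with the actual script name in run_scripts/.
  CUDA_VISIBLE_DEVICES=0,1,2,3 sh run_scripts/xxx.sh
  ```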
## Citation
If this code or data is useful for your research, please consider citing:
```bibtex
@inproceedings{orist,
  author    = {Yuankai Qi and
               Zizheng Pan and
               Yicong Hong and
               Ming{-}Hsuan Yang and
               Anton van den Hengel and
               Qi Wu},
  title     = {The Road to Know-Where: An Object-and-Room Informed Sequential BERT for Indoor Vision-Language Navigation},
  booktitle = {ICCV},
  pages     = {1655--1664},
  year      = {2021}
}

@inproceedings{reverie,
  author    = {Yuankai Qi and
               Qi Wu and
               Peter Anderson and
               Xin Wang and
               William Yang Wang and
               Chunhua Shen and
               Anton van den Hengel},
  title     = {{REVERIE:} Remote Embodied Visual Referring Expression in Real Indoor Environments},
  booktitle = {CVPR},
  pages     = {9979--9988},
  year      = {2020}
}
```