

ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments

Dong An; Hanqing Wang; Wenguan Wang; Zun Wang; Yan Huang; Keji He; Liang Wang

Accepted to TPAMI 2024

Paper

🔥 Winner of the RxR-Habitat Challenge in CVPR 2022. [Challenge Report] [Challenge Certificate]

This work tackles a practical yet challenging VLN setting: vision-language navigation in continuous environments (VLN-CE). To develop a robust VLN-CE agent, we propose a new navigation framework, ETPNav, which focuses on two critical skills: 1) the capability to abstract environments and generate long-range navigation plans, and 2) the ability to perform obstacle-avoiding control in continuous environments. ETPNav performs online topological mapping of environments by self-organizing predicted waypoints along a traversed path, without prior environmental experience. This lets the agent break the navigation procedure down into high-level planning and low-level control. ETPNav then uses a transformer-based cross-modal planner to generate navigation plans from the topological map and the instruction. Each plan is executed by an obstacle-avoiding controller that uses a trial-and-error heuristic to keep the agent from getting stuck on obstacles. Experimental results demonstrate the effectiveness of the proposed method: ETPNav yields more than 10% and 20% improvements over the prior state of the art on the R2R-CE and RxR-CE datasets, respectively.
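To make the idea of online topological mapping more concrete, here is a toy sketch in Python. It is not the ETPNav implementation: the `TopoMap` class, the 2D positions, and the `merge_radius` threshold are all hypothetical simplifications chosen for illustration. At each step, the agent's position and the predicted waypoints are either merged into a nearby existing node or added as new nodes, and the unvisited nodes remain as candidate goals for a planner.

```python
import math
from typing import Dict, List, Optional, Set, Tuple

Point = Tuple[float, float]


class TopoMap:
    """Toy online topological map (illustrative only, not the authors' code)."""

    def __init__(self, merge_radius: float = 0.5) -> None:
        # merge_radius is a hypothetical threshold, not a value from the paper.
        self.merge_radius = merge_radius
        self.nodes: List[Point] = []
        self.visited: List[bool] = []
        self.edges: Dict[int, Set[int]] = {}

    def _nearest(self, p: Point) -> Optional[int]:
        """Index of the closest node within merge_radius, or None."""
        best, best_d = None, self.merge_radius
        for i, q in enumerate(self.nodes):
            d = math.hypot(p[0] - q[0], p[1] - q[1])
            if d <= best_d:
                best, best_d = i, d
        return best

    def _add_or_merge(self, p: Point, visited: bool) -> int:
        """Reuse a nearby node if one exists, otherwise create a new node."""
        i = self._nearest(p)
        if i is None:
            self.nodes.append(p)
            self.visited.append(visited)
            i = len(self.nodes) - 1
            self.edges[i] = set()
        else:
            self.visited[i] = self.visited[i] or visited
        return i

    def update(self, position: Point, waypoints: List[Point]) -> int:
        """Add the current position and its predicted waypoints to the map."""
        cur = self._add_or_merge(position, True)
        for w in waypoints:
            j = self._add_or_merge(w, False)
            if j != cur:
                self.edges[cur].add(j)
                self.edges[j].add(cur)
        return cur

    def frontier(self) -> List[int]:
        """Unvisited nodes, i.e. candidate long-range goals."""
        return [i for i, seen in enumerate(self.visited) if not seen]


topo = TopoMap(merge_radius=0.5)
topo.update((0.0, 0.0), [(1.0, 0.0), (0.0, 1.0)])
topo.update((1.0, 0.1), [(2.0, 0.0), (0.1, 0.0)])
print(len(topo.nodes), topo.frontier())  # prints: 4 [2, 3]
```

In the second step the agent's position falls within the radius of an existing node, so that node is marked visited instead of being duplicated, which is what keeps the graph compact as the path grows.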

Follow the Habitat Installation Guide to install habitat-lab and habitat-sim. We use v0.1.7 in our experiments, the same version as VLN-CE; please refer to the VLN-CE page for more details. In brief:

  1. Create a virtual environment. We developed this project with Python 3.6.

    conda env create -f environment.yaml
  2. Install habitat-sim for a machine with multiple GPUs or without an attached display (i.e. a cluster):

    conda install -c aihabitat -c conda-forge habitat-sim=0.1.7 headless
  3. Clone this repository and install all requirements for habitat-lab, VLN-CE and our experiments. Note that we pin gym==0.21.0 because the latest gym release is not compatible with habitat-lab v0.1.7.

    git clone git@github.com:MarSaKi/ETPNav.git
    cd ETPNav
    python -m pip install -r requirements.txt
    pip install torch==1.9.1+cu111 torchvision==0.10.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html
  4. Clone a stable habitat-lab version from the GitHub repository and install it. The command below installs the core of Habitat Lab as well as habitat_baselines.

    git clone --branch v0.1.7 git@github.com:facebookresearch/habitat-lab.git
    cd habitat-lab
    python setup.py develop --all # install habitat and habitat_baselines
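After the steps above, it can help to confirm that the pinned versions actually ended up in the environment. The following is a minimal sketch, assuming the packages were installed with pip under the distribution names `gym`, `torch` and `torchvision`; the helper names are hypothetical, and the exact version strings reported may differ depending on the build.

```python
from typing import Dict, Optional, Tuple

# Pins taken from the installation steps above.
EXPECTED = {
    "gym": "0.21.0",
    "torch": "1.9.1+cu111",
    "torchvision": "0.10.1+cu111",
}


def installed_version(name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        import pkg_resources  # fallback for Python 3.6 / 3.7

        try:
            return pkg_resources.get_distribution(name).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(name)
    except PackageNotFoundError:
        return None


def find_mismatches(
    expected: Dict[str, str], installed: Dict[str, Optional[str]]
) -> Dict[str, Tuple[str, Optional[str]]]:
    """Map each mismatching package to (expected version, found version)."""
    return {
        name: (want, installed.get(name))
        for name, want in expected.items()
        if installed.get(name) != want
    }


if __name__ == "__main__":
    found = {name: installed_version(name) for name in EXPECTED}
    for name, (want, got) in find_mismatches(EXPECTED, found).items():
        print(f"{name}: expected {want}, found {got}")
```

An empty output means every pinned package matches; a line such as `gym: expected 0.21.0, found None` means the package is missing from the active environment.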

Scenes: Matterport3D

Instructions copied from VLN-CE:

Matterport3D (MP3D) scene reconstructions are used. The official Matterport3D download script (download_mp.py) can be accessed by following the instructions on their project webpage. The scene data can then be downloaded:

# requires running with python 2.7
python download_mp.py --task habitat -o data/scene_datasets/mp3d/

Extract the archives so that each scene follows the layout scene_datasets/mp3d/{scene}/{scene}.glb. There should be 90 scenes. Place the scene_datasets folder in data/.
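A quick way to verify the extracted layout is to walk the scene folder and check that every scene directory contains its matching `.glb` file. This is a small sketch, not part of the repository; the function name `check_mp3d_layout` is hypothetical, and the default path assumes the script is run from the repository root.

```python
from pathlib import Path
from typing import List, Tuple


def check_mp3d_layout(
    root: str = "data/scene_datasets/mp3d",
) -> Tuple[List[str], List[str]]:
    """Return (scenes that have {scene}/{scene}.glb, scene folders missing it)."""
    base = Path(root)
    ok: List[str] = []
    missing: List[str] = []
    if not base.is_dir():
        return ok, missing
    for scene_dir in sorted(p for p in base.iterdir() if p.is_dir()):
        glb = scene_dir / (scene_dir.name + ".glb")
        if glb.is_file():
            ok.append(scene_dir.name)
        else:
            missing.append(scene_dir.name)
    return ok, missing


if __name__ == "__main__":
    ok, missing = check_mp3d_layout()
    print(f"{len(ok)} scenes found (expected 90)")
    for name in missing:
        print(f"missing: {name}/{name}.glb")
```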

Data and Trained Weights



Download the pretraining datasets [link] (the same as used in DUET) and the precomputed features [link], then unzip them into the pretrain_src folder.

CUDA_VISIBLE_DEVICES=0,1 bash pretrain_src/run_pt/run_r2r.bash 2333

Finetuning and Evaluation

Use main.bash for Training/Evaluation/Inference with a single GPU or with multiple GPUs on a single node. Simply adjust the arguments of the bash scripts:

# for R2R-CE
CUDA_VISIBLE_DEVICES=0,1 bash run_r2r/main.bash train 2333  # training
CUDA_VISIBLE_DEVICES=0,1 bash run_r2r/main.bash eval  2333  # evaluation
CUDA_VISIBLE_DEVICES=0,1 bash run_r2r/main.bash inter 2333  # inference
# for RxR-CE
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_rxr/main.bash train 2333  # training
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_rxr/main.bash eval  2333  # evaluation
CUDA_VISIBLE_DEVICES=0,1,2,3 bash run_rxr/main.bash inter 2333  # inference
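If you launch these runs from Python, for example from a job scheduler, the commands above can be assembled programmatically. The sketch below is only a convenience wrapper and is not part of the repository: `build_command` is a hypothetical helper, and the trailing numeric argument (2333 in the examples) is passed through unchanged because its meaning is not stated here.

```python
from typing import Dict, List, Sequence, Tuple

# Script paths and modes as they appear in the commands above.
SCRIPTS = {"r2r": "run_r2r/main.bash", "rxr": "run_rxr/main.bash"}
MODES = ("train", "eval", "inter")


def build_command(
    dataset: str, mode: str, gpus: Sequence[int], tag: int = 2333
) -> Tuple[Dict[str, str], List[str]]:
    """Return (extra environment variables, argv) for one run of main.bash."""
    if dataset not in SCRIPTS:
        raise ValueError(f"unknown dataset: {dataset!r}")
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    if not gpus:
        raise ValueError("at least one GPU id is required")
    env = {"CUDA_VISIBLE_DEVICES": ",".join(str(g) for g in gpus)}
    argv = ["bash", SCRIPTS[dataset], mode, str(tag)]
    return env, argv


if __name__ == "__main__":
    env, argv = build_command("rxr", "eval", [0, 1, 2, 3])
    print(env["CUDA_VISIBLE_DEVICES"], " ".join(argv))
```

To actually start a run, pass the result to `subprocess.run(argv, env={**os.environ, **env}, check=True)` from the repository root.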

Contact Information


Our implementation is partially inspired by CWP, Sim2Sim and DUET.

We thank the authors for their great work!


If you find this repository useful, please consider citing our paper:

  title={ETPNav: Evolving Topological Planning for Vision-Language Navigation in Continuous Environments},
  author={An, Dong and Wang, Hanqing and Wang, Wenguan and Wang, Zun and Huang, Yan and He, Keji and Wang, Liang},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence},