

Cross-modal Map Learning for Vision and Language Navigation

G.Georgakis, K.Schmeckpeper, K.Wanchoo, S.Dan, E.Miltsakaki, D.Roth, K.Daniilidis

IEEE International Conference on Computer Vision and Pattern Recognition 2022


pip install -r requirements.txt

Habitat-lab and habitat-sim need to be installed before using our code. We build our method on the stable v0.1.7 release of both, so run git checkout tags/v0.1.7 in each repository before installation. Follow the instructions in their corresponding repositories to install them on your system. Note that our code expects habitat-sim to be installed with the flag --with-cuda.

Trained Models

We provide our trained models for reproducing the navigation results shown in the paper here. In addition, we provide the semantic segmentation model here. The DD-PPO model (gibson-4plus-mp3d-train-val-test-resnet50.pth) that we used for the controller can be found here.


We use the Vision and Language Navigation in Continuous Environments (VLN-CE) dataset. Episodes can be found here. VLN-CE is based on the Matterport3D (MP3D) dataset; only the habitat subset is needed, not the entire Matterport3D. Follow the instructions in the habitat-lab repository regarding downloading the data and the dataset folder structure. In addition we provide the following:


Here we provide instructions on how to use our code. All options can be found in train_options.py. The episodes from VLN-CE should be under the directory passed as --root_path. The DD-PPO model should be placed under root_path/local_policy_models.
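Before launching a run, it can help to confirm that the files described above are in place. The following is a minimal sketch, not part of our code: missing_paths is a hypothetical helper, and it assumes the DD-PPO checkpoint keeps its original file name and sits directly inside local_policy_models.

```python
from pathlib import Path

# File name taken from the Trained Models section.
DDPPO_FILENAME = "gibson-4plus-mp3d-train-val-test-resnet50.pth"


def missing_paths(root_path):
    """Return the expected paths under root_path that do not exist yet.

    Assumption: the DD-PPO checkpoint is stored directly in
    root_path/local_policy_models under its original file name.
    """
    root = Path(root_path)
    expected = [
        root / "local_policy_models" / DDPPO_FILENAME,
    ]
    return [str(p) for p in expected if not p.exists()]
```

An empty list from missing_paths("/path/to/habitat-lab/folder/") means the checkpoint was found where this sketch expects it.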

Testing on VLN-CE

To run an evaluation of CM2-GT on a single scene from val-seen:

python main.py --name test_cm2-gt_val-seen --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --model_exp_dir /path/to/cm2-gt/model/folder/ --log_dir logs/ --scenes_list 1pXnuDYAj8r --gpu_capacity 1 --split val_seen --use_first_waypoint --vln

To run an evaluation of CM2 on a single scene from val-seen:

python main.py --name test_cm2_val-seen --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --model_exp_dir /path/to/cm2/model/folder/ --log_dir logs/ --img_segm_model_dir /path/to/img/segm/model/folder/ --scenes_list 1pXnuDYAj8r --gpu_capacity 1 --split val_seen --use_first_waypoint --goal_conf_thresh 0.2 --vln_no_map

To enable visualizations during testing use --save_nav_images.
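The two evaluation commands above differ only in a few flags, so they can be assembled from one place. The sketch below is an illustration rather than part of our code: eval_command is a hypothetical helper, and it uses only the flags and values that appear in the commands above.

```python
import shlex


def eval_command(scene, variant, root_path, scenes_dir, model_exp_dir,
                 img_segm_model_dir=None, save_nav_images=False):
    """Build the argument list for evaluating one val-seen scene.

    variant is "cm2-gt" or "cm2"; the flag sets mirror the two
    commands shown in this section.
    """
    if variant not in ("cm2-gt", "cm2"):
        raise ValueError(f"unknown variant: {variant}")
    cmd = [
        "python", "main.py",
        "--name", f"test_{variant}_val-seen",
        "--root_path", root_path,
        "--scenes_dir", scenes_dir,
        "--model_exp_dir", model_exp_dir,
        "--log_dir", "logs/",
        "--scenes_list", scene,
        "--gpu_capacity", "1",
        "--split", "val_seen",
        "--use_first_waypoint",
    ]
    if variant == "cm2-gt":
        cmd.append("--vln")
    else:
        # CM2 additionally needs the image segmentation model.
        if img_segm_model_dir is None:
            raise ValueError("CM2 requires img_segm_model_dir")
        cmd += ["--img_segm_model_dir", img_segm_model_dir,
                "--goal_conf_thresh", "0.2",
                "--vln_no_map"]
    if save_nav_images:
        cmd.append("--save_nav_images")
    return cmd


# Print the CM2-GT command as a single shell line (Python 3.8+).
print(shlex.join(eval_command(
    "1pXnuDYAj8r", "cm2-gt",
    "/path/to/habitat-lab/folder/",
    "/habitat-lab/data/scene_datasets/",
    "/path/to/cm2-gt/model/folder/")))
```

The returned list can be passed directly to subprocess.run, which avoids shell quoting issues with paths.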

Generating training data

To generate the data for a single scene from the train split to train the CM2-GT model:

python store_episodes_vln.py --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --episodes_save_dir /path/to/cm2-gt/episodes/save/dir/ --scenes_list 1pXnuDYAj8r --gpu_capacity 1

To generate the data for a single scene from the train split to train the CM2 model:

python store_episodes_vln_no_map.py --root_path /path/to/habitat-lab/folder/ --scenes_dir /habitat-lab/data/scene_datasets/ --episodes_save_dir /path/to/cm2/episodes/save/dir/ --scenes_list 1pXnuDYAj8r --gpu_capacity 1 --img_segm_model_dir /path/to/img/segm/model/folder/
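The commands above handle one scene each. To cover several scenes, one option is to issue one invocation per scene, as in the sketch below. This is an assumption-laden illustration, not part of our code: generation_commands and run_all are hypothetical helpers, and the sketch passes a single scene ID to --scenes_list per call because that is the only usage shown here.

```python
import subprocess


def generation_commands(scenes, root_path, scenes_dir, episodes_save_dir,
                        img_segm_model_dir=None):
    """Return one argument list per scene.

    Without img_segm_model_dir the CM2-GT script is used;
    with it, the CM2 script is used instead.
    """
    if img_segm_model_dir is None:
        script = "store_episodes_vln.py"
    else:
        script = "store_episodes_vln_no_map.py"
    commands = []
    for scene in scenes:
        cmd = [
            "python", script,
            "--root_path", root_path,
            "--scenes_dir", scenes_dir,
            "--episodes_save_dir", episodes_save_dir,
            "--scenes_list", scene,
            "--gpu_capacity", "1",
        ]
        if img_segm_model_dir is not None:
            cmd += ["--img_segm_model_dir", img_segm_model_dir]
        commands.append(cmd)
    return commands


def run_all(commands):
    """Run the commands one after another, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)
```

For example, run_all(generation_commands(["1pXnuDYAj8r"], "/path/to/habitat-lab/folder/", "/habitat-lab/data/scene_datasets/", "/path/to/cm2-gt/episodes/save/dir/")) reproduces the CM2-GT command above; further scene IDs can be appended to the list.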


To train a new CM2-GT model:

python main.py --name train_cm2-gt --stored_episodes_dir /path/to/cm2-gt/episodes/save/dir/ --log_dir logs/ --is_train --summary_steps 500 --image_summary_steps 1000 --test_steps 20000 --checkpoint_steps 50000 --pad_text_feat --batch_size 40 --vln --finetune_bert_last_layer --use_first_waypoint --sample_1

To train a new CM2 model:

python main.py --name train_cm2 --stored_episodes_dir /path/to/cm2/episodes/save/dir/ --log_dir logs/ --is_train --summary_steps 500 --image_summary_steps 1000 --test_steps 20000 --checkpoint_steps 50000 --pad_text_feat --batch_size 10 --finetune_bert_last_layer --use_first_waypoint --vln_no_map --sample_1