Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration
This repository contains the code for the ProbES paper.
Catalog:
- Generating pretraining dataset
- Pretraining on generated dataset
- Finetuning on downstream tasks
Install Dependencies
- Python requirements: Python 3.6 or higher is required. Install the dependencies with:
pip install -r requirements.txt
- Install the Matterport3D simulator. Note that this code uses the old version (v0.1) of the simulator; a quick import check is sketched below.
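Once the simulator is built, the following minimal sketch can be used to check that its Python bindings import correctly. It assumes the v0.1 MatterSim API (setCameraResolution, setCameraVFOV, setDiscretizedViewingAngles, setRenderingEnabled) and that the simulator's build directory is on PYTHONPATH; adjust to your setup.

```python
# Sanity check for the Matterport3D simulator (v0.1) Python bindings.
# Assumes the simulator's build directory is on PYTHONPATH.
import math

import MatterSim  # provided by the Matterport3D simulator build

sim = MatterSim.Simulator()
sim.setCameraResolution(640, 480)      # image width, height in pixels
sim.setCameraVFOV(math.radians(60))    # vertical field of view
sim.setDiscretizedViewingAngles(True)  # 36 discretised panorama views
sim.setRenderingEnabled(False)         # navigation state only, no rendering
sim.init()
print("MatterSim v0.1 bindings loaded successfully")
```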
Preparing Dataset
- Download all of the required data files (a sketch for verifying the LMDB is shown after this list):
python scripts/download-auxiliary-data.py
wget https://dl.dropbox.com/s/67k2vjgyjqel6og/matterport-ResNet-101-faster-rcnn-genome.lmdb.zip -P data/
unzip data/matterport-ResNet-101-faster-rcnn-genome.lmdb.zip -d data/
- Download pre-computed CLIP features (a sketch for loading the TSV is shown after this list):
wget https://nlp.cs.unc.edu/data/vln_clip/features/CLIP-ViT-B-32-views.tsv -P data/img_features/
- Generate the pretraining dataset:
sh scripts/generate_pretrain_data.sh
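As a quick check that the Faster R-CNN feature LMDB extracted correctly, the sketch below opens it read-only and prints a few keys. This is only a sanity check: the path assumes the zip unpacks to data/matterport-ResNet-101-faster-rcnn-genome.lmdb, and the key naming scheme is simply printed rather than assumed.

```python
# Sanity check: open the downloaded feature LMDB read-only and list a few keys.
# The path below assumes the zip unpacked to this location; adjust if needed.
import lmdb

env = lmdb.open("data/matterport-ResNet-101-faster-rcnn-genome.lmdb",
                readonly=True, lock=False)
with env.begin() as txn:
    print("entries:", txn.stat()["entries"])
    cursor = txn.cursor()
    for i, (key, _) in enumerate(cursor):
        print(key.decode("utf-8", errors="replace"))
        if i >= 4:  # show only the first few keys
            break
env.close()
```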
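Similarly, the pre-computed CLIP features can be inspected with the sketch below. It assumes the TSV follows the standard R2R image-feature layout used by CLIP-ViL (one row per viewpoint with base64-encoded float32 features for 36 discretised views, 512-dim for ViT-B/32); the field names and dimensions are assumptions, not taken from this repository.

```python
# Inspect the pre-computed CLIP view features (assumed R2R-style TSV layout).
import base64
import csv
import sys

import numpy as np

TSV_FIELDS = ["scanId", "viewpointId", "image_w", "image_h", "vfov", "features"]  # assumed
FEATURE_SIZE = 512   # CLIP ViT-B/32 embedding dimension (assumed)
NUM_VIEWS = 36       # discretised panorama views (assumed)

csv.field_size_limit(sys.maxsize)
with open("data/img_features/CLIP-ViT-B-32-views.tsv") as f:
    reader = csv.DictReader(f, fieldnames=TSV_FIELDS, delimiter="\t")
    for row in reader:
        feats = np.frombuffer(base64.b64decode(row["features"]), dtype=np.float32)
        feats = feats.reshape(NUM_VIEWS, FEATURE_SIZE)
        print(row["scanId"], row["viewpointId"], feats.shape)
        break  # only check the first record
```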
Training
Coming soon.
Acknowledgement
The implementation relies on resources from VLN-BERT, Airbert, and CLIP-ViL. We thank the original authors for open-sourcing their work.
Reference
If you find this code useful, please consider citing:
@article{liang2022visual,
title={Visual-Language Navigation Pretraining via Prompt-based Environmental Self-exploration},
author={Liang, Xiwen and Zhu, Fengda and Li, Lingling and Xu, Hang and Liang, Xiaodan},
journal={arXiv preprint arXiv:2203.04006},
year={2022}
}