
Learning Exploration Policies for Navigation

In ICLR 2019 [Project Website] [Demo Video] [pdf]

If you find this code useful, please consider citing our work:

@inproceedings{chen2018learning,
  author = "Chen, Tao and Gupta, Saurabh and Gupta, Abhinav",
  title = "Learning Exploration Policies for Navigation",
  booktitle = "International Conference on Learning Representations",
  year = "2019",
  url = "https://openreview.net/pdf?id=SyMWn05F7"
}

The code has been tested on Ubuntu 16.04.

Installation

Folder Structure

├── navigation
│   ├── suncg_data
│   ├── SUNCGtoolbox
│   └── exp4nav

Install dependencies

sudo apt-get install libglfw3-dev libglm-dev libx11-dev libegl1-mesa-dev libpng-dev
sudo apt-get install libpng16-dev libjpeg9 libjpeg-dev build-essential pkg-config
sudo apt-get install git curl wget automake libtool

Install Anaconda

Create a new virtual python environment

cd ~
mkdir navigation
cd ~/navigation
git clone --recurse-submodules https://github.com/taochenshh/exp4nav.git
cd exp4nav
conda env create -f environment.yml
conda activate exp4nav

Download the SUNCG dataset and unzip it into ~/navigation, so that the data ends up in ~/navigation/suncg_data

Clone SUNCGtoolbox:

cd ~/navigation
git clone https://github.com/shurans/SUNCGtoolbox.git
cd SUNCGtoolbox/gaps
make clean
make

Compile the renderer for House3D

cd ~/navigation/exp4nav/House3D/renderer
SYSTEM=conda.linux PYTHON_CONFIG=/path/to/anaconda3/envs/exp4nav/bin/python3-config make -j
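
The `/path/to/anaconda3` part of `PYTHON_CONFIG` depends on where Anaconda is installed. As a sketch, assuming the exp4nav environment is active (conda then sets `CONDA_PREFIX` to the environment directory), the path can be derived and checked before running make:

```shell
# Assumes `conda activate exp4nav` has been run, so CONDA_PREFIX points at
# the environment directory. Falls back to the README placeholder otherwise.
PYTHON_CONFIG="${CONDA_PREFIX:-/path/to/anaconda3/envs/exp4nav}/bin/python3-config"

if [ -x "$PYTHON_CONFIG" ]; then
    echo "Found: $PYTHON_CONFIG"
else
    echo "Not found: $PYTHON_CONFIG (is the exp4nav environment active?)"
fi
```

If the file is found, the printed path can be used as the `PYTHON_CONFIG` value in the make command above.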

Add House3D to PYTHONPATH

echo 'export PYTHONPATH=$PYTHONPATH:$HOME/navigation/exp4nav/House3D' >> ~/.bashrc
source ~/.bashrc
conda activate exp4nav
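
To confirm that the path was picked up in the current shell, a small check along these lines may help. `path_contains` and `HOUSE3D_DIR` are illustrative names, not part of the repository.

```shell
# Illustrative helper (not part of the repository): succeed if a directory
# appears as an entry in a colon-separated path string.
path_contains() {
    case ":$1:" in
        *":$2:"*) return 0 ;;
        *) return 1 ;;
    esac
}

HOUSE3D_DIR="$HOME/navigation/exp4nav/House3D"   # location used in this README

if path_contains "${PYTHONPATH:-}" "$HOUSE3D_DIR"; then
    echo "House3D is on PYTHONPATH"
else
    echo "House3D is not on PYTHONPATH yet"
fi
```

The comparison is a plain string match, so an entry written with a literal `~` will not match the expanded `$HOME` form.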

Download the models trained with IL and RL, the models pre-trained with IL only, and the house ID files

cd ~/navigation/exp4nav
wget https://www.dropbox.com/s/q2d883k6eb2rvmg/path_data.tar.gz
wget https://www.dropbox.com/s/z2cij8kjttilf7k/pretrain.tar.gz
tar -xvzf path_data.tar.gz
tar -xvzf pretrain.tar.gz

map_only and map_rgb are models trained with both imitation learning (IL) and reinforcement learning (RL). il_map_only and il_map_rgb are models trained with IL only.

Preprocess SUNCG houses

cd ~/navigation/exp4nav/src/utils
python env_remove_components.py

Generate obj+mtl files for houses in EQA

cd ~/navigation/exp4nav/gutils
python make_houses.py \
    -eqa_path ../path_data/eqa_v1.json \
    -suncg_toolbox_path ../../SUNCGtoolbox \
    -suncg_data_path ../../suncg_data \
    -hf_name house-no-doors
cd ~/navigation/exp4nav/src/utils
python preprocess_house.py

Training

cd ~/navigation/exp4nav/src

## IL+RL with Map+RGB
python main.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
       --collision_penalty=0.006 --ent_coef=0.01  --train_rollout_repeat=2 --max_depth=3 \
       --num_steps=500 --il_pretrain --pretrain_dir=../pretrain/il_map_rgb/model  \
       --num_envs=8 --use_rgb_with_map   --save_dir=data/map_rgb --seed=1

## IL+RL with Map
python main.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
       --collision_penalty=0.006 --ent_coef=0.01  --train_rollout_repeat=2 --max_depth=3 \
       --num_steps=500 --il_pretrain --pretrain_dir=../pretrain/il_map_only/model  --num_envs=8  \
       --save_dir=data/map_only --seed=1

## RL with Map+RGB
python main.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
       --collision_penalty=0.006 --ent_coef=0.01  --train_rollout_repeat=2 --max_depth=3 \
       --num_steps=500 --num_envs=8   --use_rgb_with_map --save_dir=data/no_pretrain_map_rgb \
       --seed=1

## RL with Map
python main.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
       --collision_penalty=0.006 --ent_coef=0.01  --train_rollout_repeat=2 --max_depth=3 \
       --num_steps=500 --num_envs=8   --save_dir=data/no_pretrain_map_only --seed=1

To test a trained policy with main.py, add --test and set the number of parallel environments to 1 (--num_envs=1). Append --render at the end to watch the policy run.
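
As an illustration, the IL+RL Map+RGB training command from above, modified as described, might look like the sketch below. The `run` wrapper is a hypothetical helper, not part of the repository: it only prints the command unless `DRY_RUN=0` is set, so the final command line can be inspected first. All other flags are copied unchanged from the training command.

```shell
# Dry-run wrapper (illustrative, not part of the repository): prints the
# command, and executes it only when DRY_RUN=0.
run() {
    echo "$@"
    if [ "${DRY_RUN:-1}" = "0" ]; then
        "$@"
    fi
}

# IL+RL with Map+RGB, switched to test mode: --num_envs=1, plus --test and --render
run python main.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
    --collision_penalty=0.006 --ent_coef=0.01 --train_rollout_repeat=2 --max_depth=3 \
    --num_steps=500 --il_pretrain --pretrain_dir=../pretrain/il_map_rgb/model \
    --num_envs=1 --use_rgb_with_map --save_dir=data/map_rgb --seed=1 \
    --test --render
```

To actually launch the test, run the snippet from ~/navigation/exp4nav/src with `DRY_RUN=0` set, or drop the wrapper and call `python main.py` directly.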

Note that the policy can be trained with RL alone, without imitation learning. However, if you have demonstration data, pre-training on it will greatly speed up learning.

Testing on the pre-defined testing houses

cd ~/navigation/exp4nav/src

## IL+RL with Map+RGB
python test_policy.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
       --collision_penalty=0.006 --ent_coef=0.01  --train_rollout_repeat=2 --max_depth=3 \
       --num_steps=1000 --il_pretrain --pretrain_dir=../pretrain/il_map_rgb/model  \
       --num_envs=8 --use_rgb_with_map   --save_dir=data/map_rgb --seed=1

## IL+RL with Map
python test_policy.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
       --collision_penalty=0.006 --ent_coef=0.01  --train_rollout_repeat=2 --max_depth=3 \
       --num_steps=1000 --il_pretrain --pretrain_dir=../pretrain/il_map_only/model  --num_envs=8  \
       --save_dir=data/map_only --seed=1

## RL with Map+RGB
python test_policy.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
       --collision_penalty=0.006 --ent_coef=0.01  --train_rollout_repeat=2 --max_depth=3 \
       --num_steps=1000 --num_envs=8   --use_rgb_with_map --save_dir=data/no_pretrain_map_rgb \
       --seed=1

## RL with Map
python test_policy.py --lr=0.00001 --rnn_hidden_dim=128 --area_reward_scale=0.0005 --gamma=0.999 \
       --collision_penalty=0.006 --ent_coef=0.01  --train_rollout_repeat=2 --max_depth=3 \
       --num_steps=1000 --num_envs=8   --save_dir=data/no_pretrain_map_only --seed=1

Plot Performance

cd ~/navigation/exp4nav/src
python performance_plot.py --data_dir=./data