EDE: Exploration via Distributional Ensemble

This is a PyTorch implementation of the methods proposed in Uncertainty-Driven Exploration for Generalization in Reinforcement Learning (https://openreview.net/forum?id=nulUqBMpBb) by Yiding Jiang, Zico Kolter, and Roberta Raileanu.

In this work, we find that exploration is crucial for generalization to new task instances and propose the first value-based method to achieve state-of-the-art performance on both Procgen and Crafter. Our algorithm encourages exploration of states with high epistemic uncertainty, which is estimated using deep ensembles and distributional RL.

<img src="figures/new_ede_overview.png" width="900">
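
At a glance, the method maintains an ensemble of quantile (QR-DQN) value heads, reads epistemic uncertainty off the disagreement between heads, and acts greedily with respect to a UCB-style score. The snippet below is a minimal sketch of that idea, not the repository's code; the tensor layout and the function name ucb_action are assumptions.

import torch

def ucb_action(quantiles: torch.Tensor, c: float = 30.0) -> torch.Tensor:
    # quantiles: [n_ensemble, batch, n_actions, n_quantiles] (assumed layout).
    member_means = quantiles.mean(dim=-1)           # mean return per head: [E, B, A]
    q_mean = member_means.mean(dim=0)               # ensemble-mean Q-values: [B, A]
    sigma_epi = member_means.std(dim=0)             # epistemic uncertainty = head disagreement
    return (q_mean + c * sigma_epi).argmax(dim=-1)  # act greedily w.r.t. the UCB score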

Requirements

conda create -n level-replay python=3.8
conda activate level-replay

git clone https://github.com/facebookresearch/level-replay.git
cd level-replay
pip install -r requirements.txt

# Clone a level-replay-compatible version of OpenAI Baselines.
git clone https://github.com/minqi/baselines.git
cd baselines 
python setup.py install
cd ..

# Clone a level-replay-compatible version of the Procgen environment.
git clone https://github.com/minqi/procgen.git
cd procgen 
python setup.py install
cd ..

git clone https://github.com/facebookresearch/ede.git
cd ede
pip install -r requirements.txt

Train EDE on Procgen

python train_rainbow.py \
    --algo=rainbow \
    --env_name=bigfish \
    --qrdqn=True \
    --qrdqn_bootstrap=True \
    --n_ensemble=5 \
    --ucb_c=30 \
    --diff_epsilon_schedule=True \
    --diff_eps_schedule_base=0.6 \
    --diff_eps_schedule_exp=7

The flags are, roughly: --qrdqn switches from DQN to QR-DQN (distributional RL); --qrdqn_bootstrap trains a bootstrapped ensemble of value heads; --n_ensemble sets the ensemble size; --ucb_c sets the coefficient on the epistemic-uncertainty exploration bonus; and the --diff_epsilon_schedule flags give each ensemble member its own epsilon schedule (see the sketch below).
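
If the differential epsilon schedule follows the Ape-X-style rule eps_i = base^(1 + i * exp / (N - 1)) for member i of N (an assumption based on the flag names, not a statement about the repository's exact formula), the per-member epsilons can be computed as:

def member_epsilons(n_ensemble: int, base: float = 0.6, exp: float = 7.0):
    # Hypothetical per-member schedule (assumes n_ensemble > 1): member 0
    # explores with eps = base, and later members get increasingly greedy.
    return [base ** (1 + i * exp / (n_ensemble - 1)) for i in range(n_ensemble)]

print(member_epsilons(5))  # five decreasing epsilons, from 0.6 down to about 0.017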

Train base QR-DQN / DQN

python train_rainbow.py \
    --algo=rainbow \
    --env_name=bigfish \
    --qrdqn=True \
    --qrdqn_bootstrap=True
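
For reference, QR-DQN trains its quantile estimates with the quantile Huber loss of Dabney et al. (2018). Below is a self-contained sketch of that standard loss, not necessarily how this repository implements it:

import torch
import torch.nn.functional as F

def quantile_huber_loss(pred: torch.Tensor, target: torch.Tensor,
                        kappa: float = 1.0) -> torch.Tensor:
    # pred, target: [batch, n_quantiles] predicted quantiles / TD-target quantiles.
    n = pred.shape[1]
    # Quantile midpoints tau_j = (2j + 1) / (2N).
    tau = (torch.arange(n, dtype=pred.dtype, device=pred.device) + 0.5) / n
    # Pairwise TD errors u[b, i, j] = target[b, j] - pred[b, i].
    u = target.unsqueeze(1) - pred.unsqueeze(2)
    huber = F.huber_loss(pred.unsqueeze(2).expand_as(u),
                         target.unsqueeze(1).expand_as(u),
                         reduction="none", delta=kappa)
    # Asymmetric quantile weight |tau - 1{u < 0}|.
    weight = (tau.view(1, -1, 1) - (u.detach() < 0).float()).abs()
    # Sum over predicted quantiles, average over target quantiles and batch.
    return (weight * huber).sum(dim=1).mean()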

Train with εz-greedy exploration

python train_rainbow.py \
    --algo=rainbow \
    --env_name=bigfish \
    --qrdqn=True \
    --eps_z=True \
    --eps_z_n=10000 \
    --eps_z_mu=2
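
εz-greedy (temporally-extended ε-greedy; Dabney et al., 2021) occasionally commits to one random action for a random number of steps drawn from a heavy-tailed zeta distribution. A minimal sketch follows; reading --eps_z_mu as the zeta exponent and --eps_z_n as a cap on the repeat length is an assumption from the flag names.

import numpy as np

class EpsZGreedy:
    def __init__(self, n_actions: int, eps: float = 0.01,
                 mu: float = 2.0, n_max: int = 10000):
        self.n_actions, self.eps, self.mu, self.n_max = n_actions, eps, mu, n_max
        self.count, self.action = 0, 0  # remaining repeats of the sampled action

    def act(self, greedy_action: int) -> int:
        if self.count > 0:                   # continue the current temporally-extended option
            self.count -= 1
            return self.action
        if np.random.rand() < self.eps:      # start a new exploratory option
            self.count = min(int(np.random.zipf(self.mu)), self.n_max) - 1
            self.action = np.random.randint(self.n_actions)
            return self.action
        return greedy_action                 # otherwise act greedily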

Train with NoisyNet

python train_rainbow.py \
    --algo=rainbow \
    --env_name=bigfish \
    --qrdqn=True \
    --noisy_layers=True
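
--noisy_layers replaces ε-greedy with parametric exploration via noisy linear layers (Fortunato et al., 2018). Below is a standard factorized-Gaussian NoisyLinear layer as a sketch of the idea; the repository's layer may differ in details such as initialization.

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    def __init__(self, in_features: int, out_features: int, sigma0: float = 0.5):
        super().__init__()
        self.weight_mu = nn.Parameter(torch.empty(out_features, in_features))
        self.weight_sigma = nn.Parameter(torch.empty(out_features, in_features))
        self.bias_mu = nn.Parameter(torch.empty(out_features))
        self.bias_sigma = nn.Parameter(torch.empty(out_features))
        self.register_buffer("eps_in", torch.zeros(in_features))
        self.register_buffer("eps_out", torch.zeros(out_features))
        bound = 1 / math.sqrt(in_features)
        nn.init.uniform_(self.weight_mu, -bound, bound)
        nn.init.uniform_(self.bias_mu, -bound, bound)
        nn.init.constant_(self.weight_sigma, sigma0 * bound)
        nn.init.constant_(self.bias_sigma, sigma0 * bound)
        self.reset_noise()

    @staticmethod
    def _f(x):  # noise transform f(x) = sign(x) * sqrt(|x|)
        return x.sign() * x.abs().sqrt()

    def reset_noise(self):  # factorized noise: eps_w = f(eps_out) f(eps_in)^T
        self.eps_in.copy_(self._f(torch.randn_like(self.eps_in)))
        self.eps_out.copy_(self._f(torch.randn_like(self.eps_out)))

    def forward(self, x):
        if self.training:  # perturb weights with learned per-parameter noise scales
            w = self.weight_mu + self.weight_sigma * torch.outer(self.eps_out, self.eps_in)
            b = self.bias_mu + self.bias_sigma * self.eps_out
        else:              # use mean weights at evaluation time
            w, b = self.weight_mu, self.bias_mu
        return F.linear(x, w, b)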

Train EDE on Crafter

cd crafter

python main_qrdqn.py \
    --algorithm qrdqn \
    --explore_strat thompson \
    --bootstrapped_qrdqn True \
    --ucb_c 0.5 \
    --qrdqn_always_train_feat False \
    --T-max 1500000 \
    --batch-size 64
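
The --explore_strat thompson flag suggests Thompson-sampling-style action selection over the ensemble: sample one member and act greedily with respect to it. A minimal sketch, with the tensor layout as an assumption:

import torch

def thompson_action(quantiles: torch.Tensor) -> torch.Tensor:
    # quantiles: [n_ensemble, batch, n_actions, n_quantiles] (assumed layout).
    member = torch.randint(quantiles.shape[0], ())  # sample one ensemble head
    q_values = quantiles[member].mean(dim=-1)       # its Q-values: [batch, n_actions]
    return q_values.argmax(dim=-1)                  # act greedily under that head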

Tabular Experiments

<img src="figures/new_ede_tabular.png" width="900">

Procgen Results

EDE achieves state-of-the-art performance on both the Procgen and Crafter benchmarks, improving both train and test performance. It is the first value-based method to achieve SOTA on both of these benchmarks for generalization in RL.

<img src="figures/new_ede_procgen.png" width="900">

Ablations

<img src="figures/new_ede_ablations.png" width="900">

Crafter Results

<img src="figures/new_ede_crafter.png" width="300">

Acknowledgements

This code is based on open-source implementations of PLR, QR-DQN, Procgen, and Crafter.

Figures in this work are made with assets from FlatIcon.

Citation

If you use this code in your own work, please cite our paper:

@inproceedings{
    anonymous2023uncertaintydriven,
    title={Uncertainty-Driven Exploration for Generalization in Reinforcement Learning},
    author={Anonymous},
    booktitle={Submitted to The Eleventh International Conference on Learning Representations},
    year={2023},
    url={https://openreview.net/forum?id=nulUqBMpBb},
    note={under review}
}

License

The majority of this repository is licensed under CC-BY-NC 4.0 (Creative Commons Attribution-NonCommercial 4.0 International License). However, portions of this code are available under separate license terms; for example, Procgen and Crafter are licensed under the MIT license.