Home

Awesome

ExORL: Exploratory Data for Offline Reinforcement Learning

This is an original PyTorch implementation of the ExORL framework from

Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning by

Denis Yarats*, David Brandfonbrener*, Hao Liu, Misha Laskin, Pieter Abbeel, Alessandro Lazaric, and Lerrel Pinto.

*Equal contribution.

Prerequisites

Install MuJoCo if it is not already the case:

Install the following libraries:

sudo apt update
sudo apt install libosmesa6-dev libgl1-mesa-glx libglfw3 unzip

Install dependencies:

conda env create -f conda_env.yml
conda activate exorl

Datasets

We provide exploratory datasets for 6 DeepMind Control Stuite domains

DomainDataset nameAvailable task names
Cartpolecartpolecartpole_balance, cartpole_balance_sparse, cartpole_swingup, cartpole_swingup_sparse
Cheetahcheetahcheetah_run, cheetah_run_backward
Jaco Armjacojaco_reach_top_left, jaco_reach_top_right, jaco_reach_bottom_left, jaco_reach_bottom_right
Point Mass Mazepoint_mass_mazepoint_mass_maze_reach_top_left, point_mass_maze_reach_top_right, point_mass_maze_reach_bottom_left, point_mass_maze_reach_bottom_right
Quadrupedquadrupedquadruped_walk, quadruped_run
Walkerwalkerwalker_stand, walker_walk, walker_run

For each domain we collected datasets by running 9 unsupervised RL algorithms from URLB for total of 10M steps. Here is the list of algorithms

Unsupervised RL methodNamePaper
APSapspaper
APT(ICM)icm_aptpaper
DIAYNdiaynpaper
Disagreementdisagreementpaper
ICMicmpaper
ProtoRLprotopaper
RandomrandomN/A
RNDrndpaper
SMMsmmpaper

You can download a dataset by running ./download.sh <DOMAIN> <ALGO>, for example to download ProtoRL dataset for Walker, run

./download.sh walker proto

The script will download the dataset from S3 and store it under datasets/walker/proto/, where you can find episodes (under buffer) and episode videos (under video).

Offline RL training

We also provide implementation of 5 offline RL algorithms for evaluating the datasets

Offline RL methodNamePaper
Behavior Cloningbcpaper
CQLcqlpaper
CRRcrrpaper
TD3+BCtd3_bcpaper
TD3td3paper

After downloading required datasets, you can evaluate it using offline RL methon for a specific task. For example, to evaluate a dataset collected by ProtoRL on Walker for the waling task using TD3+BC you can run

python train_offline.py agent=td3_bc expl_agent=proto task=walker_walk

Logs are stored in the output folder. To launch tensorboard run:

tensorboard --logdir output

Citation

If you use this repo in your research, please consider citing the paper as follows:

@article{yarats2022exorl,
  title={Don't Change the Algorithm, Change the Data: Exploratory Data for Offline Reinforcement Learning},
  author={Denis Yarats, David Brandfonbrener, Hao Liu, Michael Laskin, Pieter Abbeel, Alessandro Lazaric, Lerrel Pinto},
  journal={arXiv preprint arXiv:2201.13425},
  year={2022}
}

License

The majority of ExORL is licensed under the MIT license, however portions of the project are available under separate license terms: DeepMind is licensed under the Apache 2.0 license.