Awesome

DHRL: A Graph-Based Approach for Long-Horizon and Sparse Hierarchical Reinforcement Learning

This is a PyTorch implementation for our paper: "DHRL: A Graph-Based Approach for Long-Horizon and Sparse Hierarchical Reinforcement Learning" (NeurIPS 2022 Oral designated paper).

By Seungjae Lee, Jigang Kim, Inkyu Jang, and H. Jin Kim

A link to our paper can be found on arXiv.

Overview

We present a method of Decoupling Horizons Using a Graph in Hierarchical Reinforcement Learning (DHRL) which can alleviate this problem by decoupling the horizons of high-level and low-level policies and bridging the gap between the length of both horizons using a graph. DHRL provides a freely stretchable high-level action interval, which facilitates longer temporal abstraction and faster training in complex tasks. Our method outperforms state-of-the-art HRL algorithms in typical HRL environments. Moreover, DHRL achieves long and complex locomotion and manipulation tasks.

Installation

create conda environment

conda create -n dhrl python=3.7
conda activate dhrl

install pytorch that fits your computer settings.(we used pytorch==1.7.1 and pytorch==1.11.0) Then, install additional modules using

./install.sh

if permission denied,

chmod +x install.sh
chmod +x ./scripts/*.sh

To run MuJoCo simulation, a license is required.

Usage

Training and Evaluation

./scripts/{ENV}.sh {GPU} {SEED}

./scripts/Reacher.sh 0 0
./scripts/AntMazeSmall.sh 0 0
./scripts/AntMaze.sh 0 0
./scripts/AntMazeBottleneck.sh 0 0
./scripts/AntMazeComplex.sh 0 0

Troubleshooting

protobuf error

pip install --upgrade protobuf==3.20.0

gym EntryPoint error

pip uninstall gym
pip install gym==0.22.0

Citation

If you find this work useful in your research, please cite:

@inproceedings{lee2022graph,
  title={DHRL: A Graph-Based Approach for Long-Horizon and Sparse Hierarchical Reinforcement Learning},
  author={Lee, Seungjae and Kim, Jigang and Jang, Inkyu and Kim, H Jin},
  booktitle={Thirty-sixth Conference on Neural Information Processing Systems},
  pages={},
  year={2022},
  organization={}
}

Our code sourced and modified from official implementation of L3P Algorithm.