Improving Sample Efficiency of Deep Reinforcement Learning for Bipedal Walking

Abstract

Reinforcement learning holds great promise for enabling bipedal walking in humanoid robots. However, despite encouraging recent results, training still requires significant amounts of time and resources, which precludes fast iteration cycles in controller development. Faster training methods are therefore needed. In this paper, we investigate a number of techniques for improving the sample efficiency of on-policy actor-critic algorithms. We show that a significant reduction in training time is achievable with a few straightforward modifications of common algorithms, such as PPO and DeepMimic, tailored specifically to the problem of bipedal walking. Action space representation, symmetry prior induction, and cliprange scheduling proved effective, together reducing sample complexity by a factor of 4.5. These results indicate that domain-specific knowledge can be readily used to reduce training times and thereby enable faster development cycles in challenging robotic applications.
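To make the idea of cliprange scheduling concrete, here is a minimal sketch in plain Python. It assumes a linear decay of the PPO clip range over training. The schedule shape, the start and end values (0.3 and 0.1), and the function names are illustrative assumptions, not the settings or code used in the paper.

```python
def scheduled_cliprange(progress, start=0.3, end=0.1):
    """Linearly interpolate the PPO clip range from `start` to `end`.

    progress: fraction of training completed, clamped to [0, 1].
    start, end: hypothetical clip range values chosen for illustration.
    """
    progress = min(max(progress, 0.0), 1.0)
    return start + (end - start) * progress


def clipped_surrogate(ratio, advantage, cliprange):
    """Standard PPO clipped surrogate objective for a single sample.

    ratio: probability ratio pi_new(a|s) / pi_old(a|s).
    advantage: advantage estimate for the sampled action.
    """
    clipped_ratio = min(max(ratio, 1.0 - cliprange), 1.0 + cliprange)
    # Take the pessimistic (smaller) of the clipped and unclipped terms.
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Show how the allowed policy change shrinks as training progresses.
    for progress in (0.0, 0.5, 1.0):
        eps = scheduled_cliprange(progress)
        print(f"progress={progress:.1f} cliprange={eps:.2f}")
```

A shrinking clip range permits larger policy updates early in training and more conservative ones later; whether a linear or some other decay works best is a tuning question this sketch does not answer.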

Installation

  1. Install CUDA 10.1 following this Medium post.
  2. Follow these instructions to install Anaconda.
  3. Create a conda environment from the .yml file located in the repository: conda env create -f path/to/conda_env.yml
  4. Install MuJoCo and mujoco-py following these instructions.
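After completing the steps above, a quick sanity check can confirm that the expected packages are visible inside the activated conda environment. The sketch below is an optional helper, not part of the repository. The function name `missing_modules` is hypothetical, and the list of required modules is an assumption: only `mujoco_py` (the import name of mujoco-py) is named here, so extend it with whatever conda_env.yml actually installs.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be located.

    Uses importlib.util.find_spec, which only locates a module without
    importing it, so it does not prove that the module loads correctly.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumed list of required modules; adjust to match conda_env.yml.
    required = ["mujoco_py"]
    missing = missing_modules(required)
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Because the check only locates modules, a subsequent `import mujoco_py` in a Python session is still worth running, since mujoco-py may need to compile its bindings on first import.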

Main scripts, files and folders

Supplementary videos

Questions?

Please contact Rustam Galljamov in case you have any questions regarding the code.