Improving Sample Efficiency of Deep Reinforcement Learning for Bipedal Walking

Abstract

Reinforcement learning holds great promise for enabling bipedal walking in humanoid robots. However, despite encouraging recent results, training still requires significant amounts of time and resources, precluding fast iteration cycles in controller development. Therefore, faster training methods are needed. In this paper, we investigate a number of techniques for improving the sample efficiency of on-policy actor-critic algorithms and show that a significant reduction in training time is achievable with a few straightforward modifications of common algorithms, such as PPO and DeepMimic, tailored specifically to the problem of bipedal walking. Action space representation, symmetry prior induction, and cliprange scheduling proved effective, together reducing sample complexity by a factor of 4.5. These results indicate that domain-specific knowledge can be readily utilized to reduce training times and thereby enable faster development cycles in challenging robotic applications.
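To illustrate what cliprange scheduling means in a PPO-style algorithm, here is a minimal sketch. It is not the implementation from this repository: the linear shape of the schedule, the function names linear_cliprange and clipped_surrogate, and the start and end values are assumptions chosen for illustration. The idea is that the clipping range of the PPO surrogate objective shrinks as training progresses, allowing larger policy updates early and smaller ones later.

```python
from typing import Callable


def linear_cliprange(start: float, end: float) -> Callable[[float], float]:
    """Return a schedule mapping remaining training progress to a cliprange.

    progress_remaining is assumed to go from 1.0 (start of training)
    down to 0.0 (end of training). The linear shape is an assumption.
    """
    def schedule(progress_remaining: float) -> float:
        # Keep the progress value inside [0, 1] before interpolating.
        p = min(max(progress_remaining, 0.0), 1.0)
        return end + (start - end) * p

    return schedule


def clipped_surrogate(ratio: float, advantage: float, cliprange: float) -> float:
    """PPO clipped surrogate objective for a single sample.

    ratio is the probability ratio pi_new(a|s) / pi_old(a|s).
    """
    clipped_ratio = min(max(ratio, 1.0 - cliprange), 1.0 + cliprange)
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # Hypothetical start and end values, not taken from the paper.
    schedule = linear_cliprange(start=0.5, end=0.1)
    for progress in (1.0, 0.5, 0.0):
        eps = schedule(progress)
        print(f"progress_remaining={progress:.1f} cliprange={eps:.2f}")
```

Some PPO implementations accept a callable of this form in place of a constant cliprange; whether that applies to the library used here should be checked against scripts/common/config.py.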

Installation

Install CUDA 10.1 following this Medium post.
Install Anaconda following these instructions.
Create a conda environment from the .yml file located in the repository: conda env create -f path/to/conda_env.yml
Install MuJoCo and mujoco-py following these instructions.
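After completing the steps above, a quick way to confirm that the environment is usable is to check that the required Python packages can be found. The following sketch is not part of the repository; the helper name check_modules is hypothetical, and the list of modules to check is an assumption that should be adapted to the contents of conda_env.yml. It only looks up the packages and does not import them.

```python
import importlib.util
from typing import Dict, Iterable


def check_modules(names: Iterable[str]) -> Dict[str, bool]:
    """Report which top-level modules are discoverable in the active environment."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    # mujoco_py is the import name of the mujoco-py package.
    for name, found in check_modules(["mujoco_py"]).items():
        status = "found" if found else "MISSING"
        print(f"{name}: {status}")
```

Run it inside the activated conda environment. A MISSING entry suggests that the corresponding installation step did not complete.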

Main scripts, files and folders

scripts/config_light.py specifies the simulation environment, as well as the main hyperparameters and the main experimental/training settings
scripts/common/config.py allows detailed control over all hyperparameters and experimental/training settings
scripts/train.py trains a policy on the specified environment
scripts/run.py loads a policy from a specified path and executes it on the environment defined in the config files. The execution can be rendered.
mujoco/gym_mimic_envs/mimic_env.py implements the base class for using a MuJoCo environment in the context of imitation learning
mujoco/gym_mimic_envs/mujoco/mimic_walker3d.py is the main environment used during our experiments to train policies to generate stable human-like walking
- mujoco/gym_mimic_envs/mujoco/assets/walker3d_flat_feet.xml defines the morphology and inertial properties of the walker
scripts/mocap/ref_trajecs.py loads the post-processed mocap data from assets/ref_trajecs and prepares it for usage with an RL environment.
graphs/ contains the processed monitoring data of different policies during training that were logged to Weights & Biases.

Supplementary videos

The main video for the submission is located in the media folder.
Videos of the walking gait recorded using different action spaces can be found in media/videos_action_spaces and in the following Google Drive folder.

Questions?

Please contact Rustam Galljamov in case you have any questions regarding the code.