Improving Sample Efficiency of Deep Reinforcement Learning for Bipedal Walking
Abstract
Reinforcement learning holds great promise for enabling bipedal walking in humanoid robots. However, despite encouraging recent results, training still requires significant amounts of time and resources, precluding fast iteration cycles in control development. Therefore, faster training methods are needed. In this paper, we investigate a number of techniques for improving the sample efficiency of on-policy actor-critic algorithms and show that a significant reduction in training time is achievable with a few straightforward modifications of common algorithms, such as PPO and DeepMimic, tailored specifically to the problem of bipedal walking. Action space representation, symmetry prior induction, and cliprange scheduling proved effective at reducing sample complexity by a factor of 4.5. These results indicate that domain-specific knowledge can be readily utilized to reduce training times and thereby enable faster development cycles in challenging robotic applications.
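To make the term "cliprange scheduling" concrete, the following is a minimal sketch of how a PPO clip range can be decayed over the course of training. The linear form of the schedule, the start and end values, and the function names are illustrative assumptions, not the settings or code used in this repository; see scripts/common/config.py for the actual hyperparameters.

```python
def linear_cliprange(progress_remaining: float,
                     start: float = 0.2,
                     end: float = 0.05) -> float:
    """Interpolate the PPO clip range between `start` and `end`.

    `progress_remaining` runs from 1.0 (start of training) to 0.0 (end).
    The default values are hypothetical and chosen only for illustration.
    """
    if not 0.0 <= progress_remaining <= 1.0:
        raise ValueError("progress_remaining must lie in [0, 1]")
    return end + (start - end) * progress_remaining


def clipped_surrogate(ratio: float, advantage: float, cliprange: float) -> float:
    """Standard PPO clipped surrogate objective for a single sample."""
    clipped_ratio = max(1.0 - cliprange, min(ratio, 1.0 + cliprange))
    return min(ratio * advantage, clipped_ratio * advantage)


if __name__ == "__main__":
    # The allowed policy change shrinks as training progresses.
    for progress_remaining in (1.0, 0.5, 0.0):
        eps = linear_cliprange(progress_remaining)
        objective = clipped_surrogate(ratio=1.5, advantage=1.0, cliprange=eps)
        print(f"remaining={progress_remaining:.1f} "
              f"cliprange={eps:.3f} objective={objective:.3f}")
```

A wide clip range early in training permits larger policy updates, while a narrower range later on keeps updates conservative once the gait has largely converged.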
Installation
- Install CUDA 10.1 following this Medium post.
- Follow these instructions to install Anaconda.
- Create a conda environment from the .yml file located in the repository:
  conda env create -f path/to/conda_env.yml
- Install MuJoCo and mujoco-py following these instructions.
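After completing the steps above, a quick sanity check can confirm that the activated conda environment actually sees the installed packages. The helper below is a hypothetical sketch that is not part of this repository; it assumes only that mujoco-py is importable under the module name `mujoco_py`.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Assumption: mujoco-py installs the importable module `mujoco_py`.
    missing = missing_modules(["mujoco_py"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules were found.")
```

Run it inside the activated environment. A non-empty list of missing modules usually means the wrong environment is active or an installation step was skipped.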
Main scripts, files and folders
- scripts/config_light.py specifies the simulation environment, as well as main hyperparameters and main experimental/training settings.
- scripts/common/config.py allows detailed control over all hyperparameters and experimental/training settings.
- scripts/train.py trains a policy on the specified environment.
- scripts/run.py loads a policy from a specified path and executes it on the environment defined in the config files. The execution can be rendered.
- mujoco/gym_mimic_envs/mimic_env.py implements the Base class to use a MuJoCo environment in the context of imitation learning.
- mujoco/gym_mimic_envs/mujoco/mimic_walker3d.py is the main environment used during our experiments to train policies to generate stable human-like walking.
- mujoco/gym_mimic_envs/mujoco/assets/walker3d_flat_feet.xml defines the morphology and inertial properties of the walker.
- scripts/mocap/ref_trajecs.py loads the post-processed mocap data from assets/ref_trajecs and prepares it for usage with an RL environment.
- graphs/ contains the processed monitoring data of different policies during training that were logged to Weights & Biases.
Supplementary videos
- The main video for the submission is located in the media folder.
- Videos of the walking gait recorded using different action spaces can be found in media/videos_action_spaces and in the following Google Drive Folder.
Questions?
Please contact Rustam Galljamov in case you have any questions regarding the code.