minimal-stable-PPO

A minimal and stable Proximal Policy Optimization (PPO), tested on IsaacGymEnvs.

Requirements

Training on IsaacGymEnvs

Follow the instructions here to install Isaac Gym and the IsaacGymEnvs repo.

Optional instructions for cleaner code and dependencies:

First example

To train a policy on Cartpole, run

python train.py task=Cartpole

Cartpole should converge to an optimal policy within a few seconds of starting training.

In the configs directory, we provide the main config file and template configs for the Cartpole and AllegroHand tasks. We use Hydra for config management, following IsaacGymEnvs.
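For illustration, below is a minimal sketch of how a Hydra entry point in the style of train.py composes these configs; the decorator arguments and script structure are assumptions, not the repo's exact code.

```python
# Minimal sketch of a Hydra-based entry point (illustrative only; the actual
# train.py in this repo may be structured differently).
import hydra
from omegaconf import DictConfig, OmegaConf


@hydra.main(config_name="config", config_path="./configs")
def main(cfg: DictConfig):
    # Hydra composes configs/config.yaml with the selected task and train
    # configs (e.g. `task=Cartpole`) and applies any command-line overrides.
    print(OmegaConf.to_yaml(cfg))
    # ... build the environment and the PPO agent from cfg, then train ...


if __name__ == "__main__":
    main()
```

Any value in the composed config can then be overridden from the command line using Hydra's `key=value` syntax, as in the `task=Cartpole` example above.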

Custom tasks

To train on additional tasks, follow the template configs to define [new_task].yaml under configs/task and [new_task]PPO.yaml under configs/train.
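As a quick sanity check that the new configs compose without errors, Hydra's compose API can be used along the lines of the sketch below; the override names (`task=NewTask`, `train=NewTaskPPO`) are hypothetical placeholders and should be adjusted to match the actual config groups in configs/config.yaml.

```python
# Hypothetical sanity check that a newly added task's configs compose.
# The override values below are placeholders for the new task's config names.
from hydra import compose, initialize
from omegaconf import OmegaConf

with initialize(config_path="configs"):
    cfg = compose(
        config_name="config",
        overrides=["task=NewTask", "train=NewTaskPPO"],
    )
    print(OmegaConf.to_yaml(cfg))
```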

Results

Logging to TensorBoard and WandB is supported by default.
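For reference, a minimal sketch of what dual logging to the two backends can look like is shown below; the project name, log directory, and metric names are placeholders for illustration and are not taken from this repo's code.

```python
# Illustrative sketch of logging the same scalar to TensorBoard and WandB.
# Project name, log directory, and metric names are placeholders.
import wandb
from torch.utils.tensorboard import SummaryWriter

writer = SummaryWriter(log_dir="runs/example")
wandb.init(project="minimal-stable-PPO-example", name="example-run")

for step in range(100):
    mean_reward = 0.0  # replace with the metric computed during training
    writer.add_scalar("train/mean_reward", mean_reward, step)
    wandb.log({"train/mean_reward": mean_reward}, step=step)

writer.close()
wandb.finish()
```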

Our PPO results match those of IsaacGymEnvs' default RL implementation, in terms of both training speed and performance.

Cartpole in 40 seconds

<img src="imgs/Cartpole.png" width="50%" height="50%">

AllegroHand in 3 hours

<img src="imgs/AllegroHand.png" width="50%" height="50%">

Key arguments and parameters

Main config (config.yaml)

RL config (train/[task_name]PPO.yaml)

The main configs to experiment with are:

We recommend the default values for the other configs, but of course, RL is RL :)
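For context when tuning, one of the central knobs in any PPO implementation is the clip range $\epsilon$ in the clipped surrogate objective from the original PPO paper (Schulman et al., 2017):

$$
L^{\mathrm{CLIP}}(\theta) = \mathbb{E}_t\!\left[\min\!\Big(r_t(\theta)\,\hat{A}_t,\ \mathrm{clip}\big(r_t(\theta),\,1-\epsilon,\,1+\epsilon\big)\,\hat{A}_t\Big)\right],
\qquad
r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t \mid s_t)}
$$

Here $\hat{A}_t$ is the advantage estimate (typically computed with GAE, controlled by the discount $\gamma$ and the GAE parameter $\lambda$), and $\epsilon$ bounds how far each update can move the policy. The guides below discuss how these and related parameters interact in practice.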

Here are some helpful guides to tuning PPO hyperparameters:

The 37 Implementation Details of Proximal Policy Optimization

Engstrom L, Ilyas A, Santurkar S, Tsipras D, Janoos F, Rudolph L, Madry A. Implementation Matters in Deep Policy Gradients: A Case Study on PPO and TRPO. International Conference on Learning Representations, 2020.

Andrychowicz M, Raichuk A, Stańczyk P, Orsini M, Girgin S, Marinier R, Hussenot L, Geist M, Pietquin O, Michalski M, Gelly S. What Matters in On-Policy Reinforcement Learning? A Large-Scale Empirical Study. International Conference on Learning Representations, 2021.

Duan Y, Chen X, Houthooft R, Schulman J, Abbeel P. Benchmarking Deep Reinforcement Learning for Continuous Control. International Conference on Machine Learning, 2016.

I also documented a few general takeaways in this tweet.

Wait, doesn't IsaacGymEnvs already provide RL training scripts?

Yes. rl_games has great performance, but it can be hard to use.

If all you're looking for is a simple, clean, performant PPO that is easy to modify and extend, try this repo :))) And feel free to give feedback to make it better!

Citation

Please use the following BibTeX entry if you find this repo helpful and would like to cite it:

@misc{minimal-stable-PPO,
  author = {Lin, Toru},
  title = {A minimal and stable PPO},
  year = {2023},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/ToruOwO/minimal-stable-PPO}},
}

Acknowledgement

Shout-out to hora and rl_games, which this implementation referenced!