
LeanRL - Turbo-implementations of CleanRL scripts

LeanRL is a lightweight library consisting of single-file, PyTorch-based implementations of popular Reinforcement Learning (RL) algorithms. The primary goal of this library is to inform the RL PyTorch user base of optimization tricks that can cut training time by half or more.

More precisely, LeanRL is a fork of CleanRL, where hand-picked scripts have been re-written using PyTorch 2 features, mainly torch.compile and cudagraphs. The goal is to provide guidance on how to run your RL script at full speed with minimal impact on the user experience.

Key Features:

Disclaimer: This repo is a highly simplified version of CleanRL that lacks many features, such as detailed logging or checkpointing; its only purpose is to provide various versions of similar training scripts to measure their plain runtime under various constraints. However, we welcome contributions that re-implement these features.

Speed-ups

There are three sources of speed-ups in the code proposed here: a leaner, eager-mode re-implementation of the CleanRL training scripts, torch.compile, and CUDA graphs (applied through the CudaGraphModule wrapper).

To reproduce these results in your own code base, look for the calls to torch.compile and the CudaGraphModule wrapper within the *_torchcompile.py scripts.

You can also look into run.sh for the exact commands we used to run the scripts.
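Schematically, the pattern is to compile the update step and then capture it in a CUDA graph. The sketch below is a simplified illustration rather than an excerpt from the scripts: the `policy`, `optimizer`, and `update` objects are placeholders, and the real scripts handle warmup, data collection, and logging differently.

```python
import torch
from tensordict.nn import CudaGraphModule  # ships with the tensordict package

# Placeholder model and optimizer; requires a CUDA device.
policy = torch.nn.Linear(8, 2).cuda()
# capturable=True allows the optimizer step to run inside a captured CUDA graph
optimizer = torch.optim.Adam(policy.parameters(), lr=3e-4, capturable=True)

def update(obs, target):
    # placeholder update step: compute a loss, backprop, and step the optimizer
    loss = (policy(obs) - target).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return loss.detach()

update = torch.compile(update)    # remove eager-mode Python overhead
update = CudaGraphModule(update)  # replay the captured kernels on later calls

obs = torch.randn(256, 8, device="cuda")
target = torch.randn(256, 2, device="cuda")
for _ in range(10):
    loss = update(obs, target)
```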

The following table displays the speed-ups obtained on an H100-equipped node with TODO CPU cores. All models were executed on the GPU; simulation was done on the CPU.

| Algorithm | CleanRL implementation (fps) | LeanRL implementation (fps) | LeanRL + compile (fps) | LeanRL + compile + cudagraphs (fps) | Overall speed-up |
| --- | --- | --- | --- | --- | --- |
| [PPO (Atari)](leanrl/ppo_atari_envpool_torchcompile.py) | 1022 | 3728 | 3841 | 6809 | 6.8x |
| [PPO (Continuous action)](leanrl/ppo_continuous_action_torchcompile.py) | 652 | 683 | 908 | 1774 | 2.7x |
| [SAC (Continuous action)](leanrl/sac_continuous_action_torchcompile.py) | 127 | 130 | 255 | 725 | 5.7x |
| [TD3 (Continuous action)](leanrl/td3_continuous_action_torchcompile.py) | 272 | 247 | 272 | 936 | 3.4x |

These figures are displayed in the plots below. All runs were executed for an identical number of steps across 3 different seeds. Fluctuations in the results are due to seeding artifacts, not implementation details (which are identical across scripts).

<details> <summary>SAC (HalfCheetah-v4)</summary>

SAC.png

sac_speed.png

</details> <details> <summary>TD3 (HalfCheetah-v4)</summary>

TD3.png

td3_speed.png

</details> <details> <summary>PPO (Atari - Breakout-v5)</summary>

SAC.png

sac_speed.png

</details>

GPU utilization

Using torch.compile and cudagraphs also makes better use of your GPU. To show this, we plot the GPU utilization throughout training for SAC. The Area Under the Curve (AUC) measures the total usage of the GPU over the course of the training loop execution. As this plot shows, combining compile and cudagraphs brings GPU utilization to its minimum value, meaning that you can train more models in a shorter time by using these features together.

sac_gpu.png
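If you want to produce a similar trace for your own runs, one simple approach (a hypothetical helper, not part of the LeanRL scripts) is to sample `torch.cuda.utilization()` from a background thread while training runs, then integrate the samples afterwards:

```python
# Hypothetical utilization logger, not part of the LeanRL scripts.
# torch.cuda.utilization() reports GPU utilization in percent via NVML
# (it requires the pynvml package to be installed).
import threading
import time

import torch

samples: list[int] = []
period = 0.5  # seconds between samples

def sample_utilization(stop_event: threading.Event) -> None:
    while not stop_event.is_set():
        samples.append(torch.cuda.utilization())
        time.sleep(period)

stop = threading.Event()
sampler = threading.Thread(target=sample_utilization, args=(stop,), daemon=True)
sampler.start()

# ... run your training loop here ...

stop.set()
sampler.join()
# Rough AUC estimate: utilization (%) integrated over time (seconds)
auc = sum(samples) * period
print(f"collected {len(samples)} samples, utilization AUC ~ {auc:.1f} %*s")
```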

Tips to accelerate your code in eager mode

There may be multiple reasons why your RL code is running slower than it should. Here are some off-the-shelf tips to get a better runtime:
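One common culprit, shown in the generic sketch below (not taken from the LeanRL scripts), is implicit CPU-GPU synchronization: calling `.item()` on every iteration or copying data to the device synchronously forces the CPU to wait for the GPU on each step.

```python
import torch

device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(8, 2).to(device)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

running_loss = torch.zeros((), device=device)  # accumulate on-device, no sync
for step in range(100):
    # pinned memory + non_blocking=True makes the host-to-device copy asynchronous
    obs = torch.randn(256, 8, pin_memory=(device == "cuda"))
    obs = obs.to(device, non_blocking=True)
    loss = model(obs).pow(2).mean()
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    running_loss += loss.detach()  # avoid loss.item() here: it would sync every step

print("mean loss:", (running_loss / 100).item())  # a single sync at the very end
```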

Get started

Unlike CleanRL, LeanRL does not currently support poetry.

Prerequisites:

Once the dependencies have been installed, run the scripts as follows:

python leanrl/ppo_atari_envpool_torchcompile.py \
    --seed 1 \
    --total-timesteps 50000 \
    --compile \
    --cudagraphs

Altogether, the installation and run steps will generally look like this:

conda create -n leanrl python=3.10 -y
conda activate leanrl
python -m pip install --upgrade --pre torch --index-url https://download.pytorch.org/whl/nightly/cu124
python -m pip install -r requirements/requirements.txt
python -m pip install -r requirements/requirements-atari.txt
python -m pip install -r requirements/requirements-envpool.txt
python -m pip install -r requirements/requirements-mujoco.txt

python leanrl/ppo_atari_envpool_torchcompile.py \
    --seed 1 \
    --compile \
    --cudagraphs

Citing CleanRL

LeanRL does not have a citation yet; credit should be given to CleanRL instead. To cite CleanRL in your work, please cite the CleanRL technical paper:

@article{huang2022cleanrl,
  author  = {Shengyi Huang and Rousslan Fernand Julien Dossa and Chang Ye and Jeff Braga and Dipam Chakraborty and Kinal Mehta and João G.M. Araújo},
  title   = {CleanRL: High-quality Single-file Implementations of Deep Reinforcement Learning Algorithms},
  journal = {Journal of Machine Learning Research},
  year    = {2022},
  volume  = {23},
  number  = {274},
  pages   = {1--18},
  url     = {http://jmlr.org/papers/v23/21-1342.html}
}

Acknowledgement

LeanRL is forked from CleanRL.

CleanRL is a community-powered project and our contributors run experiments on a variety of hardware.

License

LeanRL is MIT licensed, as found in the LICENSE file.