Home

Awesome

Unit-tests Documentation Benchmarks codecov Twitter Follow Python version GitHub license <a href="https://pypi.org/project/torchrl"><img src="https://img.shields.io/pypi/v/torchrl" alt="pypi version"></a> <a href="https://pypi.org/project/torchrl-nightly"><img src="https://img.shields.io/pypi/v/torchrl-nightly?label=nightly" alt="pypi nightly version"></a> Downloads Downloads Discord Shield

TorchRL

<p align="center"> <img src="docs/source/_static/img/icon.png" width="200" > </p>

Documentation | TensorDict | Features | Examples, tutorials and demos | Citation | Installation | Asking a question | Contributing

TorchRL is an open-source Reinforcement Learning (RL) library for PyTorch.

Key features

Design Principles

Read the full paper for a more curated description of the library.

Getting started

Check our Getting Started tutorials for quickly ramp up with the basic features of the library!

<p align="center"> <img src="docs/ppo.png" width="800" > </p>

Documentation and knowledge base

The TorchRL documentation can be found here. It contains tutorials and the API reference.

TorchRL also provides a RL knowledge base to help you debug your code, or simply learn the basics of RL. Check it out here.

We have some introductory videos for you to get to know the library better, check them out:

Spotlight publications

TorchRL being domain-agnostic, you can use it across many different fields. Here are a few examples:

Writing simplified and portable RL codebase with TensorDict

RL algorithms are very heterogeneous, and it can be hard to recycle a codebase across settings (e.g. from online to offline, from state-based to pixel-based learning). TorchRL solves this problem through TensorDict, a convenient data structure<sup>(1)</sup> that can be used to streamline one's RL codebase. With this tool, one can write a complete PPO training script in less than 100 lines of code!

<details> <summary>Code</summary>
import torch
from tensordict.nn import TensorDictModule
from tensordict.nn.distributions import NormalParamExtractor
from torch import nn

from torchrl.collectors import SyncDataCollector
from torchrl.data.replay_buffers import TensorDictReplayBuffer, \
  LazyTensorStorage, SamplerWithoutReplacement
from torchrl.envs.libs.gym import GymEnv
from torchrl.modules import ProbabilisticActor, ValueOperator, TanhNormal
from torchrl.objectives import ClipPPOLoss
from torchrl.objectives.value import GAE

env = GymEnv("Pendulum-v1") 
model = TensorDictModule(
  nn.Sequential(
      nn.Linear(3, 128), nn.Tanh(),
      nn.Linear(128, 128), nn.Tanh(),
      nn.Linear(128, 128), nn.Tanh(),
      nn.Linear(128, 2),
      NormalParamExtractor()
  ),
  in_keys=["observation"],
  out_keys=["loc", "scale"]
)
critic = ValueOperator(
  nn.Sequential(
      nn.Linear(3, 128), nn.Tanh(),
      nn.Linear(128, 128), nn.Tanh(),
      nn.Linear(128, 128), nn.Tanh(),
      nn.Linear(128, 1),
  ),
  in_keys=["observation"],
)
actor = ProbabilisticActor(
  model,
  in_keys=["loc", "scale"],
  distribution_class=TanhNormal,
  distribution_kwargs={"low": -1.0, "high": 1.0},
  return_log_prob=True
  )
buffer = TensorDictReplayBuffer(
  storage=LazyTensorStorage(1000),
  sampler=SamplerWithoutReplacement(),
  batch_size=50,
  )
collector = SyncDataCollector(
  env,
  actor,
  frames_per_batch=1000,
  total_frames=1_000_000,
)
loss_fn = ClipPPOLoss(actor, critic)
adv_fn = GAE(value_network=critic, average_gae=True, gamma=0.99, lmbda=0.95)
optim = torch.optim.Adam(loss_fn.parameters(), lr=2e-4)

for data in collector:  # collect data
  for epoch in range(10):
      adv_fn(data)  # compute advantage
      buffer.extend(data)
      for sample in buffer:  # consume data
          loss_vals = loss_fn(sample)
          loss_val = sum(
              value for key, value in loss_vals.items() if
              key.startswith("loss")
              )
          loss_val.backward()
          optim.step()
          optim.zero_grad()
  print(f"avg reward: {data['next', 'reward'].mean().item(): 4.4f}")
</details>

Here is an example of how the environment API relies on tensordict to carry data from one function to another during a rollout execution: Alt Text

TensorDict makes it easy to re-use pieces of code across environments, models and algorithms.

<details> <summary>Code</summary>

For instance, here's how to code a rollout in TorchRL:

- obs, done = env.reset()
+ tensordict = env.reset()
policy = SafeModule(
    model,
    in_keys=["observation_pixels", "observation_vector"],
    out_keys=["action"],
)
out = []
for i in range(n_steps):
-     action, log_prob = policy(obs)
-     next_obs, reward, done, info = env.step(action)
-     out.append((obs, next_obs, action, log_prob, reward, done))
-     obs = next_obs
+     tensordict = policy(tensordict)
+     tensordict = env.step(tensordict)
+     out.append(tensordict)
+     tensordict = step_mdp(tensordict)  # renames next_observation_* keys to observation_*
- obs, next_obs, action, log_prob, reward, done = [torch.stack(vals, 0) for vals in zip(*out)]
+ out = torch.stack(out, 0)  # TensorDict supports multiple tensor operations
</details>

Using this, TorchRL abstracts away the input / output signatures of the modules, env, collectors, replay buffers and losses of the library, allowing all primitives to be easily recycled across settings.

<details> <summary>Code</summary>

Here's another example of an off-policy training loop in TorchRL (assuming that a data collector, a replay buffer, a loss and an optimizer have been instantiated):

- for i, (obs, next_obs, action, hidden_state, reward, done) in enumerate(collector):
+ for i, tensordict in enumerate(collector):
-     replay_buffer.add((obs, next_obs, action, log_prob, reward, done))
+     replay_buffer.add(tensordict)
    for j in range(num_optim_steps):
-         obs, next_obs, action, hidden_state, reward, done = replay_buffer.sample(batch_size)
-         loss = loss_fn(obs, next_obs, action, hidden_state, reward, done)
+         tensordict = replay_buffer.sample(batch_size)
+         loss = loss_fn(tensordict)
        loss.backward()
        optim.step()
        optim.zero_grad()

This training loop can be re-used across algorithms as it makes a minimal number of assumptions about the structure of the data.

</details>

TensorDict supports multiple tensor operations on its device and shape (the shape of TensorDict, or its batch size, is the common arbitrary N first dimensions of all its contained tensors):

<details> <summary>Code</summary>
# stack and cat
tensordict = torch.stack(list_of_tensordicts, 0)
tensordict = torch.cat(list_of_tensordicts, 0)
# reshape
tensordict = tensordict.view(-1)
tensordict = tensordict.permute(0, 2, 1)
tensordict = tensordict.unsqueeze(-1)
tensordict = tensordict.squeeze(-1)
# indexing
tensordict = tensordict[:2]
tensordict[:, 2] = sub_tensordict
# device and memory location
tensordict.cuda()
tensordict.to("cuda:1")
tensordict.share_memory_()
</details>

TensorDict comes with a dedicated tensordict.nn module that contains everything you might need to write your model with it. And it is functorch and torch.compile compatible!

<details> <summary>Code</summary>
transformer_model = nn.Transformer(nhead=16, num_encoder_layers=12)
+ td_module = SafeModule(transformer_model, in_keys=["src", "tgt"], out_keys=["out"])
src = torch.rand((10, 32, 512))
tgt = torch.rand((20, 32, 512))
+ tensordict = TensorDict({"src": src, "tgt": tgt}, batch_size=[20, 32])
- out = transformer_model(src, tgt)
+ td_module(tensordict)
+ out = tensordict["out"]

The TensorDictSequential class allows to branch sequences of nn.Module instances in a highly modular way. For instance, here is an implementation of a transformer using the encoder and decoder blocks:

encoder_module = TransformerEncoder(...)
encoder = TensorDictSequential(encoder_module, in_keys=["src", "src_mask"], out_keys=["memory"])
decoder_module = TransformerDecoder(...)
decoder = TensorDictModule(decoder_module, in_keys=["tgt", "memory"], out_keys=["output"])
transformer = TensorDictSequential(encoder, decoder)
assert transformer.in_keys == ["src", "src_mask", "tgt"]
assert transformer.out_keys == ["memory", "output"]

TensorDictSequential allows to isolate subgraphs by querying a set of desired input / output keys:

transformer.select_subsequence(out_keys=["memory"])  # returns the encoder
transformer.select_subsequence(in_keys=["tgt", "memory"])  # returns the decoder
</details>

Check TensorDict tutorials to learn more!

Features

If you feel a feature is missing from the library, please submit an issue! If you would like to contribute to new features, check our call for contributions and our contribution page.

Examples, tutorials and demos

A series of State-of-the-Art implementations are provided with an illustrative purpose:

<table> <tr> <td><strong>Algorithm</strong> </td> <td><strong>Compile Support**</strong> </td> <td><strong>Tensordict-free API</strong> </td> <td><strong>Modular Losses</strong> </td> <td><strong>Continuous and Discrete</strong> </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/dqn">DQN</a> </td> <td> 1.9x </td> <td> + </td> <td> NA </td> <td> + (through <a href="https://pytorch.org/rl/stable/reference/generated/torchrl.envs.transforms.ActionDiscretizer.html?highlight=actiondiscretizer">ActionDiscretizer</a> transform) </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/ddpg/ddpg.py">DDPG</a> </td> <td> 1.87x </td> <td> + </td> <td> + </td> <td> - (continuous only) </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/iql/">IQL</a> </td> <td> 3.22x </td> <td> + </td> <td> + </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/cql/cql_offline.py">CQL</a> </td> <td> 2.68x </td> <td> + </td> <td> + </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/td3/td3.py">TD3</a> </td> <td> 2.27x </td> <td> + </td> <td> + </td> <td> - (continuous only) </td> </tr> <tr> <td> <a href="https://github.com/pytorch/rl/blob/main/sota-implementations/td3_bc/td3_bc.py">TD3+BC</a> </td> <td> untested </td> <td> + </td> <td> + </td> <td> - (continuous only) </td> </tr> <tr> <td> <a href="https://github.com/pytorch/rl/blob/main/examples/a2c/">A2C</a> </td> <td> 2.67x </td> <td> + </td> <td> - </td> <td> + </td> </tr> <tr> <td> <a href="https://github.com/pytorch/rl/blob/main/sota-implementations/ppo/">PPO</a> </td> <td> 2.42x </td> <td> + </td> <td> - </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/sac/sac.py">SAC</a> </td> <td> 2.62x </td> <td> + </td> <td> - </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/redq/redq.py">REDQ</a> </td> <td> 2.28x </td> <td> + </td> <td> - </td> <td> - (continuous only) </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/dreamer/dreamer.py">Dreamer v1</a> </td> <td> untested </td> <td> + </td> <td> + (<a href="https://pytorch.org/rl/stable/reference/objectives.html#dreamer">different classes</a>) </td> <td> - (continuous only) </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/decision_transformer">Decision Transformers</a> </td> <td> untested </td> <td> + </td> <td> NA </td> <td> - (continuous only) </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/crossq">CrossQ</a> </td> <td> untested </td> <td> + </td> <td> + </td> <td> - (continuous only) </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/gail">Gail</a> </td> <td> untested </td> <td> + </td> <td> NA </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/impala">Impala</a> </td> <td> untested </td> <td> + </td> <td> - </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/iql.py">IQL (MARL)</a> </td> <td> untested </td> <td> + </td> <td> + </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/maddpg_iddpg.py">DDPG (MARL)</a> </td> <td> untested </td> <td> + </td> <td> + </td> <td> - (continuous only) </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/mappo_ippo.py">PPO (MARL)</a> </td> <td> untested </td> <td> + </td> <td> - </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/qmix_vdn.py">QMIX-VDN (MARL)</a> </td> <td> untested </td> <td> + </td> <td> NA </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/sota-implementations/multiagent/sac.py">SAC (MARL)</a> </td> <td> untested </td> <td> + </td> <td> - </td> <td> + </td> </tr> <tr> <td><a href="https://github.com/pytorch/rl/blob/main/examples/rlhf">RLHF</a> </td> <td> NA </td> <td> + </td> <td> NA </td> <td> NA </td> </tr> </table>

** The number indicates expected speed-up compared to eager mode when executed on CPU. Numbers may vary depending on architecture and device.

and many more to come!

Code examples displaying toy code snippets and training scripts are also available

Check the examples directory for more details about handling the various configuration settings.

We also provide tutorials and demos that give a sense of what the library can do.

Citation

If you're using TorchRL, please refer to this BibTeX entry to cite this work:

@misc{bou2023torchrl,
      title={TorchRL: A data-driven decision-making library for PyTorch}, 
      author={Albert Bou and Matteo Bettini and Sebastian Dittert and Vikash Kumar and Shagun Sodhani and Xiaomeng Yang and Gianni De Fabritiis and Vincent Moens},
      year={2023},
      eprint={2306.00577},
      archivePrefix={arXiv},
      primaryClass={cs.LG}
}

Installation

Create a conda environment where the packages will be installed.

conda create --name torch_rl python=3.9
conda activate torch_rl

PyTorch

Depending on the use of functorch that you want to make, you may want to install the latest (nightly) PyTorch release or the latest stable version of PyTorch. See here for a detailed list of commands, including pip3 or other special installation instructions.

Torchrl

You can install the latest stable release by using

pip3 install torchrl

This should work on linux, Windows 10 and OsX (Intel or Silicon chips). On certain Windows machines (Windows 11), one should install the library locally (see below).

The nightly build can be installed via

pip3 install torchrl-nightly

which we currently only ship for Linux and OsX (Intel) machines. Importantly, the nightly builds require the nightly builds of PyTorch too.

To install extra dependencies, call

pip3 install "torchrl[atari,dm_control,gym_continuous,rendering,tests,utils,marl,open_spiel,checkpointing]"

or a subset of these.

One may also desire to install the library locally. Three main reasons can motivate this:

To install the library locally, start by cloning the repo:

git clone https://github.com/pytorch/rl

and don't forget to check out the branch or tag you want to use for the build:

git checkout v0.4.0

Go to the directory where you have cloned the torchrl repo and install it (after installing ninja)

cd /path/to/torchrl/
pip3 install ninja -U
python setup.py develop

One can also build the wheels to distribute to co-workers using

python setup.py bdist_wheel

Your wheels will be stored there ./dist/torchrl<name>.whl and installable via

pip install torchrl<name>.whl

Warning: Unfortunately, pip3 install -e . does not currently work. Contributions to help fix this are welcome!

On M1 machines, this should work out-of-the-box with the nightly build of PyTorch. If the generation of this artifact in MacOs M1 doesn't work correctly or in the execution the message (mach-o file, but is an incompatible architecture (have 'x86_64', need 'arm64e')) appears, then try

ARCHFLAGS="-arch arm64" python setup.py develop

To run a quick sanity check, leave that directory (e.g. by executing cd ~/) and try to import the library.

python -c "import torchrl"

This should not return any warning or error.

Optional dependencies

The following libraries can be installed depending on the usage one wants to make of torchrl:

# diverse
pip3 install tqdm tensorboard "hydra-core>=1.1" hydra-submitit-launcher

# rendering
pip3 install moviepy

# deepmind control suite
pip3 install dm_control

# gym, atari games
pip3 install "gym[atari]" "gym[accept-rom-license]" pygame

# tests
pip3 install pytest pyyaml pytest-instafail

# tensorboard
pip3 install tensorboard

# wandb
pip3 install wandb

Troubleshooting

If a ModuleNotFoundError: No module named ‘torchrl._torchrl errors occurs (or a warning indicating that the C++ binaries could not be loaded), it means that the C++ extensions were not installed or not found.

Versioning issues can cause error message of the type undefined symbol and such. For these, refer to the versioning issues document for a complete explanation and proposed workarounds.

Asking a question

If you spot a bug in the library, please raise an issue in this repo.

If you have a more generic question regarding RL in PyTorch, post it on the PyTorch forum.

Contributing

Internal collaborations to torchrl are welcome! Feel free to fork, submit issues and PRs. You can checkout the detailed contribution guide here. As mentioned above, a list of open contributions can be found in here.

Contributors are recommended to install pre-commit hooks (using pre-commit install). pre-commit will check for linting related issues when the code is committed locally. You can disable th check by appending -n to your commit command: git commit -m <commit message> -n

Disclaimer

This library is released as a PyTorch beta feature. BC-breaking changes are likely to happen but they will be introduced with a deprecation warranty after a few release cycles.

License

TorchRL is licensed under the MIT License. See LICENSE for details.