Real-World Reinforcement Learning (RWRL) Challenge Framework

<p align="center"> <img src="docs/img/angular_velocity.gif" height="150px"/><img src="docs/img/humanoid_perturbations.gif" height="150px"> </p>

The "Challenges of Real-World RL" paper identifies and describes a set of nine challenges that are currently preventing Reinforcement Learning (RL) agents from being utilized on real-world applications and products. It also describes an evaluation framework and a set of environments that can provide an evaluation of an RL algorithm’s potential applicability to real-world systems. It has since then been followed up with "An Empirical Investigation of the challenges of real-world reinforcement learning" which implements eight of the nine described challenges (excluding explainability) and analyses their effects on various state-of-the-art RL algorithms. This is the codebase used to perform this analysis, and is also intended as a common platform for easily reproducible experimentation around these challenges, it is referred to as the realworldrl-suite (Real-World Reinforcement Learning (RWRL) Suite).

Currently the suite comprises five environments: cartpole, walker, quadruped, manipulator, and humanoid.

The codebase is currently structured as:

Questions can be directed to the Real-World RL group e-mail: realworldrl@google.com.

:information_source: If you wish to test your agent in a principled fashion on related challenges in low-dimensional domains, we highly recommend using bsuite.

Documentation

We give an overview of the challenges below; more thorough documentation on how to configure each challenge can be found here.

Starter examples are presented in the examples section.

Challenges

Safety

Adds a set of constraints to the task. Returns an additional entry in the observations ('constraints') whose length equals the number of constraints, where each entry is True if the corresponding constraint is satisfied and False otherwise.
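For illustration, a minimal sketch of enabling the safety challenge and reading the constraint vector. The safety_spec dictionary follows the same pattern as the combined-challenge example below; treat the exact keys as assumptions and consult the documentation for the full schema:

import realworldrl_suite.environments as rwrl

# Enable the safety challenge with the task's default constraints.
env = rwrl.load(
    domain_name='cartpole',
    task_name='realworld_swingup',
    safety_spec=dict(enable=True))

timestep = env.reset()
# 'constraints' is a boolean vector with one entry per constraint.
print(timestep.observation['constraints'])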

Delays

Action, observation and reward delays.
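As a hedged sketch, a delay might be configured as follows; the delay_spec key names used here (actions, observations, rewards, each a number of environment steps) are assumptions to be checked against the documentation:

import realworldrl_suite.environments as rwrl

# Delay actions by 5 steps and observations by 3 steps (key names assumed).
env = rwrl.load(
    domain_name='cartpole',
    task_name='realworld_swingup',
    delay_spec=dict(enable=True, actions=5, observations=3, rewards=0))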

Noise

Action and observation noise. Several different types of noise are supported.

The noise specifications can be parameterized via the noise_spec dictionary.
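A hedged example of configuring Gaussian action and observation noise; the nested gaussian sub-dictionary and its keys are assumptions about the schema, so consult the documentation for the exact structure:

import realworldrl_suite.environments as rwrl

# Apply white Gaussian noise to actions and observations
# (sub-dictionary layout and key names assumed).
env = rwrl.load(
    domain_name='cartpole',
    task_name='realworld_swingup',
    noise_spec=dict(gaussian=dict(enable=True, actions=0.1, observations=0.1)))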

Perturbations

Perturbs physical quantities of the environment. These perturbations are non-stationary and are governed by a scheduler.
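As a sketch with assumed keys, a perturbation is specified by the physical parameter to perturb and a scheduler governing how it changes over time; the param and scheduler values below are illustrative:

import realworldrl_suite.environments as rwrl

# Perturb the pole length over time according to a scheduler
# ('param' and 'scheduler' values are illustrative assumptions).
env = rwrl.load(
    domain_name='cartpole',
    task_name='realworld_swingup',
    perturb_spec=dict(enable=True, param='pole_length', scheduler='uniform'))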

Dimensionality

Adds extra dummy features to observations to increase the dimensionality of the state space.
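A minimal sketch; the key name num_random_state_observations is an assumption about the dimensionality_spec schema:

import realworldrl_suite.environments as rwrl

# Append 10 dummy random features to every observation (key name assumed).
env = rwrl.load(
    domain_name='cartpole',
    task_name='realworld_swingup',
    dimensionality_spec=dict(enable=True, num_random_state_observations=10))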

Multi-Objective Rewards

Adds additional objectives and specifies how the objectives interact (e.g., as a sum).
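A hedged sketch of enabling a multi-objective reward; the objective and coeff keys are assumptions about the multiobj_spec schema:

import realworldrl_suite.environments as rwrl

# Mix a safety objective into the reward; 'coeff' is assumed to weight
# the objectives in the combined reward.
env = rwrl.load(
    domain_name='cartpole',
    task_name='realworld_swingup',
    multiobj_spec=dict(enable=True, objective='safety', coeff=0.5))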

Offline Learning

We provide our offline datasets through the RL Unplugged library. There is an example and an associated colab.

RWRL Combined Challenge Benchmarks

Combines multiple challenges in the same environment. The benchmarks are divided into 'Easy', 'Medium' and 'Hard' difficulty levels, which correspond to the magnitude of the challenge effects applied along each challenge dimension.

Installation

Running examples

We provide three example agents: a random agent, a PPO agent, and an ACME-based DMPO agent.
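For reference, a minimal random-agent loop against the suite's dm_env-style interface; this is a sketch, not the bundled example script:

import numpy as np
import realworldrl_suite.environments as rwrl

env = rwrl.load(domain_name='cartpole', task_name='realworld_swingup')
action_spec = env.action_spec()

# Run one episode with uniformly random actions within the spec's bounds.
timestep = env.reset()
while not timestep.last():
  action = np.random.uniform(
      low=action_spec.minimum,
      high=action_spec.maximum,
      size=action_spec.shape)
  timestep = env.step(action)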

RWRL Combined Challenge Benchmark Instantiation

As mentioned above, these benchmark challenges are divided into 'Easy', 'Medium' and 'Hard' difficulty levels. For the current state-of-the-art performance on these benchmarks, please see <a href="https://arxiv.org/abs/2003.11881">this</a> paper.

Instantiating a combined challenge environment with 'Easy' difficulty is done as follows:

import realworldrl_suite.environments as rwrl

# Load the cartpole swing-up task with all challenges set to 'easy'.
env = rwrl.load(
    domain_name='cartpole',
    task_name='realworld_swingup',
    combined_challenge='easy',
    # Evaluation statistics are written to this .npz file.
    log_output='/tmp/path/to/results.npz',
    # Log safety-related variables and flatten the observation
    # dictionary into a single array.
    environment_kwargs=dict(log_safety_vars=True, flat_observation=True))
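Substituting combined_challenge='medium' or combined_challenge='hard' instantiates the corresponding difficulty level.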

Acknowledgements

If you use realworldrl_suite in your work, please cite:

  @article{dulacarnold2020realworldrlempirical,
           title={An empirical investigation of the challenges of real-world reinforcement learning},
           author={Dulac-Arnold, Gabriel and Levine, Nir and Mankowitz, Daniel J. and
                   Li, Jerry and Paduraru, Cosmin and Gowal, Sven and Hester, Todd},
           journal={arXiv preprint arXiv:2003.11881},
           year={2020},
  }

Paper links