NeoRL

This repository is the interface for the offline reinforcement learning benchmark NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning.

The NeoRL benchmark contains environments, datasets, and reward functions for training and benchmarking offline reinforcement learning algorithms. The current benchmark includes environments from CityLearn, FinRL, the industrial benchmark (IB), and three Gym-MuJoCo tasks.

More about the NeoRL benchmark can be found at http://polixir.ai/research/neorl and in the following paper:

Rongjun Qin, Songyi Gao, Xingyuan Zhang, Zhen Xu, Shengkai Huang, Zewen Li, Weinan Zhang, Yang Yu. NeoRL: A Near Real-World Benchmark for Offline Reinforcement Learning. https://arxiv.org/abs/2102.00714

The benchmark is supported by two additional repositories: OfflineRL for training offline RL algorithms and d3pe for offline evaluation. Details for reproducing the benchmark can be found here.

Install the NeoRL interface

The NeoRL interface can be installed as follows:

git clone https://agit.ai/Polixir/neorl.git
cd neorl
pip install -e .

After installation, CityLearn, Finance, and the industrial benchmark will be available. If you want to use the MuJoCo tasks, you need to obtain a MuJoCo license, follow the setup instructions, and then run:

pip install -e .[mujoco]

So far, "HalfCheetah-v3", "Walker2d-v3", and "Hopper-v3" are the supported MuJoCo tasks.

Using NeoRL

NeoRL uses the OpenAI Gym API. Tasks are created via the neorl.make() function. A full list of all tasks is available here.

import neorl

# Create an environment
env = neorl.make("citylearn")
env.reset()
env.step(env.action_space.sample())

# Get training and validation data: 100 trajectories collected by a
# low-quality policy on the citylearn task
train_data, val_data = env.get_dataset(data_type="low", train_num=100)

To facilitate setting different goals, users can provide a custom reward function to neorl.make() when creating an environment. See the usage and examples of neorl.make() for more details.
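As a minimal sketch, a custom reward can be passed in as a Python function. The keyword argument name reward_func and the dict-style signature of the callback are assumptions here, not the confirmed interface, so check the neorl.make() documentation for the exact form.

import neorl

# Hypothetical custom reward: the argument name `reward_func` and the
# batch-dict signature are assumptions; see the neorl.make() usage docs.
def action_penalty_reward(data):
    # Example goal: penalize large actions (the "action" field name is assumed)
    return -0.01 * (data["action"] ** 2).sum(axis=-1)

env = neorl.make("citylearn", reward_func=action_penalty_reward)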

As a benchmark, to allow algorithms to be tested conveniently and quickly, each task is associated with a small training dataset and a validation dataset by default, which can be obtained via env.get_dataset(). For flexibility, extra parameters can be passed to get_dataset() to obtain multiple pairs of datasets for benchmarking. Data for each task is collected by a low-, medium-, or high-quality policy, and for each task we provide training data of up to 10,000 trajectories. See the usage of get_dataset() for more details about its parameters.
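For example, different dataset pairs can be requested by varying these parameters. In the sketch below, the parameter names data_type and train_num follow the example above, while the specific values ("high", 1000) are only illustrative assumptions about what is available for a given task.

import neorl

env = neorl.make("citylearn")

# Default small datasets for quick testing
train_small, val_small = env.get_dataset()

# A larger dataset collected by a high-quality policy
# ("high" and 1000 are illustrative; see the get_dataset() docs for valid values)
train_high, val_high = env.get_dataset(data_type="high", train_num=1000)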

Data in NeoRL

In NeoRL, the training data and validation data returned by the get_dataset() function are dicts that share the same format.
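Since both dicts share the same structure, one way to see the format is simply to inspect the returned keys and array shapes, as in the sketch below (it makes no assumption about the specific key names).

import neorl

env = neorl.make("citylearn")
train_data, val_data = env.get_dataset(data_type="low", train_num=100)

# Print each key and the shape of its value (if it is an array) for both dicts;
# the training and validation data share the same set of keys.
for name, data in (("train", train_data), ("val", val_data)):
    print(f"{name} dataset keys: {sorted(data.keys())}")
    for key, value in data.items():
        print(f"  {key}: {getattr(value, 'shape', type(value))}")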

Licenses

All datasets are licensed under the Creative Commons Attribution 4.0 License (CC BY), and code is licensed under the Apache 2.0 License.