<!-- markdownlint-disable first-line-h1 --> <!-- markdownlint-disable html --> <div align="center"> <img src="https://github.com/PKU-Alignment/omnisafe/raw/HEAD/images/logo.png" width="75%"/> </div> <div align="center">


</div> <p align="center"> <a href="https://omnisafe.readthedocs.io">Documentation</a> | <a href="https://github.com/PKU-Alignment/omnisafe#implemented-algorithms">Implemented Algorithms</a> | <a href="https://github.com/PKU-Alignment/omnisafe#installation">Installation</a> | <a href="https://github.com/PKU-Alignment/omnisafe#getting-started">Getting Started</a> | <a href="https://github.com/PKU-Alignment/omnisafe#license">License</a> </p>

OmniSafe is an infrastructural framework designed to accelerate safe reinforcement learning (RL) research. It provides a comprehensive and reliable benchmark for safe RL algorithms, as well as an out-of-the-box modular toolkit for researchers. Safe RL aims to develop algorithms that minimize the risk of unintended harm or unsafe behavior.

OmniSafe is the first unified learning framework in the field of safe reinforcement learning, and aims to foster the growth of the SafeRL research community.

Quick Start

Installation

Prerequisites

OmniSafe requires Python 3.8+ and PyTorch 1.10+.

We support and test Python 3.8, 3.9, and 3.10 on Linux, and also support Apple silicon (M1 and M2) macOS. We will accept PRs related to Windows, but do not officially support it.

Install from source

```bash
# Clone the repo
git clone https://github.com/PKU-Alignment/omnisafe.git
cd omnisafe

# Create a conda environment
conda env create --file conda-recipe.yaml
conda activate omnisafe

# Install omnisafe
pip install -e .
```

Install from PyPI

OmniSafe is hosted on PyPI.

```bash
pip install omnisafe
```

Implemented Algorithms


Examples

```bash
cd examples
python train_policy.py --algo PPOLag --env-id SafetyPointGoal1-v0 --parallel 1 --total-steps 10000000 --device cpu --vector-env-nums 1 --torch-threads 1
```

Algorithms Registry

<table>
  <thead>
    <tr>
      <th>Domains</th>
      <th>Types</th>
      <th>Algorithms Registry</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="5">On Policy</td>
      <td rowspan="2">Primal-Dual</td>
      <td>TRPOLag; PPOLag; PDO; RCPO</td>
    </tr>
    <tr>
      <td>TRPOPID; CPPOPID</td>
    </tr>
    <tr>
      <td>Convex Optimization</td>
      <td>CPO; PCPO; FOCOPS; CUP</td>
    </tr>
    <tr>
      <td>Penalty Function</td>
      <td>IPO; P3O</td>
    </tr>
    <tr>
      <td>Primal</td>
      <td>OnCRPO</td>
    </tr>
    <tr>
      <td rowspan="2">Off Policy</td>
      <td rowspan="2">Primal-Dual</td>
      <td>DDPGLag; TD3Lag; SACLag</td>
    </tr>
    <tr>
      <td>DDPGPID; TD3PID; SACPID</td>
    </tr>
    <tr>
      <td rowspan="2">Model-Based</td>
      <td>Online Plan</td>
      <td>SafeLOOP; CCEPETS; RCEPETS</td>
    </tr>
    <tr>
      <td>Pessimistic Estimate</td>
      <td>CAPPETS</td>
    </tr>
    <tr>
      <td rowspan="2">Offline</td>
      <td>Q-Learning Based</td>
      <td>BCQLag; C-CRR</td>
    </tr>
    <tr>
      <td>DICE Based</td>
      <td>COptDICE</td>
    </tr>
    <tr>
      <td rowspan="3">Other Formulation MDP</td>
      <td>ET-MDP</td>
      <td>PPOEarlyTerminated; TRPOEarlyTerminated</td>
    </tr>
    <tr>
      <td>SauteRL</td>
      <td>PPOSaute; TRPOSaute</td>
    </tr>
    <tr>
      <td>SimmerRL</td>
      <td>PPOSimmerPID; TRPOSimmerPID</td>
    </tr>
  </tbody>
</table>
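Many of the primal-dual entries above (e.g. TRPOLag, PPOLag, DDPGLag) share the same basic mechanism: a Lagrange multiplier is raised while the measured episode cost exceeds the cost limit and lowered otherwise. A minimal, framework-free sketch of that dual update (illustrative only, not OmniSafe's implementation; the function name and learning rate are our own):

```python
def lagrangian_update(lambda_, episode_cost, cost_limit, lr=0.05):
    """One dual-ascent step on the Lagrange multiplier.

    The multiplier grows while the measured cost exceeds the limit,
    shrinks otherwise, and is clipped at zero to stay non-negative.
    """
    lambda_ += lr * (episode_cost - cost_limit)
    return max(0.0, lambda_)

# Toy rollout: cost starts above the limit of 25, then drops below it.
lam = 0.0
for cost in [40.0, 35.0, 30.0, 20.0, 15.0]:
    lam = lagrangian_update(lam, cost, cost_limit=25.0)
print(round(lam, 2))  # multiplier has decayed after the cost fell below the limit
```

In the actual algorithms, the penalized objective (reward minus lambda times cost) is then optimized by the underlying policy-gradient method; the PID variants (TRPOPID, CPPOPID) replace plain gradient ascent on the multiplier with a PID controller driven by the same constraint violation.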

Supported Environments

Here is a list of environments that Safety-Gymnasium supports:

<table border="1">
  <thead>
    <tr>
      <th>Category</th>
      <th>Task</th>
      <th>Agent</th>
      <th>Example</th>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td rowspan="4">Safe Navigation</td>
      <td>Goal[012]</td>
      <td rowspan="4">Point, Car, Racecar, Ant</td>
      <td rowspan="4">SafetyPointGoal1-v0</td>
    </tr>
    <tr>
      <td>Button[012]</td>
    </tr>
    <tr>
      <td>Push[012]</td>
    </tr>
    <tr>
      <td>Circle[012]</td>
    </tr>
    <tr>
      <td>Safe Velocity</td>
      <td>Velocity</td>
      <td>HalfCheetah, Hopper, Swimmer, Walker2d, Ant, Humanoid</td>
      <td>SafetyHumanoidVelocity-v1</td>
    </tr>
    <tr>
      <td rowspan="4">Safe Isaac Gym</td>
      <td>OverSafeFinger</td>
      <td rowspan="4">ShadowHand</td>
      <td rowspan="4">ShadowHandOverSafeFinger</td>
    </tr>
    <tr>
      <td>OverSafeJoint</td>
    </tr>
    <tr>
      <td>CatchOver2UnderarmSafeFinger</td>
    </tr>
    <tr>
      <td>CatchOver2UnderarmSafeJoint</td>
    </tr>
  </tbody>
</table>

For more information about environments, please refer to Safety-Gymnasium.

Customizing your environment

We offer a flexible customized environment interface that allows users to define and register their own tasks without modifying the OmniSafe source code.

We provide step-by-step tutorials on Environment Customization From Scratch and Environment Customization From Community to give you a detailed introduction on how to use this extraordinary feature of OmniSafe.
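As an illustration of the kind of interface involved, here is a toy Gymnasium-style environment that reports a safety cost alongside the reward at every step. All names here are hypothetical sketches, not OmniSafe's registration API; follow the tutorials above for the real procedure:

```python
class MyCustomEnv:
    """A toy environment sketch: a one-dimensional walk with an unsafe region.

    Safe-RL environments differ from plain RL ones mainly in that each
    step also yields a scalar cost signal alongside the reward.
    """

    def __init__(self, unsafe_threshold=5.0):
        self.unsafe_threshold = unsafe_threshold
        self.position = 0.0

    def reset(self):
        self.position = 0.0
        return self.position, {}  # observation, info

    def step(self, action):
        self.position += action
        reward = self.position  # moving right is rewarded ...
        cost = float(self.position > self.unsafe_threshold)  # ... but going too far is unsafe
        terminated = abs(self.position) > 10.0
        truncated = False
        return self.position, reward, cost, terminated, truncated, {}

env = MyCustomEnv()
obs, info = env.reset()
obs, reward, cost, terminated, truncated, info = env.step(6.0)
print(cost)  # this step entered the unsafe region
```

The six-tuple step return (observation, reward, cost, terminated, truncated, info) mirrors the Safety-Gymnasium convention of treating cost as a first-class signal rather than burying it in the info dict.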

Note: If you run into trouble customizing your environment, please feel free to open an issue or discussion. Pull requests are also welcome if you're willing to contribute the implementation of your environment interface.

Try with CLI

```bash
pip install omnisafe

omnisafe --help  # Ask for help

omnisafe benchmark --help  # 'benchmark' can also be replaced with 'eval', 'train', or 'train-config'

# Quick benchmarking for your research, just specify:
# 1. exp_name
# 2. num_pool (how many processes run concurrently)
# 3. path of the config file (refer to omnisafe/examples/benchmarks for the format)

# Here we provide an example in ./tests/saved_source.
# You can write your own benchmark_config.yaml by following it.
omnisafe benchmark test_benchmark 2 ./tests/saved_source/benchmark_config.yaml

# Quick evaluation and rendering of your trained policy, just specify:
# 1. path of the algorithm you trained
omnisafe eval ./tests/saved_source/PPO-{SafetyPointGoal1-v0} --num-episode 1

# Quick training of some algorithms to validate your ideas
# Note: with `key1:key2` you can select nested hyperparameter keys, and with `--custom-cfgs` you can pass custom configs via the CLI
omnisafe train --algo PPO --total-steps 2048 --vector-env-nums 1 --custom-cfgs algo_cfgs:steps_per_epoch --custom-cfgs 1024

# Quick training via a saved config file; the format is the same as the default config
omnisafe train-config ./tests/saved_source/train_config.yaml
```
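The `key1:key2` syntax above addresses nested configuration keys. As a rough illustration of how such colon-separated paths map onto a nested config dict (a hypothetical helper for exposition, not OmniSafe's actual parser):

```python
def set_nested(cfg, key_path, value):
    """Assign `value` at a colon-separated path inside a nested dict."""
    keys = key_path.split(":")
    node = cfg
    for key in keys[:-1]:
        node = node.setdefault(key, {})  # descend, creating levels as needed
    node[keys[-1]] = value
    return cfg

# Mirrors: --custom-cfgs algo_cfgs:steps_per_epoch --custom-cfgs 1024
cfg = {"algo_cfgs": {"steps_per_epoch": 2048}}
set_nested(cfg, "algo_cfgs:steps_per_epoch", 1024)
print(cfg["algo_cfgs"]["steps_per_epoch"])  # → 1024
```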

Getting Started

Important Hints

We have provided benchmark results for various algorithms, including on-policy, off-policy, model-based, and offline approaches, along with parameter-tuning analysis. Please refer to the documentation for details.

Quickstart: Colab on the Cloud

Explore OmniSafe easily and quickly through a series of Google Colab notebooks.

We take great pleasure in collaborating with our users to create tutorials in various languages. Please refer to our list of currently supported languages. If you are interested in translating the tutorial into a new language or improving an existing version, kindly submit a PR to us.


Changelog

See CHANGELOG.md.

Citing OmniSafe

If you find OmniSafe useful or use OmniSafe in your research, please cite it in your publications.

```bibtex
@article{omnisafe,
  title   = {OmniSafe: An Infrastructure for Accelerating Safe Reinforcement Learning Research},
  author  = {Jiaming Ji and Jiayi Zhou and Borong Zhang and Juntao Dai and Xuehai Pan and Ruiyang Sun and Weidong Huang and Yiran Geng and Mickel Liu and Yaodong Yang},
  journal = {arXiv preprint arXiv:2305.09304},
  year    = {2023}
}
```

Publications using OmniSafe

We have compiled a list of papers that use OmniSafe for algorithm implementation or experimentation. If you are willing to include your work in this list, or if you wish to have your implementation officially integrated into OmniSafe, please feel free to contact us.

| Papers | Publisher |
| --- | --- |
| Off-Policy Primal-Dual Safe Reinforcement Learning | ICLR 2024 |
| Safe Offline Reinforcement Learning with Feasibility-Guided Diffusion Model | ICLR 2024 |
| Iterative Reachability Estimation for Safe Reinforcement Learning | NeurIPS 2023 |
| Balance Reward and Safety Optimization for Safe Reinforcement Learning: A Perspective of Gradient Manipulation | AAAI 2024 |
| Learning Safety Constraints From Demonstration Using One-Class Decision Trees | AAAI 2024 Workshops |

The OmniSafe Team

OmniSafe is mainly developed by the SafeRL research team directed by Prof. Yaodong Yang. Our SafeRL research team members include Borong Zhang, Jiayi Zhou, Juntao Dai, Weidong Huang, Ruiyang Sun, Xuehai Pan, and Jiaming Ji. If you have any questions while using OmniSafe, don't hesitate to ask on the GitHub issue page; we will reply within 2-3 working days.

License

OmniSafe is released under Apache License 2.0.