
<div align="center"> <a href="https://di-engine-docs.readthedocs.io/en/latest/"><img width="1000px" height="auto" src="https://github.com/opendilab/DI-engine-docs/blob/main/source/images/head_image.png"></a> </div>


<div align="center"> <a href="https://hellogithub.com/repository/175c1e13739c4e429d0abf2b32ec583d" target="_blank"> <img src="https://api.hellogithub.com/v1/widgets/recommend.svg?rid=175c1e13739c4e429d0abf2b32ec583d&claim_uid=cExIpHuMKdTQ6BW" alt="Featured|HelloGitHub" style="width: 250px; height: 54px;" width="250" height="54" /> </a> </div> <br>

Updated on 2024.06.27 DI-engine-v0.5.2

Introduction to DI-engine

Documentation | 中文文档 | Tutorials | Feature | Task & Middleware | TreeTensor | Roadmap

DI-engine is a generalized decision intelligence engine for PyTorch and JAX.

It provides python-first and asynchronous-native task and middleware abstractions, and modularly integrates several of the most important decision-making concepts: Env, Policy and Model. Based on these mechanisms, DI-engine supports various deep reinforcement learning algorithms with superior performance, high efficiency, well-organized documentation and unit tests.
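As a taste of the task and middleware abstraction, the sketch below wires up serial DQN training on CartPole in the style of the example scripts under `ding/example` (e.g. `ding/example/dqn.py`). It is a minimal sketch rather than a canonical entry point, and exact middleware signatures may vary across DI-engine versions:

```python
# A minimal middleware-pipeline sketch in the style of ding/example/dqn.py;
# exact signatures may differ slightly across DI-engine versions.
import gym
from ding.config import compile_config
from ding.data import DequeBuffer
from ding.envs import DingEnvWrapper, BaseEnvManagerV2
from ding.model import DQN
from ding.policy import DQNPolicy
from ding.framework import task
from ding.framework.context import OnlineRLContext
from ding.framework.middleware import (
    interaction_evaluator, StepCollector, data_pusher, OffPolicyLearner
)
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import main_config, create_config

if __name__ == "__main__":
    cfg = compile_config(main_config, create_cfg=create_config, auto=True)
    with task.start(async_mode=False, ctx=OnlineRLContext()):
        # Vectorized environments for collection and evaluation.
        collector_env = BaseEnvManagerV2(
            env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0"))
                    for _ in range(cfg.env.collector_env_num)],
            cfg=cfg.env.manager,
        )
        evaluator_env = BaseEnvManagerV2(
            env_fn=[lambda: DingEnvWrapper(gym.make("CartPole-v0"))
                    for _ in range(cfg.env.evaluator_env_num)],
            cfg=cfg.env.manager,
        )
        model = DQN(**cfg.policy.model)
        buffer_ = DequeBuffer(size=cfg.policy.other.replay_buffer.replay_buffer_size)
        policy = DQNPolicy(cfg.policy, model=model)
        # Each middleware is a reusable pipeline stage; task.use chains them
        # into the evaluate -> collect -> store -> learn loop.
        task.use(interaction_evaluator(cfg, policy.eval_mode, evaluator_env))
        task.use(StepCollector(cfg, policy.collect_mode, collector_env))
        task.use(data_pusher(cfg, buffer_))
        task.use(OffPolicyLearner(cfg, policy.learn_mode, buffer_))
        task.run()
```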

DI-engine aims to standardize different Decision Intelligence environments and applications, supporting both academic research and prototype development. Various training pipelines and customized decision AI applications are also supported.


At the low level, DI-engine comes with a set of highly reusable modules, including RL optimization functions, PyTorch utilities and auxiliary tools.

DI-engine also features dedicated system optimizations and designs for efficient and robust large-scale RL training.


Have fun with exploration and exploitation.


Installation

You can simply install DI-engine from PyPI with the following command:

```shell
pip install DI-engine
```

For more information about installation, you can refer to installation.
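After installation, you can verify that the package is importable (a minimal sanity check; DI-engine is imported as the `ding` package):

```python
# Quick sanity check after installation; DI-engine installs the `ding` package.
import ding
print(ding.__version__)
```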

Our DockerHub repo can be found here; we provide a base image and env images with common RL environments.


The detailed documentation is hosted on doc | 中文文档.

Quick Start

3 Minutes Kickoff

3 Minutes Kickoff (colab)

DI-engine Huggingface Kickoff (colab)

How to migrate a new RL Env | 如何迁移一个新的强化学习环境

How to customize the neural network model | 如何定制策略使用的神经网络模型

Examples of testing and deploying RL policies | 测试/部署 强化学习策略 的样例

Comparison of the old and new pipelines | 新老 pipeline 的异同对比
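Tying these together, the snippet below kicks off serial training from Python using the classic `serial_pipeline` entry and the CartPole DQN config shipped in dizoo; it is a minimal sketch, equivalent to the CLI form `ding -m serial -c cartpole_dqn_config.py -s 0` used throughout this README:

```python
# Launch serial CartPole DQN training from Python, equivalent to
# `ding -m serial -c cartpole_dqn_config.py -s 0`.
from ding.entry import serial_pipeline
from dizoo.classic_control.cartpole.config.cartpole_dqn_config import (
    main_config, create_config,
)

if __name__ == "__main__":
    serial_pipeline((main_config, create_config), seed=0)
```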

Feature

Algorithm Versatility

<details open> <summary>(Click to Collapse)</summary>

discrete means discrete action space, which is the only label of the standard DRL algorithms (No. 1-23)

continuous means continuous action space, which is the only label of the standard DRL algorithms (No. 1-23)

hybrid means hybrid (discrete + continuous) action space (No. 1-23)

dist means Distributed Reinforcement Learning (分布式强化学习)

MARL means Multi-Agent Reinforcement Learning (多智能体强化学习)

exp means Exploration Mechanisms in Reinforcement Learning (强化学习中的探索机制)

IL means Imitation Learning (模仿学习)

offline means Offline Reinforcement Learning (离线强化学习)

mbrl means Model-Based Reinforcement Learning (基于模型的强化学习)

other means other sub-direction algorithms, usually used as a plug-in in the whole pipeline

P.S.: The .py files in the Runnable Demo column can be found in dizoo.

| No. | Algorithm | Label | Doc and Implementation | Runnable Demo |
| :--: | :--: | :--: | :--: | :--: |
| 1 | DQN | discrete | DQN doc<br>DQN中文文档<br>policy/dqn | python3 -u cartpole_dqn_main.py / ding -m serial -c cartpole_dqn_config.py -s 0 |
| 2 | C51 | discrete | C51 doc<br>policy/c51 | ding -m serial -c cartpole_c51_config.py -s 0 |
| 3 | QRDQN | discrete | QRDQN doc<br>policy/qrdqn | ding -m serial -c cartpole_qrdqn_config.py -s 0 |
| 4 | IQN | discrete | IQN doc<br>policy/iqn | ding -m serial -c cartpole_iqn_config.py -s 0 |
| 5 | FQF | discrete | FQF doc<br>policy/fqf | ding -m serial -c cartpole_fqf_config.py -s 0 |
| 6 | Rainbow | discrete | Rainbow doc<br>policy/rainbow | ding -m serial -c cartpole_rainbow_config.py -s 0 |
| 7 | SQL | discrete continuous | SQL doc<br>policy/sql | ding -m serial -c cartpole_sql_config.py -s 0 |
| 8 | R2D2 | dist discrete | R2D2 doc<br>policy/r2d2 | ding -m serial -c cartpole_r2d2_config.py -s 0 |
| 9 | PG | discrete | PG doc<br>policy/pg | ding -m serial -c cartpole_pg_config.py -s 0 |
| 10 | PromptPG | discrete | policy/prompt_pg | ding -m serial_onpolicy -c tabmwp_pg_config.py -s 0 |
| 11 | A2C | discrete | A2C doc<br>policy/a2c | ding -m serial -c cartpole_a2c_config.py -s 0 |
| 12 | PPO/MAPPO | discrete continuous MARL | PPO doc<br>policy/ppo | python3 -u cartpole_ppo_main.py / ding -m serial_onpolicy -c cartpole_ppo_config.py -s 0 |
| 13 | PPG | discrete | PPG doc<br>policy/ppg | python3 -u cartpole_ppg_main.py |
| 14 | ACER | discrete continuous | ACER doc<br>policy/acer | ding -m serial -c cartpole_acer_config.py -s 0 |
| 15 | IMPALA | dist discrete | IMPALA doc<br>policy/impala | ding -m serial -c cartpole_impala_config.py -s 0 |
| 16 | DDPG/PADDPG | continuous hybrid | DDPG doc<br>policy/ddpg | ding -m serial -c pendulum_ddpg_config.py -s 0 |
| 17 | TD3 | continuous hybrid | TD3 doc<br>policy/td3 | python3 -u pendulum_td3_main.py / ding -m serial -c pendulum_td3_config.py -s 0 |
| 18 | D4PG | continuous | D4PG doc<br>policy/d4pg | python3 -u pendulum_d4pg_config.py |
| 19 | SAC/MASAC | discrete continuous MARL | SAC doc<br>policy/sac | ding -m serial -c pendulum_sac_config.py -s 0 |
| 20 | PDQN | hybrid | policy/pdqn | ding -m serial -c gym_hybrid_pdqn_config.py -s 0 |
| 21 | MPDQN | hybrid | policy/pdqn | ding -m serial -c gym_hybrid_mpdqn_config.py -s 0 |
| 22 | HPPO | hybrid | policy/ppo | ding -m serial_onpolicy -c gym_hybrid_hppo_config.py -s 0 |
| 23 | BDQ | hybrid | policy/bdq | python3 -u hopper_bdq_config.py |
| 24 | MDQN | discrete | policy/mdqn | python3 -u asterix_mdqn_config.py |
| 25 | QMIX | MARL | QMIX doc<br>policy/qmix | ding -m serial -c smac_3s5z_qmix_config.py -s 0 |
| 26 | COMA | MARL | COMA doc<br>policy/coma | ding -m serial -c smac_3s5z_coma_config.py -s 0 |
| 27 | QTran | MARL | policy/qtran | ding -m serial -c smac_3s5z_qtran_config.py -s 0 |
| 28 | WQMIX | MARL | WQMIX doc<br>policy/wqmix | ding -m serial -c smac_3s5z_wqmix_config.py -s 0 |
| 29 | CollaQ | MARL | CollaQ doc<br>policy/collaq | ding -m serial -c smac_3s5z_collaq_config.py -s 0 |
| 30 | MADDPG | MARL | MADDPG doc<br>policy/ddpg | ding -m serial -c ptz_simple_spread_maddpg_config.py -s 0 |
| 31 | GAIL | IL | GAIL doc<br>reward_model/gail | ding -m serial_gail -c cartpole_dqn_gail_config.py -s 0 |
| 32 | SQIL | IL | SQIL doc<br>entry/sqil | ding -m serial_sqil -c cartpole_sqil_config.py -s 0 |
| 33 | DQFD | IL | DQFD doc<br>policy/dqfd | ding -m serial_dqfd -c cartpole_dqfd_config.py -s 0 |
| 34 | R2D3 | IL | R2D3 doc<br>R2D3中文文档<br>policy/r2d3 | python3 -u pong_r2d3_r2d2expert_config.py |
| 35 | Guided Cost Learning | IL | Guided Cost Learning中文文档<br>reward_model/guided_cost | python3 lunarlander_gcl_config.py |
| 36 | TREX | IL | TREX doc<br>reward_model/trex | python3 mujoco_trex_main.py |
| 37 | Implicit Behavioral Cloning (DFO+MCMC) | IL | policy/ibc<br>model/template/ebm | python3 d4rl_ibc_main.py -s 0 -c pen_human_ibc_mcmc_config.py |
| 38 | BCO | IL | entry/bco | python3 -u cartpole_bco_config.py |
| 39 | HER | exp | HER doc<br>reward_model/her | python3 -u bitflip_her_dqn.py |
| 40 | RND | exp | RND doc<br>reward_model/rnd | python3 -u cartpole_rnd_onppo_config.py |
| 41 | ICM | exp | ICM doc<br>ICM中文文档<br>reward_model/icm | python3 -u cartpole_ppo_icm_config.py |
| 42 | CQL | offline | CQL doc<br>policy/cql | python3 -u d4rl_cql_main.py |
| 43 | TD3BC | offline | TD3BC doc<br>policy/td3_bc | python3 -u d4rl_td3_bc_main.py |
| 44 | Decision Transformer | offline | policy/dt | python3 -u d4rl_dt_mujoco.py |
| 45 | EDAC | offline | EDAC doc<br>policy/edac | python3 -u d4rl_edac_main.py |
| 46 | QGPO | offline | QGPO doc<br>policy/qgpo | python3 -u ding/example/qgpo.py |
| 47 | MBSAC (SAC+MVE+SVG) | continuous mbrl | policy/mbpolicy/mbsac | python3 -u pendulum_mbsac_mbpo_config.py / python3 -u pendulum_mbsac_ddppo_config.py |
| 48 | STEVESAC (SAC+STEVE+SVG) | continuous mbrl | policy/mbpolicy/mbsac | python3 -u pendulum_stevesac_mbpo_config.py |
| 49 | MBPO | mbrl | MBPO doc<br>world_model/mbpo | python3 -u pendulum_sac_mbpo_config.py |
| 50 | DDPPO | mbrl | world_model/ddppo | python3 -u pendulum_mbsac_ddppo_config.py |
| 51 | DreamerV3 | mbrl | world_model/dreamerv3 | python3 -u cartpole_balance_dreamer_config.py |
| 52 | PER | other | worker/replay_buffer | rainbow demo |
| 53 | GAE | other | rl_utils/gae | ppo demo |
| 54 | ST-DIM | other | torch_utils/loss/contrastive_loss | ding -m serial -c cartpole_dqn_stdim_config.py -s 0 |
| 55 | PLR | other | PLR doc<br>data/level_replay/level_sampler | python3 -u bigfish_plr_config.py -s 0 |
| 56 | PCGrad | other | torch_utils/optimizer_helper/PCGrad | python3 -u multi_mnist_pcgrad_main.py -s 0 |
| 57 | AWR | discrete | policy/ibc | python3 -u tabmwp_awr_config.py |
</details>

Environment Versatility

<details open> <summary>(Click to Collapse)</summary>
| No | Environment | Label | Visualization | Code and Doc Links |
| :--: | :--: | :--: | :--: | :--: |
| 1 | Atari | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 2 | box2d/bipedalwalker | continuous | original | dizoo link<br>env tutorial<br>环境指南 |
| 3 | box2d/lunarlander | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 4 | classic_control/cartpole | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 5 | classic_control/pendulum | continuous | original | dizoo link<br>env tutorial<br>环境指南 |
| 6 | competitive_rl | discrete selfplay | original | dizoo link<br>环境指南 |
| 7 | gfootball | discrete sparse selfplay | original | dizoo link<br>env tutorial<br>环境指南 |
| 8 | minigrid | discrete sparse | original | dizoo link<br>env tutorial<br>环境指南 |
| 9 | MuJoCo | continuous | original | dizoo link<br>env tutorial<br>环境指南 |
| 10 | PettingZoo | discrete continuous marl | original | dizoo link<br>env tutorial<br>环境指南 |
| 11 | overcooked | discrete marl | original | dizoo link<br>env tutorial |
| 12 | procgen | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 13 | pybullet | continuous | original | dizoo link<br>环境指南 |
| 14 | smac | discrete marl selfplay sparse | original | dizoo link<br>env tutorial<br>环境指南 |
| 15 | d4rl | offline | original | dizoo link<br>环境指南 |
| 16 | league_demo | discrete selfplay | original | dizoo link |
| 17 | pomdp atari | discrete | | dizoo link |
| 18 | bsuite | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 19 | ImageNet | IL | original | dizoo link<br>环境指南 |
| 20 | slime_volleyball | discrete selfplay | original | dizoo link<br>env tutorial<br>环境指南 |
| 21 | gym_hybrid | hybrid | original | dizoo link<br>env tutorial<br>环境指南 |
| 22 | GoBigger | hybrid marl selfplay | original | dizoo link<br>env tutorial<br>环境指南 |
| 23 | gym_soccer | hybrid | original | dizoo link<br>环境指南 |
| 24 | multiagent_mujoco | continuous marl | original | dizoo link<br>环境指南 |
| 25 | bitflip | discrete sparse | original | dizoo link<br>环境指南 |
| 26 | sokoban | discrete | Game 2 | dizoo link<br>env tutorial<br>环境指南 |
| 27 | gym_anytrading | discrete | original | dizoo link<br>env tutorial |
| 28 | mario | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 29 | dmc2gym | continuous | original | dizoo link<br>env tutorial<br>环境指南 |
| 30 | evogym | continuous | original | dizoo link<br>env tutorial<br>环境指南 |
| 31 | gym-pybullet-drones | continuous | original | dizoo link<br>环境指南 |
| 32 | beergame | discrete | original | dizoo link<br>环境指南 |
| 33 | classic_control/acrobot | discrete | original | dizoo link<br>环境指南 |
| 34 | box2d/car_racing | discrete<br>continuous | original | dizoo link<br>环境指南 |
| 35 | metadrive | continuous | original | dizoo link<br>环境指南 |
| 36 | cliffwalking | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 37 | tabmwp | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 38 | frozen_lake | discrete | original | dizoo link<br>env tutorial<br>环境指南 |
| 39 | ising_model | discrete marl | original | dizoo link<br>env tutorial<br>环境指南 |
| 40 | taxi | discrete | original | dizoo link<br>env tutorial<br>环境指南 |

discrete means discrete action space

continuous means continuous action space

hybrid means hybrid (discrete + continuous) action space

MARL means multi-agent RL environment

sparse means an environment related to exploration and sparse reward

offline means offline RL environment

IL means an Imitation Learning or Supervised Learning dataset

selfplay means an environment that allows agent-vs-agent battles

P.S. Some environments in Atari, such as MontezumaRevenge, are also of the sparse-reward type.

</details>

General Data Container: TreeTensor

DI-engine utilizes TreeTensor as the basic data container in various components; it is easy to use and consistent across different code modules such as environment definition, data processing and DRL optimization. Here is a concrete code example:
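The snippet below is a minimal sketch using the `treetensor` package (DI-treetensor); it builds a nested, observation-like tensor and applies tensor-style operations leaf by leaf:

```python
# A minimal TreeTensor sketch: tree-structured tensors behave like a single
# torch.Tensor for common math and shape operations.
import treetensor.torch as ttorch

# Build a tree-structured random tensor, e.g. a dict-like observation.
obs = ttorch.randn({'vector': (4, ), 'image': (3, 32, 32)})

# Arithmetic is applied to every leaf tensor at once.
scaled = obs * 2 + 1

# Shape queries return the same tree structure.
print(scaled.shape)
```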

Feedback and Contribution

We appreciate all feedback and contributions to improve DI-engine, covering both algorithms and system design. CONTRIBUTING.md offers the necessary information.

Supporters

↳ Stargazers

Stargazers repo roster for @opendilab/DI-engine

↳ Forkers

Forkers repo roster for @opendilab/DI-engine

Citation

```latex
@misc{ding,
    title={DI-engine: A Universal AI System/Engine for Decision Intelligence},
    author={Niu, Yazhe and Xu, Jingxin and Pu, Yuan and Nie, Yunpeng and Zhang, Jinouwen and Hu, Shuai and Zhao, Liangxuan and Zhang, Ming and Liu, Yu},
    publisher={GitHub},
    howpublished={\url{https://github.com/opendilab/DI-engine}},
    year={2021},
}
```

License

DI-engine is released under the Apache 2.0 license.