Home

Awesome

<div align="center"> <img width="300px" height="auto" src="./docs/.images/xingtian-logo.png"> </div>

中文

Introduction

License: MIT

XingTian (刑天) is a componentized library for the development and verification of reinforcement learning algorithms. It supports multiple algorithms, including DQN, DDPG, PPO, and IMPALA etc, which could training agents in multiple environments, such as Gym, Atari, Torcs, StarCraftII and so on. To meet users' requirements for quick verification and solving RL problems, four modules are abstracted: Algorithm, Model, Agent, and Environment. They work in a similar way as the combination of `Lego' building blocks. For details about the architecture, please see the Architecture introduction.

Dependencies

# ubuntu 18.04
sudo apt-get install python3-pip libopencv-dev -y
pip3 install opencv-python

# run with tensorflow 1.15.0 or tensorflow 2.3.1
pip3 install zmq h5py gym[atari] tqdm imageio matplotlib==3.0.3 Ipython pyyaml tensorflow==1.15.0 pyarrow lz4 fabric2 absl-py psutil tensorboardX setproctitle

or, using pip3 install -r requirements.txt

If your want to used PyTorch as the backend, please install it by yourself. Ref Pytorch

Installation

# cd PATH/TO/XingTian 
pip3 install -e .

After installation, you could use import xt; print(xt.__Version__) to check whether the installation is successful.

In [1]: import xt

In [2]: xt.__version__
Out[2]: '0.3.0'

Quick Start


Setup configuration

Follow's configuration shows a minimal example with Cartpole environment. More detailed description with the parameters of agent, algorithm and environment could been find in the User guide .

alg_para:
  alg_name: PPO
  alg_config:
    process_num: 1
    save_model: True  # default False
    save_interval: 100

env_para:
  env_name: GymEnv
  env_info:
    name: CartPole-v0
    vision: False

agent_para:
  agent_name: PPO
  agent_num : 1
  agent_config:
    max_steps: 200
    complete_step: 1000000
    complete_episode: 3550

model_para:
  actor:
    model_name: PpoMlp
    state_dim: [4]
    action_dim: 2
    input_dtype: float32
    model_config:
      BATCH_SIZE: 200
      CRITIC_LOSS_COEF: 1.0
      ENTROPY_LOSS: 0.01
      LR: 0.0003
      LOSS_CLIPPING: 0.2
      MAX_GRAD_NORM: 5.0
      NUM_SGD_ITER: 8
      SUMMARY: False
      VF_SHARE_LAYERS: False
      activation: tanh
      hidden_sizes: [64, 64]

env_num: 10

In addition, your could find more configuration sets in examples directory.

Start training task

python3 xt/main.py -f examples/cartpole_ppo.yaml -t train

img

Evaluate local trained model

Set benchmark.eval.model_path for evaluation within the YOUR_CONFIG_FILE.yaml

benchmark:
  eval:
    model_path: /YOUR/PATH/TO/EVAL/models
    gap: 10           # index gap of eval model
    evaluator_num: 1  # the number of evaluator instance

# run command
python3 xt/main.py -f examples/cartpole_ppo.yaml -t evaluate

NOTE: XingTian start with -t train as default.

Run with CLI

# Could replace `python3 xt/main.py` with `xt_main` command!
xt_main -f examples/cartpole_ppo.yaml -t train

# train with evaluate
xt_main -f examples/cartpole_ppo.yaml -t train_with_evaluate

Develop with Custom case

  1. Write custom module, and register it. More detail guidance on custom module can be found in the Developer Guide
  2. Add YOUR-CUSTOM-MODULE name into your_train_configure.yaml
  3. Start training with xt_main -f path/to/your_train_configure.yaml :)

Reference Results

Episode Reward Average

  1. DQN Reward after 10M time-steps (40M frames).

    envXingTian Basic DQNRLlib Basic DQNHessel et al. DQN
    BeamRider67062869~2000
    Breakout352287~150
    QBert140873921~4000
    SpaceInvaders947650~500
  2. PPO Reward after 10M time-steps (40M frames).

    envXingTian PPORLlib PPOBaselines PPO
    BeamRider48772807~1800
    Breakout341104~250
    QBert1477111085~14000
    SpaceInvaders1025671~800
  3. IMPALA Reward after 10M time-steps (40M frames).

    envXingTian IMPALARLlib IMPALA
    BeamRider23132071
    Breakout334385
    QBert122054068
    SpaceInvaders742719

Throughput

  1. DQN

    envXingTian Basic DQNRLlib Basic DQN
    BeamRider129109
    Breakout117113
    QBert11190
    SpaceInvaders115100
  2. PPO

    envXingTian PPORLlib PPO
    BeamRider24221618
    Breakout24971535
    QBert24361617
    SpaceInvaders24381608
  3. IMPALA

    envXingTian IMPALARLlib IMPALA
    BeamRider87563637
    Breakout88143525
    QBert82493471
    SpaceInvaders84633555

Experiment condition: 72 Intel(R) Xeon(R) Gold 6154 CPU @ 3.00GHz with single Tesla V100

Ray's reward data come from https://github.com/ray-project/rl-experiments, and Throughout from ray 0.8.6 with the same machine condition.

Acknowledgement

XingTian refers to the following projects: DeepMind/scalable_agent, baselines, ray.

License

The MIT License(MIT)