
TimeChamber: A Massively Parallel Large Scale Self-Play Framework


TimeChamber is a large-scale self-play framework built on massively parallel simulation. Self-play algorithms usually require a lot of hardware, especially in 3D physically simulated environments. TimeChamber achieves fast training and evaluation with only one GPU. It is developed with the following key features:

<div align=center> <img src="assets/images/algorithm.jpg" align="center" width="600"/> </div>

Installation


Download Isaac Gym and follow its installation instructions: https://developer.nvidia.com/isaac-gym
Make sure Isaac Gym works on your system by running one of the examples from the python/examples directory, such as joint_monkey.py. If you have trouble running the samples, follow the troubleshooting steps described in the Isaac Gym Preview Release 3/4 installation instructions.
Then install this repo:

pip install -e .

Quick Start


Tasks

Source code for the tasks is in timechamber/tasks, where the detailed state, action and reward settings are also defined. More tasks are coming soon.

Humanoid Strike

Humanoid Strike is a 3D environment with two physically simulated humanoid characters. Each character is equipped with a sword and a shield and has 37 degrees of freedom. The game restarts if either agent goes outside the arena. At the terminating step, the winner is determined by comparing how much damage each player dealt to the opponent with how much damage it received from the opponent.

<div align=center> <img src="assets/images/humanoid_strike.gif" align="center" width="600"/> </div>
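
The winner rule above can be illustrated with a small plain-Python sketch. The `DamageRecord` type, the `judge` function and the `margin` tie tolerance are hypothetical names of our own, and comparing "damage dealt minus damage received" is an assumption about how the two quantities are combined; the actual task code in timechamber/tasks may differ.

```python
from dataclasses import dataclass


@dataclass
class DamageRecord:
    """Damage totals for one environment at the terminating step (hypothetical type)."""
    dealt: float     # damage the player inflicted on the opponent
    received: float  # damage the opponent inflicted on the player


def judge(record: DamageRecord, margin: float = 0.0) -> str:
    """Return 'win', 'lose' or 'draw' from the player's point of view.

    Assumption: the player wins if it dealt more damage than it received
    by more than `margin` (a hypothetical tie tolerance).
    """
    diff = record.dealt - record.received
    if diff > margin:
        return "win"
    if diff < -margin:
        return "lose"
    return "draw"


records = [DamageRecord(30.0, 10.0), DamageRecord(5.0, 12.0), DamageRecord(8.0, 8.0)]
print([judge(r) for r in records])  # ['win', 'lose', 'draw']
```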

Ant Sumo

Ant Sumo is a 3D environment with simulated physics in which pairs of ant agents compete against each other. To win, an agent has to push its opponent out of the ring. Every agent starts with 100 hp. At each step, if an agent's body touches the ground, its hp is reduced by 1. An agent whose hp reaches 0 is eliminated.

<div align=center> <img src="assets/images/ant_sumo.gif" align="center" width="600"/> </div>
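
The hp rule above can be sketched for a single agent as follows. This is a plain-Python illustration under our own naming (`AntState`, `step_hp` are hypothetical), not the TimeChamber implementation; ring-out elimination is deliberately left out.

```python
from dataclasses import dataclass

START_HP = 100  # every agent starts with 100 hp


@dataclass
class AntState:
    hp: int = START_HP
    eliminated: bool = False


def step_hp(state: AntState, body_touches_ground: bool) -> AntState:
    """Apply one step of the hp rule: lose 1 hp per ground contact, eliminated at 0."""
    if state.eliminated:
        return state
    hp = state.hp - 1 if body_touches_ground else state.hp
    return AntState(hp=hp, eliminated=hp <= 0)


state = AntState()
for _ in range(100):
    state = step_hp(state, body_touches_ground=True)
print(state)  # AntState(hp=0, eliminated=True)
```

An agent that touches the ground on every step therefore survives exactly 99 steps and is eliminated on the 100th.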

Ant Battle

Ant Battle is an extension of Ant Sumo that supports more than two agents competing against each other. The battle ring shrinks, and any agent that goes outside the ring is eliminated.

<div align=center> <img src="assets/images/ant_battle.gif" align="center" width="600"/> </div>
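
A shrinking-ring elimination check might look like the sketch below. The linear shrink schedule, its parameters (`start_radius`, `min_radius`, `shrink_per_step`) and the ring being centred at the origin are all hypothetical assumptions for illustration; the README does not specify how fast the ring shrinks.

```python
import math


def ring_radius(step: int, start_radius: float, min_radius: float, shrink_per_step: float) -> float:
    """Hypothetical linear shrink schedule, clamped at min_radius."""
    return max(min_radius, start_radius - shrink_per_step * step)


def surviving_agents(positions, radius):
    """Indices of agents whose (x, y) position is still inside a ring centred at the origin."""
    return [i for i, (x, y) in enumerate(positions) if math.hypot(x, y) <= radius]


positions = [(0.5, 0.0), (2.0, 1.0), (-1.0, -1.0)]
r = ring_radius(step=100, start_radius=3.0, min_radius=1.0, shrink_per_step=0.01)
print(surviving_agents(positions, r))  # [0, 2]
```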

Self-Play Training

To train a policy for a task, run, for example:

# run self-play training for Humanoid Strike task
python train.py task=MA_Humanoid_Strike headless=True
# run self-play training for Ant Sumo task
python train.py task=MA_Ant_Sumo train=MA_Ant_SumoPPO headless=True
# run self-play training for Ant Battle task
python train.py task=MA_Ant_Battle train=MA_Ant_BattlePPO headless=True
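
The commands above pass configuration as `key=value` overrides (IsaacGymEnvs uses Hydra for its command line). If you launch many runs from a script, a small helper can keep the overrides consistent. `build_train_command` below is a hypothetical helper of our own, not part of TimeChamber; it only assembles the argument list.

```python
import shlex
import sys
from typing import List, Optional


def build_train_command(task: str, train: Optional[str] = None, **overrides) -> List[str]:
    """Assemble a train.py invocation from key=value overrides (hypothetical helper)."""
    cmd = [sys.executable, "train.py", f"task={task}"]
    if train is not None:
        cmd.append(f"train={train}")
    for key, value in overrides.items():
        cmd.append(f"{key}={value}")
    return cmd


cmd = build_train_command("MA_Ant_Sumo", "MA_Ant_SumoPPO", headless=True)
print(shlex.join(cmd[1:]))  # train.py task=MA_Ant_Sumo train=MA_Ant_SumoPPO headless=True
# subprocess.run(cmd, check=True) would then launch training from the repository root.
```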

Key arguments to the training script follow IsaacGymEnvs Configuration and command line arguments. Other training arguments follow rl_games config parameters and can be changed in timechamber/tasks/train/*.yaml. There are also some arguments specific to self-play training:

Policies Evaluation

To evaluate your policies, run, for example:

# run testing for Ant Sumo policy
python train.py task=MA_Ant_Sumo train=MA_Ant_SumoPPO test=True num_envs=4 minibatch_size=32 headless=False checkpoint='models/ant_sumo/policy.pth'
# run testing for Humanoid Strike policy
python train.py task=MA_Humanoid_Strike train=MA_Humanoid_StrikeHRL test=True num_envs=4 minibatch_size=32 headless=False checkpoint='models/Humanoid_Strike/policy.pth' op_checkpoint='models/Humanoid_Strike/policy_op.pth'

Use op_checkpoint to set the opponent agent's policy. If it is left empty, the opponent uses the same policy as checkpoint.
We use vectorized models to accelerate policy evaluation: put several policies into the checkpoint directory and let them compete against each other in parallel:

# run testing for Ant Sumo policy
python train.py task=MA_Ant_Sumo train=MA_Ant_SumoPPO test=True headless=True checkpoint='models/ant_sumo' player_pool_type=vectorized
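
Letting N policies compete with each other amounts to scheduling their pairings, which the vectorized models then run in parallel. The sketch below only enumerates every unordered pair of `.pth` files in a directory and mirrors the documented op_checkpoint fallback. Both helpers are hypothetical, and whether `player_pool_type=vectorized` actually plays every pair is an assumption.

```python
import itertools
import tempfile
from pathlib import Path
from typing import List, Optional, Tuple


def resolve_opponent(checkpoint: str, op_checkpoint: Optional[str]) -> str:
    """Documented rule: an empty op_checkpoint means the opponent reuses checkpoint."""
    return op_checkpoint if op_checkpoint else checkpoint


def round_robin_pairs(checkpoint_dir: str) -> List[Tuple[Path, Path]]:
    """Every unordered pair of .pth policies in a directory (hypothetical helper)."""
    policies = sorted(Path(checkpoint_dir).glob("*.pth"))
    return list(itertools.combinations(policies, 2))


with tempfile.TemporaryDirectory() as d:
    for name in ("a.pth", "b.pth", "c.pth"):
        (Path(d) / name).touch()
    print([(p.name, q.name) for p, q in round_robin_pairs(d)])
    # [('a.pth', 'b.pth'), ('a.pth', 'c.pth'), ('b.pth', 'c.pth')]

print(resolve_opponent("models/ant_sumo/policy.pth", ""))  # models/ant_sumo/policy.pth
```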

Some arguments are specific to self-play evaluation; you can change them in timechamber/tasks/train/*.yaml:

<div align=center> <img src="assets/images/elo.jpg" align="center" width="400"/> </div>
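
The figure's filename (elo.jpg) suggests that evaluation results are reported as Elo ratings. For reference, the standard Elo update is sketched below. The K-factor of 32 and the 400-point scale are conventional defaults, and we have not verified that TimeChamber uses these exact values.

```python
def expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the Elo model."""
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


def elo_update(rating_a: float, rating_b: float, score_a: float, k: float = 32.0):
    """Return updated (rating_a, rating_b); score_a is 1 for a win, 0.5 for a draw, 0 for a loss."""
    expected_a = expected_score(rating_a, rating_b)
    delta = k * (score_a - expected_a)
    return rating_a + delta, rating_b - delta


print(elo_update(1000.0, 1000.0, 1.0))  # (1016.0, 984.0)
```

Because the update is zero-sum, the total rating of the pool is preserved. This makes ratings of policies evaluated in the same pool directly comparable.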

Building Your Own Task

You can build your own task by following IsaacGymEnvs. Make sure the obs shape is correct and that info contains win, lose and draw:

import isaacgym
import timechamber
import torch

num_envs = 2
num_agents = 2  # Ant Sumo is a two-player task

envs = timechamber.make(
    seed=0,
    task="MA_Ant_Sumo",
    num_envs=num_envs,
    sim_device="cuda:0",
    rl_device="cuda:0",
)
# obs should have shape (num_agents * num_envs, num_obs);
# rows obs[:num_envs] belong to the training agent.
print("Observation space is", envs.observation_space)
print("Action space is", envs.action_space)
obs = envs.reset()
for _ in range(20):
    # one random action per agent per environment
    actions = torch.rand((num_agents * num_envs,) + envs.action_space.shape, device="cuda:0")
    obs, reward, done, info = envs.step(actions)
# info (one Bool per environment):
# {'win': tensor([Bool, Bool]),
#  'lose': tensor([Bool, Bool]),
#  'draw': tensor([Bool, Bool])}
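
The obs and info requirements above can be checked with a couple of small helpers. Rows `[:num_envs]` belonging to the training agent comes from the example; assuming that each further agent occupies the next block of `num_envs` rows, and that an environment should never carry more than one outcome flag, are our own assumptions. `agent_rows` and `check_info` are hypothetical names, and both work on plain lists or on torch tensors.

```python
def agent_rows(agent_idx: int, num_envs: int) -> slice:
    """Rows of the (num_agents * num_envs, num_obs) obs owned by one agent.

    Agent 0 (the training agent) owns rows [0, num_envs); other agents are
    assumed to follow in consecutive blocks of num_envs rows.
    """
    return slice(agent_idx * num_envs, (agent_idx + 1) * num_envs)


def check_info(info: dict, num_envs: int) -> None:
    """Raise if info lacks win/lose/draw or flags an environment with several outcomes."""
    keys = ("win", "lose", "draw")
    for key in keys:
        if key not in info:
            raise KeyError(f"info is missing '{key}'")
        if len(info[key]) != num_envs:
            raise ValueError(f"info['{key}'] should have {num_envs} entries")
    for env in range(num_envs):
        if sum(bool(info[k][env]) for k in keys) > 1:
            raise ValueError(f"env {env} is flagged with more than one outcome")


rows = list(range(2 * 2))  # stand-in for obs rows with num_agents=2, num_envs=2
print(rows[agent_rows(0, 2)], rows[agent_rows(1, 2)])  # [0, 1] [2, 3]
check_info({"win": [True, False], "lose": [False, False], "draw": [False, True]}, num_envs=2)
```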

Citing

If you use TimeChamber in your research, please cite it as follows:

@misc{InspirAI,
  author = {Huang, Ziming and Liu, Ziyi and Wu, Yutong and Sung, Flood},
  title = {TimeChamber: A Massively Parallel Large Scale Self-Play Framework},
  year = {2022},
  publisher = {GitHub},
  journal = {GitHub repository},
  howpublished = {\url{https://github.com/inspirai/TimeChamber}},
}