# Rainbow Algorithm on Slime Volleyball Environment

<img src="https://media.giphy.com/media/hrox9TOfiChCpcbMYw/giphy.gif" width="100%"></img>

This fork trains ChainerRL's Rainbow implementation on the Slime Volleyball environment with pixel observations.
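As a quick orientation, here is a minimal sketch (assuming `gym` and `slimevolleygym` are installed) of creating the pixel-observation environment used throughout this README:

```python
# Minimal sketch: importing slimevolleygym registers the SlimeVolley*
# environments with gym, including the pixel-observation variants used here.
import gym
import slimevolleygym  # noqa: F401  (import side effect: env registration)

env = gym.make('SlimeVolleyNoFrameskip-v0')
obs = env.reset()
print(obs.shape)  # RGB pixel observation: (height, width, channels)
```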

Train a model from scratch:

```
python train_rainbow.py --outdir slime --seed 612 --gpu 0 --env SlimeVolleySurvivalNoFrameskip-v0
```

Run the pre-trained model for 1000 episodes:

```
python train_rainbow.py --env SlimeVolleyNoFrameskip-v0 --load zoo/best/ --demo --gpu -1 --eval-n-steps 300001
```

Final result:

```
n_episodes: 1000 mean: 0.037 median: 0.0 stdev 0.9942935278978837
```
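For reference, the summary line above is just descriptive statistics over the per-episode scores. A minimal sketch, using dummy stand-in returns rather than the actual 1000 episodes:

```python
# Illustrative only: dummy per-episode returns, not the real evaluation data.
import statistics

returns = [1.0, 0.0, -1.0, 2.0, -2.0, 0.0]
print('n_episodes:', len(returns),
      'mean:', statistics.mean(returns),
      'median:', statistics.median(returns),
      'stdev', statistics.stdev(returns))
```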

Run in render mode on a desktop to visualize the agent:

```
python train_rainbow.py --env SlimeVolleyNoFrameskip-v0 --load zoo/best/ --demo --gpu -1 --render
```


<img src="zoo/model.png" width="100%"></img>

This is a fork of the Rainbow example in ChainerRL. Everything below is from the original README:

# Rainbow

This example trains a Rainbow agent, from the following paper: [Rainbow: Combining Improvements in Deep Reinforcement Learning](https://arxiv.org/abs/1710.02298).
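Rainbow combines six extensions to DQN: double Q-learning, prioritized replay, dueling networks, multi-step returns, distributional RL (C51), and noisy networks. The sketch below shows how those ingredients typically map onto ChainerRL components; the hyperparameter values are illustrative assumptions, not necessarily what `train_rainbow.py` uses:

```python
# Illustrative sketch of Rainbow's ingredients in ChainerRL.
# Hyperparameter values here are assumptions for demonstration only.
import chainer
from chainerrl import agents, explorers, links, q_functions, replay_buffer

n_actions = 6  # illustrative; normally taken from env.action_space.n

# Distributional RL (C51) + dueling network architecture.
q_func = q_functions.DistributionalDuelingDQN(
    n_actions, n_atoms=51, v_min=-10, v_max=10)
# Noisy networks replace epsilon-greedy exploration.
links.to_factorized_noisy(q_func, sigma_scale=0.5)
explorer = explorers.Greedy()

opt = chainer.optimizers.Adam(alpha=6.25e-5, eps=1.5e-4)
opt.setup(q_func)

# Prioritized replay with multi-step (n=3) returns; beta annealed over training.
rbuf = replay_buffer.PrioritizedReplayBuffer(
    10 ** 6, alpha=0.5, beta0=0.4, betasteps=5 * 10 ** 7 // 4, num_steps=3)

# Double Q-learning on top of the categorical DQN loss.
agent = agents.CategoricalDoubleDQN(
    q_func, opt, rbuf, gpu=-1, gamma=0.99, explorer=explorer,
    minibatch_size=32, replay_start_size=2 * 10 ** 4,
    target_update_interval=32000, update_interval=4)
```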

## Requirements

## Running the Example

To run the training example:

```
python train_rainbow.py [options]
```

We have already pretrained models from this script for all the domains listed in the results section. To load a pretrained model:

```
python train_rainbow.py --demo --load-pretrained --env BreakoutNoFrameskip-v4 --pretrained-type best --gpu -1
```
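Programmatically, loading saved weights goes through ChainerRL's `agent.load`, after which `agent.act` gives greedy actions. In the sketch below, `make_env` and `make_agent` are hypothetical stand-ins for the setup code in `train_rainbow.py`:

```python
# Hypothetical sketch: make_env/make_agent stand in for the setup code in
# train_rainbow.py; agent.load/act/stop_episode are real ChainerRL methods.
env = make_env('BreakoutNoFrameskip-v4')
agent = make_agent(env.action_space.n)
agent.load('path/to/saved/agent')  # directory written by agent.save()

obs = env.reset()
done = False
while not done:
    action = agent.act(obs)        # greedy action, no training updates
    obs, reward, done, info = env.step(action)
agent.stop_episode()
```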

## Useful Options

To view the full list of options, either view the code or run the example with the `--help` option.

## Results

These results reflect ChainerRL v0.7.0.

| Results Summary | |
| ------------- |:-------------:|
| Reporting Protocol | A re-evaluation of the best intermediate agent |
| Number of seeds | 1 |
| Number of common domains | 52 |
| Number of domains where paper scores higher | 20 |
| Number of domains where ChainerRL scores higher | 30 |
| Number of ties between paper and ChainerRL | 2 |
| Game | ChainerRL Score | Original Reported Scores |
| ------------- |:-------------:|:-------------:|
| AirRaid | 6500.9 | N/A |
| Alien | 9409.1 | 9491.7 |
| Amidar | 3252.7 | 5131.2 |
| Assault | 15245.5 | 14198.5 |
| Asterix | 353258.5 | 428200.3 |
| Asteroids | 2792.3 | 2712.8 |
| Atlantis | 894708.5 | 826659.5 |
| BankHeist | 1734.8 | 1358.0 |
| BattleZone | 90625.0 | 62010.0 |
| BeamRider | 27959.5 | 16850.2 |
| Berzerk | 26704.2 | 2545.6 |
| Bowling | 67.1 | 30.0 |
| Boxing | 99.8 | 99.6 |
| Breakout | 340.8 | 417.5 |
| Carnival | 5530.3 | N/A |
| Centipede | 7718.1 | 8167.3 |
| ChopperCommand | 303480.5 | 16654.0 |
| CrazyClimber | 165370.0 | 168788.5 |
| Defender | N/A | 55105.0 |
| DemonAttack | 110028.0 | 111185.2 |
| DoubleDunk | -0.1 | -0.3 |
| Enduro | 2273.8 | 2125.9 |
| FishingDerby | 45.3 | 31.3 |
| Freeway | 33.7 | 34.0 |
| Frostbite | 10432.3 | 9590.5 |
| Gopher | 76662.9 | 70354.6 |
| Gravitar | 1819.5 | 1419.3 |
| Hero | 12590.5 | 55887.4 |
| IceHockey | 5.1 | 1.1 |
| Jamesbond | 31392.0 | N/A |
| JourneyEscape | 0.0 | N/A |
| Kangaroo | 14462.5 | 14637.5 |
| Krull | 7989.0 | 8741.5 |
| KungFuMaster | 22820.5 | 52181.0 |
| MontezumaRevenge | 4.0 | 384.0 |
| MsPacman | 6153.4 | 5380.4 |
| NameThisGame | 14035.1 | 13136.0 |
| Phoenix | 5169.6 | 108528.6 |
| Pitfall | 0.0 | 0.0 |
| Pong | 20.9 | 20.9 |
| Pooyan | 7793.1 | N/A |
| PrivateEye | 100.0 | 4234.0 |
| Qbert | 42481.1 | 33817.5 |
| Riverraid | 26114.0 | N/A |
| RoadRunner | 64306.0 | 62041.0 |
| Robotank | 74.4 | 61.4 |
| Seaquest | 4286.8 | 15898.9 |
| Skiing | -9441.0 | -12957.8 |
| Solaris | 7902.2 | 3560.3 |
| SpaceInvaders | 2838.0 | 18789.0 |
| StarGunner | 181192.5 | 127029.0 |
| Surround | N/A | 9.7 |
| Tennis | -0.1 | 0.0 |
| TimePilot | 25582.0 | 12926.0 |
| Tutankham | 251.9 | 241.0 |
| UpNDown | 284465.6 | N/A |
| Venture | 1499.0 | 5.5 |
| VideoPinball | 492071.8 | 533936.5 |
| WizardOfWor | 19796.5 | 17862.5 |
| YarsRevenge | 80817.2 | 102557.0 |
| Zaxxon | 26827.5 | 22209.5 |

## Evaluation Protocol

Our evaluation protocol is designed to mirror that of the original paper as closely as possible, in order to offer a fair comparison of the quality of our example implementation. The specific details of our evaluation can be found in the code.
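As a rough illustration of the reporting protocol above ("a re-evaluation of the best intermediate agent"): training alternates with periodic evaluations, the best-scoring snapshot is kept, and the reported score comes from a fresh re-evaluation of that snapshot. Everything in this sketch (`train_for`, `evaluate`, `save_snapshot`, `load_snapshot`, and the intervals) is a hypothetical stand-in, not this repository's actual code:

```python
# Hypothetical illustration of "re-evaluation of the best intermediate agent".
TOTAL_STEPS = 5 * 10 ** 7   # illustrative training budget
EVAL_INTERVAL = 10 ** 6     # illustrative evaluation frequency

best_score = float('-inf')
best_snapshot = None
for step in range(0, TOTAL_STEPS, EVAL_INTERVAL):
    train_for(agent, EVAL_INTERVAL)           # train between evaluations
    score = evaluate(agent, n_episodes=10)    # intermediate evaluation
    if score > best_score:
        best_score = score
        best_snapshot = save_snapshot(agent)  # keep the best agent so far

# The reported score is a separate, longer evaluation of the best snapshot.
final_score = evaluate(load_snapshot(best_snapshot), n_episodes=200)
```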

## Training times

Time statistics...

| Training time (in days) across all domains | |
| ------------- |:-------------:|
| Mean | 12.929 |
| Fastest Domain | 11.931 (Frostbite) |
| Slowest Domain | 13.974 (UpNDown) |