MazeBase: a sandbox for learning from games

This code is for a simple 2D game environment that can be used to develop reinforcement learning models. It is designed to be compact but flexible, enabling the implementation of a diverse set of games. Furthermore, it offers precise tuning of game difficulty, facilitating the construction of curricula to aid training. The code is written in Lua+Torch; it allows rapid prototyping of games and is easy to connect to models that control the agent's behavior. For more details, see our paper.

Environment

Each game is played in a 2D rectangular grid. Each location in the grid can be empty, or may contain one or more items such as blocks, water, switches, info items, and goals.

The environment is presented to the agent as a list of sentences, each describing an item in the game. For example, an agent might see “Block at [-1,4]. Switch at [+3,0] with blue color. Info: change switch to red.” However, note that we use egocentric spatial coordinates, meaning that the environment updates the location of each object after every action. The environments are generated randomly with some distribution over the various items. For example, we usually specify a uniform distribution over height and width, and a percentage of wall blocks and water blocks.

Games

Currently, there are 10 different games implemented, and it is possible to add new ones.

Examples of each game are shown in this video. The internal parameters of the games are written to a configuration file, which can be easily modified.

Using Game Environment

First, either install mazebase with luarocks make *.rockspec or add the appropriate path:

package.path = package.path .. ';lua/?/init.lua'

To use the game environment as standalone in Torch, first start a local display server with

$ th -ldisplay.start 8000 0.0.0.0

which starts a display server for viewing the MazeBase graphics at http://0.0.0.0:8000. See the full repo for more details. Next, include the init file with

g_mazebase = require('mazebase')

Then we have to set which config file to use. Here we are using the config file that was used in our paper:

g_opts = {games_config_path = 'mazebase/config/game_config.lua'}

Next, we call the following function to create a dictionary with all the necessary words used in the games:

g_mazebase.init_vocab()            

Finally, initialize the game environment with

g_mazebase.init_game()

Now we can create a new game instance by calling

g = g_mazebase.new_game()

If more than one game is configured, it will randomly pick one. Now, the current game state can be retrieved by calling

s = g:to_sentence()

which returns a tensor containing words (encoded by the g_vocab dictionary) describing each item in the game. If you started the display server, you can see the game at http://0.0.0.0:8000 in your browser by running

g_disp = require('display')
g_disp.image(g.map:to_image())

(sample output image: the current game rendered by the display server)

Next, an action can be performed by calling

g:act(action)

where action is the index of the action. The list of possible actions is in g.agent.action_names. When there are multiple agents in the game, we can choose the agent that performs the action by setting

g.agent = g.agents[i]

before calling g:act(). After the action is completed, g:update() must be called so that the game updates its internal state. Finally, we can check whether the game is still running by calling g:is_active(). You can run demo_api.lua to see the game played with random actions; a minimal version of such a loop is sketched below.
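For illustration only, a random-play loop built from the calls introduced above might look like the following (a sketch, not demo_api.lua itself; it assumes action_names is a plain Lua table so that the # operator gives the number of actions):

g = g_mazebase.new_game()
while g:is_active() do
    local nactions = #g.agent.action_names   -- number of available actions
    local action = torch.random(nactions)    -- pick a random action index
    g:act(action)                            -- perform the action
    g:update()                               -- let the game update its internal state
end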

Creating a new game

Here we demonstrate how a new game can be added. Let us create a very simple game in which the agent has to reach a goal. First, we create a file named SingleGoal.lua. In it, a game class inheriting from MazeBase has to be created:

local SingleGoal, parent = torch.class('SingleGoal', 'MazeBase')

Next, we have to construct the game items. In this case, we only need a goal item placed at a random location:

function SingleGoal:__init(opts, vocab)
    parent.__init(self, opts, vocab)
    self:add_default_items()
    self.goal = self:place_item_rand({type = 'goal'})
end

The place_item_rand function puts the item at a random empty location, but it is also possible to specify the location using the place_item function. The argument to place_item_rand is a table containing the item's properties, such as its type and name. Here we only set the item's type to goal, but it is possible to include any number of attributes (e.g. color, name, etc.).
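For example, if you wanted the goal at a fixed cell instead of a random one, a call along the following lines should work (the exact argument order of place_item is an assumption here; check MazeBase.lua for its actual signature):

-- Hypothetical: place the goal at a fixed location (y, x) instead of a random one.
self.goal = self:place_item({type = 'goal'}, 3, 4)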

The game should finish when the agent reaches the goal, which can be achieved by overriding the update function:

function SingleGoal:update()
    parent.update(self)
    if not self.finished then
        if self.goal.loc.y == self.agent.loc.y and self.goal.loc.x == self.agent.loc.x then
            self.finished = true
        end
    end
end

This checks whether the agent's location is the same as the goal's, and sets a flag when it is. Finally, we have to give a proper reward when the goal is reached:

function SingleGoal:get_reward()
    if self.finished then
        return -self.costs.goal -- this will be set in config file
    else
        return parent.get_reward(self)
    end
end

Now, we include our game file in mazebase/init.lua by adding the following line

paths.dofile('SingleGoal.lua')

Also, the following lines have to be added inside the init_game_opts function:

games.SingleGoal = SingleGoal
helpers.SingleGoal = OptsHelper

Finally, we need a config file for our new game. Let us create a singlegoal.lua file in mazebase/config. The main parameters of the game are the grid dimensions:

local mapH = torch.Tensor{5,5,5,10,1}
local mapW = torch.Tensor{5,5,5,10,1}

The first two numbers define the lower and upper bounds of the parameter; the actual grid size will be uniformly sampled from this range. The remaining three numbers are for curriculum training: in the easiest (hardest) case, the upper bound will be set to the 3rd (4th) number, and the 5th number is the step size for changing the upper bound. In the same way, we define the percentage of grid cells that contain a block or water:

local blockspct = torch.Tensor{0,.05,0,.2,.01}
local waterpct = torch.Tensor{0,.05,0,.2,.01}
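To make the five-number convention concrete, the mapH setting above can be read entry by entry as follows:

local mapH = torch.Tensor{
    5,   -- lower bound of the sampled grid height
    5,   -- current upper bound of the sampled grid height
    5,   -- upper bound in the easiest curriculum setting
    10,  -- upper bound in the hardest curriculum setting
    1,   -- step size for changing the upper bound during curriculum training
}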

There are other generic parameters that have to be set; see the actual config files for details. Now we are ready to use the game!
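Assuming singlegoal.lua follows the same structure as the existing config files, the new game can then be played through the same API as before:

g_mazebase = require('mazebase')
g_opts = {games_config_path = 'mazebase/config/singlegoal.lua'}
g_mazebase.init_vocab()
g_mazebase.init_game()
g = g_mazebase.new_game()  -- an instance of SingleGoal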

Training an agent using neural networks

We also provide code for training different types of neural models with a policy gradient method. Training uses CPUs with multi-threading for speed-up. The implemented models are:

  1. multi-layer neural network
  2. convolutional neural network
  3. end-to-end memory network.

For example, running the following command will train a 2-layer network on MultiGoals.

th main.lua --hidsz 20 --model mlp --nlayers 2 --epochs 100 --game MultiGoals --nactions 6 --nworker 2

To see all the command line options, run

th main.lua -h
  --hidsz             the size of the internal state vector [20]
  --nonlin            non-linearity type: tanh | relu | none [tanh]
  --model             model type: mlp | conv | memnn [memnn]
  --init_std          STD of initial weights [0.2]
  --max_attributes    maximum number of attributes of each item [6]
  --nlayers           the number of layers in MLP [2]
  --convdim           the number of feature maps in convolutional layers [20]
  --conv_sz           spatial scope of the input to convnet and MLP [19]
  --memsize           size of the memory in MemNN [20]
  --nhop              the number of hops in MemNN [3]
  --nagents           the number of agents [1]
  --nactions          the number of agent actions [11]
  --max_steps         force to end the game after this many steps [20]
  --games_config_path configuration file for games [mazebase/config/game_config.lua]
  --game              can specify a single game []
  --optim             optimization method: rmsprop | sgd [rmsprop]
  --lrate             learning rate [0.001]
  --max_grad_norm     gradient clip value [0]
  --alpha             coefficient of baseline term in the cost function [0.03]
  --epochs            the number of training epochs [100]
  --nbatches          the number of mini-batches in one epoch [100]
  --batch_size        size of mini-batch (the number of parallel games) in each thread [32]
  --nworker           the number of threads used for training [16]
  --beta              parameter of RMSProp [0.97]
  --eps               parameter of RMSProp [1e-06]
  --save              file name to save the model []
  --load              file name to load the model []

See the paper for more details on training.

Testing a trained model

After training, you can watch the model play by calling the test() function, which will display the game in a browser window (the display package is required for this). For example, if you saved a trained model using the --save /tmp/model.t7 option, you can load it with the options --load /tmp/model.t7 --epochs 0 and then run the test() command to see it play.
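Concretely, the sequence could look like the following, with the display server running in a separate terminal; the -i flag (an assumption here) keeps th in an interactive session after main.lua finishes, so test() can be called at the prompt:

$ th -ldisplay.start 8000 0.0.0.0
$ th -i main.lua --load /tmp/model.t7 --epochs 0
> test()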

Requirements

The whole code is written in Lua, and requires the Torch7 and nngraph packages. The training uses multi-threading for speed-up. The display package is necessary for visualizing the game play; it can be installed with

luarocks install display