wge

Authors: Evan Zheran Liu*, Kelvin Guu*, Panupong (Ice) Pasupat*, Tianlin Shi, Percy Liang (* equal contribution)

Source code accompanying our ICLR 2018 paper:
Reinforcement Learning on Web Interfaces using Workflow-Guided Exploration

Reproducible experiments using this code are located on our CodaLab worksheet.

Purpose

The goal of this project is to train machine learning models (agents) to perform tasks in a browser that are specified in natural language, e.g., "Book a flight from San Francisco to New York for Dec 23rd."

Setup

General setup

Data directory setup

Demonstration directory setup

# Where $REPO_DIR is the path to the root of this Git repository.
git clone https://github.com/stanfordnlp/miniwob-plusplus-demos.git $REPO_DIR/third-party/miniwob-demos
export RL_DEMO_DIR=$REPO_DIR/third-party/miniwob-demos/

MiniWoB setup

MiniWoB versions of FormWoB

Follow the "Run a simple server" instructions in the MiniWoB setup section above.

Launching an Experiment

To train a model on a task, run:

python main.py configs/default-base.txt --task click-tab-2

If the script is working, you should see several Chrome windows pop up (operated by Selenium) and a training progress bar in the terminal.

Experiment management

All training runs are managed by the MiniWoBTrainingRuns object. For example, to get training run #141, do this:

runs = MiniWoBTrainingRuns()
run = runs[141]  # a MiniWoBTrainingRun object

A TrainingRun is responsible for constructing a model, training it, saving it, and reloading it (see the superclasses gtd.ml.TrainingRun and gtd.ml.TorchTrainingRun for details).

The most important methods on MiniWoBTrainingRun are:

Model architecture

During training, there are several key systems involved:

Environment

All environments implement the Environment interface. A policy interacts with the environment by calling the environment's step method and passing in actions.

Note that an environment object is batched. It actually represents a batch of environments, each running in parallel (so that we can train faster).

We mostly use MiniWoBEnvironment and FormWoBEnvironment.
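
To make the interaction pattern concrete, here is a minimal sketch of the interface. Only the step method and the batched behavior are described above; the reset method and the exact return values are assumptions to check against the actual Environment class.

class Environment(object):
    """Sketch of a batched environment (illustrative only)."""

    def reset(self):
        # Assumed: start a fresh batch of episodes and return one initial
        # state per parallel environment.
        raise NotImplementedError

    def step(self, actions):
        # Take one action per parallel environment and return the resulting
        # (state, reward, done) information for each one (the exact return
        # type is an assumption).
        raise NotImplementedError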

Policies

See the Policy interface. The most important methods are act, update_from_episodes and update_from_replay_buffer.

Note that all of these methods are also batched (i.e., they operate on multiple episodes in parallel).

The model policy is the main one that we are trying to train. See MiniWoBPolicy as an example.
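
As a rough illustration, the interface can be pictured like this. The method names come from the description above, but the argument names and return values are assumptions rather than the repository's exact signatures.

class Policy(object):
    """Sketch of the Policy interface (illustrative only)."""

    def act(self, states):
        # Choose one action per state in the batch.
        raise NotImplementedError

    def update_from_episodes(self, episodes):
        # Update the policy's parameters from a batch of freshly generated episodes.
        raise NotImplementedError

    def update_from_replay_buffer(self, replay_buffer):
        # Update the policy's parameters from episodes sampled out of a replay buffer.
        raise NotImplementedError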

Episode generators

See the EpisodeGenerator interface. An EpisodeGenerator runs a Policy on an Environment to produce an Episode.
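
Conceptually this is just a rollout loop. The sketch below is a simplified illustration built on the hypothetical Environment and Policy sketches above, not the repository's actual EpisodeGenerator.

def rollout(policy, env):
    # Roll a policy out against a batched environment until every episode ends.
    states = env.reset()                 # one state per parallel episode
    episodes = [[] for _ in states]
    done = [False] * len(states)
    while not all(done):
        actions = policy.act(states)
        results = env.step(actions)      # assumed (state, reward, done) per environment
        for i, (state, reward, finished) in enumerate(results):
            if not done[i]:
                episodes[i].append((states[i], actions[i], reward))
                done[i] = finished
        states = [state for state, _, _ in results]
    return episodes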

Replay buffer

See the ReplayBuffer interface. A ReplayBuffer stores episodes produced by the exploration policy. The final model policy is trained off episodes sampled from the replay buffer.
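
A toy version of the idea looks like this; the real ReplayBuffer interface likely differs, and the method names and uniform sampling here are assumptions.

import random

class SimpleReplayBuffer(object):
    """Toy replay buffer (illustrative only)."""

    def __init__(self, max_size=1000):
        self._episodes = []
        self._max_size = max_size

    def add(self, episodes):
        # Store episodes produced by the exploration policy, discarding the
        # oldest ones once the buffer is full.
        self._episodes.extend(episodes)
        self._episodes = self._episodes[-self._max_size:]

    def sample(self, n):
        # Sample episodes to train the model policy on.
        return random.sample(self._episodes, min(n, len(self._episodes)))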

Configuration

All configs are in the configs folder. They are specified in HOCON format. The arguments to main.py should be a list of paths to config files. main.py then merges these config files according to the rules explained here.
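
For intuition, HOCON configs merge key by key, with values from later configs overriding earlier ones. The snippet below illustrates that behavior with the pyhocon library; whether main.py actually uses pyhocon and these exact merge semantics is an assumption, so treat it purely as an illustration of HOCON merging.

from pyhocon import ConfigFactory, ConfigTree

# Hypothetical config fragments, not actual files from the configs folder.
base = ConfigFactory.parse_string('train { batch_size = 32, learning_rate = 0.001 }')
override = ConfigFactory.parse_string('train { learning_rate = 0.01 }')

merged = ConfigTree.merge_configs(base, override)
print(merged['train']['learning_rate'])  # 0.01 -- the later config wins
print(merged['train']['batch_size'])     # 32 -- inherited from the base config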