LocalApproximationValueIteration

This package implements the Local Approximation Value Iteration algorithm in Julia for solving Markov Decision Processes (MDPs). Algorithmically it is very similar to the DiscreteValueIteration.jl package, but it represents the state space in a fundamentally different manner, as explained below. As with DiscreteValueIteration, the user should define the problem according to the API in POMDPs.jl. Examples of problem definitions can be found in POMDPModels.jl.

Installation

You need to have POMDPs.jl installed and the associated registry added (see its README). Thereafter, you can add LocalApproximationValueIteration from the Julia REPL:

using Pkg
Pkg.add("LocalApproximationValueIteration")

How it Works

This solver is one example of Approximate Dynamic Programming, which seeks approximately optimal value functions and policies for large or continuous state spaces. Local Approximation Value Iteration assumes that states near each other (under an appropriate distance metric) have similar values. It therefore computes the value function at a (user-defined) finite set of states and interpolates it over the entire state space using a local function approximation technique. Details of this approach are described in Section 4.5.1 of the book Decision Making Under Uncertainty: Theory and Application.
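
To make the idea concrete, here is a toy one-dimensional sketch (purely illustrative, not the package's API) in which values are stored only at a finite set of points and linearly interpolated everywhere else:

# Toy illustration: values are stored at a finite set of points and
# linearly interpolated for any other state query.
grid_points = collect(0.0:10.0:100.0)          # finite set of representative states
grid_values = [p^2 / 100 for p in grid_points] # stand-in for values computed by value iteration

function approx_value(s::Float64)
    i = clamp(searchsortedlast(grid_points, s), 1, length(grid_points) - 1)
    w = (s - grid_points[i]) / (grid_points[i+1] - grid_points[i])  # interpolation weight
    return (1 - w) * grid_values[i] + w * grid_values[i+1]
end

approx_value(37.5)   # interpolates between the values stored at 30.0 and 40.0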

State Space Representation

For value function approximation, the solver depends on the LocalFunctionApproximation.jl package. The LocalApproximationValueIteration solver must be initialized with a LocalFunctionApproximator object that approximates the computed value function over the entire state space, either by interpolation over a multi-dimensional grid discretization or by k-nearest-neighbor averaging over a randomly drawn set of state-space samples. The resulting policy uses this object to compute the action-value function or the best action for any arbitrary state query.
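
As an example of the second option, a k-nearest-neighbor approximator can be built from random state-space samples. This is a hedged sketch based on the LocalFunctionApproximation.jl README; the exact constructor arguments (tree, sample points, neighbor count) should be checked against that package, and NearestNeighbors.jl and StaticArrays.jl are assumed to be available:

using NearestNeighbors, StaticArrays
using LocalFunctionApproximation

# Randomly drawn 2-D state-space samples and a KD-tree over them
points = [SVector(100.0*rand(), 100.0*rand()) for _ in 1:1000]
nntree = KDTree(points)
k = 10   # number of nearest neighbors to average over

knn_approx = LocalNNFunctionApproximator(nntree, points, k)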

A key operational requirement the solver places on the MDP is that any state can be represented by an equivalent real-valued vector. This is enforced through the two convert_s function requirements, which convert an instance of the MDP's State type to a real-valued vector and vice versa.

The user must implement both of these functions for the State type of their MDP problem model.
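
As a hedged sketch, for a hypothetical MDP type MyMDP with a hypothetical two-dimensional state type MyState (SimpleGridWorld, used below, already ships with these methods via POMDPModels), the two functions might look like:

using POMDPs

# Hypothetical state and MDP types, shown only to illustrate the convert_s methods
struct MyState
    x::Float64
    y::Float64
end

struct MyMDP <: POMDPs.MDP{MyState, Int} end

# State -> real-valued vector
POMDPs.convert_s(::Type{V}, s::MyState, mdp::MyMDP) where {V<:AbstractArray} = Float64[s.x, s.y]

# Real-valued vector -> State
POMDPs.convert_s(::Type{MyState}, v::AbstractArray, mdp::MyMDP) = MyState(v[1], v[2])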

Usage

POMDPLinter.jl provides a @show_requirements macro that lists the functions necessary to use a given solver on a specific MDP model (an example check appears after the MDP is constructed below). As mentioned above, the LocalApproximationValueIteration solver depends on a LocalFunctionApproximator object, so that object must be created first in order to invoke the solver's requirements accordingly. Following the running example in test/runtests_versus_discrete_vi.jl, a function approximation object that uses grid interpolation (LocalGIFunctionApproximator) is created after constructing an appropriate RectangleGrid (see GridInterpolations.jl for details):

using POMDPs, POMDPModels
using GridInterpolations
using LocalFunctionApproximation
using LocalApproximationValueIteration

VERTICES_PER_AXIS = 10 # Controls the resolution along each grid axis
grid = RectangleGrid(
    range(1, 100, length=VERTICES_PER_AXIS), # x
    range(1, 100, length=VERTICES_PER_AXIS)  # y
)
interp = LocalGIFunctionApproximator(grid)

The user should adapt the above steps to the kind of interpolation and the parameters their problem requires; this step is left to the user because it is highly problem- and domain-specific. Note that the solver supports both explicit and generative transition models for the MDP (see the POMDPs.jl documentation for details). The is_mdp_generative and n_generative_samples arguments of the LocalApproximationValueIteration solver should be set accordingly, and there are different @requirements depending on which kind of model the MDP has, as illustrated below.
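
For example, if the MDP only implements a generative model, the solver would be configured along these lines (a sketch; the choice of 10 samples per state-action pair is arbitrary and for illustration only):

# Sketch: solver configuration for a generative model, averaging over
# 10 sampled successor states per state-action pair
gen_solver = LocalApproximationValueIterationSolver(interp,
                                                    max_iterations=1000,
                                                    is_mdp_generative=true,
                                                    n_generative_samples=10)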

Once all the necessary functions have been defined, the problem and solver can be set up. First, a SimpleGridWorld MDP is defined with a 100 x 100 grid and a block of reward states:

mdp = SimpleGridWorld(
    size = (100, 100),
    rewards = Dict(GWPos(x, y) => 10.0 for x ∈ 40:60, y ∈ 40:60)
)
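
Before solving, the requirements mentioned earlier can be checked for this solver/MDP pair (a sketch, assuming POMDPLinter.jl is installed and that the solver's default keyword arguments are acceptable for the check):

using POMDPLinter

# Print the POMDPs.jl functions that must be implemented for this solver and MDP
@show_requirements solve(LocalApproximationValueIterationSolver(interp), mdp)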

Finally, the solver can be created using the function approximation object and the other necessary parameters (this model is explicit, so is_mdp_generative is set to false), and the MDP can be solved:

approx_solver = LocalApproximationValueIterationSolver(interp, verbose=true, max_iterations=1000, is_mdp_generative=false)
approx_policy = solve(approx_solver, mdp)

The API for querying the final policy object is identical to that of DiscreteValueIteration, i.e. the action and value functions can be called on the resulting policy:

s = GWPos(50, 50)            # an example state query (chosen arbitrarily for illustration)
v = value(approx_policy, s)  # returns the approximately optimal value for state s
a = action(approx_policy, s) # returns the approximately optimal action for state s