Home

Awesome

statebox - state "monad" for automated conflict resolution

bob@redivi.com

Overview:

statebox is a data structure you can use with an eventually consistent system such as riak to resolve conflicts between siblings in a deterministic manner.

Status:

Used in production at Mochi Media for multiple backend services.

Theory:

A statebox wraps a current value and an event queue. The event queue is an ordered list of {timestamp(), op()}. When two or more statebox are merged with statebox:merge/1, the event queues are merged with lists:umerge/1 and the operations are performed again over the current value of the newest statebox, producing a new statebox with conflicts resolved in a deterministic manner.

An op() is a {fun(), [term()]}, with all but the last argument specified in the term list. For example {ordsets:add_element/2, [a]}. To evaluate this op, ordsets:add_element(a, value(Statebox)) will be called. It is also possible to specify an op() as a {module(), atom(), [term()]} tuple, or as a list of op() when performing several operations at the same timestamp.

There are several important limitations on the kinds of op() that are safe to use ({F, [Arg]} is the example op() used below):

Some examples of safe to use op() that ship with Erlang:

Some examples of functions you can not use as op():

Optimizations:

There are two functions that modify a statebox that can be used to reduce its size. One or both of these should be done every time before serializing the statebox.

Usage:

Simple ordsets() example:

New = statebox:new(fun () -> [] end),
ChildA = statebox:modify({fun ordsets:add_element/2, [a]}, New),
ChildB = statebox:modify({fun ordsets:add_element/2, [b]}, New),
Resolved = statebox:merge([ChildA, ChildB]),
statebox:value(Resolved) =:= [a, b].

With manual control over timestamps:

New = statebox:new(0, fun () -> [] end),
ChildA = statebox:modify(1, {fun ordsets:add_element/2, [a]}, New),
ChildB = statebox:modify(2, {fun ordsets:add_element/2, [b]}, New),
Resolved = statebox:merge([ChildA, ChildB]),
statebox:value(Resolved) =:= [a, b].

Using the statebox_orddict convenience wrapper:

New = statebox_orddict:from_values([]),
ChildA = statebox:modify([statebox_orddict:f_store(a, 1),
                          statebox_orddict:f_union(c, [a, aa])],
                         New),
ChildB = statebox:modify([statebox_orddict:f_store(b, 1),
                          statebox_orddict:f_union(c, [b, bb])],
                         New),
Resovled = statebox_orddict:from_values([ChildA, ChildB]),
statebox:value(Resolved) =:= [{a, 1}, {b, 1}, {c, [a, aa, b, bb]}].

Resources

On Mochi Labs

statebox, an eventually consistent data model for Erlang (and Riak) on the Mochi Labs blog describes the rationale for statebox and shows how it works.

Convergent / Commutative Replicated Data Types

The technique used to implement this is similar to what is described in this paper: A comprehensive study of Convergent and Commutative Replicated Data Types. statebox was developed without knowledge of the paper, so the terminology and implementation details differ.

I think the technique used by statebox would be best described as a state-based object, although the merge algorithm and event queue is similar to how op-based objects are described.