Awesome
Gisla
This is a library for Erlang that implements the Saga pattern for error recovery/cleanup in distributed transactions. The saga pattern describes two call flows, a forward flow that represents progress, and an opposite rollback flow which represents recovery and cleanup activities.
Concept
The sagas pattern is a useful way to recover from long-running distributed transactions.
For example:
-
you want to set up some cloud infrastructure which is dependent on other cloud infrastructure, and if that other cloud infrastructure fails, you want to clean up the resource allocations already performed.
-
you have a microservice architecture and you have resources which need to be cleaned up if a downstream service fails
Using this library, you can create a "transaction" - a matched set of operations organized into "steps" which will be executed one after the other. The new state from the previous operation will be fed into subsequent function calls.
Each operation in a step represents "forward" progress and "rollback". When a "forward" operation fails, gisla will use the accumulated state to execute rollback operations attached to already completed steps in reverse order.
Pipeline | F1 -> F2 -> F3 (boom) -> F4 ...
Example | R1 <- R2 <- R3 <-+
Use
Forward and rollback operations
First, you need to write the forward
and rollback
closures for each
step. Each closure should take at least one parameter, which is
the state of the transaction. State can be any arbitrary Erlang term
but proplists are the recommended format.
These closures may either be functions defined by fun(State) -> ok end
or
fun do_something/1
or they can be tuples in the form of {Module, Function, Arguments = []}
MFA tuples will automatically get the transaction state as
the last parameter of the argument list.
The transaction state will also get the pid of the gisla process so that your operation may optionally return "checkpoint" state changes - these would be incremental state mutations during the course of a step which you may want to use during any unwinding operations that may come later. It's very important that any checkpoint states include all current state too - do not just checkpoint the mutations. Gisla does not merge checkpoint changes - you are responsible for doing that.
When a step is finished, the operation must return its final (possibly mutated) state. This will automatically be reported back to gisla as the "step-complete" state, which would then be passed into the next stage of the transaction.
An example might look something like this:
example_forward(State) ->
% definitely won't fail!
Results0 = {terah_id, Id} = terah:assign_id(),
NewState0 = [ Results0 | State ],
%% The pid of the gisla process is injected automatically
%% to the pipeline state for checkpointing purposes.
gisla:checkpoint(NewState0),
% might fail - TODO: fix real soon
% but we checkpointed out new ID assignment
true = unstable_network:activate_terah_id(Id),
NewState1 = [ {terah_id_active, true} | NewState0 ],
gisla:checkpoint(NewState1),
% final operation, this updates an ETS table,
% probably no failure.
{terah_ets_tbl, TableName} = lists:keyfind(terah_ets_tbl, 1, State),
true = terah:update_ets(TableName, Id),
NewState2 = [ {terah_ets_updated, true} | NewState1 ],
NewState2.
The rollback operation might be something like:
example_rollback(State) ->
%% gisla pid is in our state (if we want it)
{terah_ets_tbl, TableName} = lists:keyfind(terah_ets_tbl, 1, State),
{terah_id, Id} = lists:keyfind(terah_id, 1, State),
true = terah:remove_ets(TableName, Id),
true = unstable_network:deactivate_terah_id(Id),
true = terah:make_id_failed(Id),
[{ terah_id_rollback, Id } | State ].
In this example, we don't send any checkpoints during rollback, just the final state update at the end of the function.
Creating operations, steps and a transaction
Once the closures have been written, you are ready to create steps for your transaction.
There are three abstractions in this library, from most specific to most general:
Operations
Operation records wrap the closures which do the work of each operation. Timeout information is also stored here - the default is 5000 millseconds.
Operation records have the following extra fields to provide additional information on execution results:
-
state: can be either
ready
meaning ready to run, orcomplete
meaning the function was executed. -
result:
success
orfailed
, depending on the outcome of execution -
reason: Contains the exit reason from a process on success or on an error.
They are created using the new_operation/0,1,2
functions. There is also an
update_operation_timeout/2
function by which you may adjust a timeout value.
Steps
Steps are containers that have a name (which may be an atom, a binary string or a string), a single forward operation, and a matched rollback operation.
They are created using new_step/0
or new_stetp/3
functions. As a bit of
syntactic sugar, you may call new_step/3
with either #operation records or
with naked functions or MFA tuples.
Transactions
A transaction is a container for a name (which again may be an atom, a binary string or a string), and a ordered list of steps which will be executed left to right.
Transactions can be made using the new_transaction/0
along with add_step/2
and
delete_step/2
or new_transaction/2
. There is also a describe_transaction/1
function which outputs a simple list of the transaction name plus all step names
in order of execution.
Executing a transaction
Once a transaction is constructed and the steps are organized, you are ready to execute it.
You can do that using execute/2
. The State parameter should be in the form
of a proplist.
When a transaction has been executed, it returns a tuple of {'ok'|'rollback', FinalT, FinalState}
where FinalT is the original transaction with updated
execution information in the operation records and FinalState is the
accumulated state mutations across all steps.
State = [{foo, 1}, {bar, 2}, {baz, 3}],
Step1 = new_step(<<"step 1">>, fun frobulate/1, fun defrobulate/1),
Step2 = new_step(<<"step 2">>, fun activate_frob/1, fun deactivate_frob/1),
Transaction = gisla:new_transaction(<<"example">>, [ Step1, Step2 ]),
{Outcome, FinalT, FinalState} = gisla:execute(Transaction, State),
case Outcome of
ok -> ok;
rollback ->
io:format("Transaction failed. Execution details: ~p, Final state: ~p~n",
[FinalT, FinalState]),
error(transaction_rolled_back)
end.
Errors / timeouts during rollback
If a crash or timeout occurs during rollback, gisla will itself crash.
Build
gisla is built using rebar3. It has a dependency on the hut logging abstraction library in hopes that this would make using it in both Erlang and Elixir easier. By default hut uses the built in Erlang error_logger facility to log messages. Hut also supports a number of other logging options including Elixir's built in logging library and lager.
About the name
It was inspired by the Icelandic saga Gisla.