Home

Awesome

Most of this document is excepted from Ragel 6.8 document, copyright belongs to the original author Adrian Thurston.

Ragel Cheat Sheet

Ragel compiles executable finite state machines from a high level regular language notation to C, C++, Objective-C, D, Go, Java and Ruby code.

Construction State Machines

A multi-line FSM spec starts with %%{ and ends with }%%. A single-line FSM spec starts with %% and ends at the first newline.

#include <string.h>
#include <stdio.h>
%%{
	machine foo;
	main :=
		( ’foo’ | ’bar’ )
		0 @{ res = 1; };
}%%

%% write data;

int main( int argc, char **argv )
{
	int cs, res = 0;
	if ( argc > 1 ) {
		char *p = argv[1];
		char *pe = p + strlen(p) + 1;
		%% write init;
		%% write exec;
	}
	printf("result = %i\n", res );
	return 0;
}

Machine Definition

<name> = <expression>;

Machine Instantiation

<name> := <expression>;

Ragel Block Lexical Analysis

Basic Machines

builtin_machine

Regular Language Operators

Union

expr | expr

The union operation produces a machine that matches any string in machine one or machine two

Intersection

expr & expr

Intersection produces a machine that matches any string that is in both machine one and machine two

Difference

expr - expr

The difference operation produces a machine that matches strings that are in machine one but are not in machine two

For example: (any - space)*

Strong Difference

expr -- expr

Strong difference produces a machine that matches any string of the first machine that does not have any string of the second machine as a substring

Concatenation

expr . expr

Concatenation produces a machine that matches all the strings in machine one followed by all the strings in machine two

Kleene Star

expr*

The machine resulting from the Kleene Star operator will match zero or more repetitions of the machine it is applied to

One Or More Repetition

expr+

Optional expr?
Repetition
Negation

!expr

Negation produces a machine that matches any string not matched by the given machine. Negation is equivalent to (any* - expr).

Character-Level Negation

^expr

Character-Level Negation is equivalent to (any - expr)

User Actions

Two ways of using action

  1.  main := ( lower* >{ printf("action lower"); }) . ' ';
    
  2.  action A { printf("action lower"); }
     main := ( lower* >A) . ' ';
     
    

Transition actions

State actions

The different classes of states are:
The different kinds of embeddings are:
To-State Actions
From-State Actions
EOF Actions
Global Error Actions
Local Error Actions

Values and Statements Available in Code Blocks

Value nameExplanation
fpcA pointer to the current character. This is equivalent to accessing the p variable
fcThe current character. This is equivalent to the expression (*p).
fcursAn integer value representing the current state. This value should only be read from. To move to a different place in the machine from action code use the fgoto, fnext or fcall statements. Outside of the machine execution code the cs variable may be modified.
ftargsAn integer value representing the target state. This value should only be read from. Again, fgoto, fnext and fcall can be used to move to a specific entry point.
fentry(<label>)Retrieve an integer value representing the entry point label. E.g. fgoto *((ctx->done) ? fentry(html_default) : fentry(html_special));
Statement nameExplanation
fhold;Do not advance over the current character
fexec <expr>;Set the next character to process. This can be used to backtrack to previous input or advance ahead. Unlike fhold, which can be used anywhere, fexec requires the user to ensure that the target of the backtrack is in the current buffer block or is known to be somewhere ahead of it. The machine will continue iterating forward until pe is arrived at, fbreak is called or the machine moves into the error state. In actions embedded into transitions, the fexec statement is equivalent to setting p to one position ahead of the next character to process. If the user also modifies pe, it is possible to change the buffer block entirely.
fgoto <label>;Jump to an entry point defined by <label>.
fgoto *<expr>;Jump to an entry point given by <expr>. Use together with fentry.
fnext <label>;Set the next state to be the entry point defined by label.
fnext *<expr>;Set the next state to be the entry point given by <expr>. Use together with fentry.
fcall <label>;Push the target state and jump to the entry point defined by <label>. The next fret will jump to the target of the transition on which the call was made. Use of fcall requires the declaration of a call stack. An array of integers named stack and a single integer named top must be declared
fcall *<expr>;Push the current state and jump to the entry point given by <expr>. Use together with fentry.
fret;Return to the target state of the transition on which the last fcall was made.
fbreak;fbreak; – Advance p, save the target state to cs and immediately break out of the execute loop. After an fbreak statement the p variable will point to the next character in the input. The current state will be the target of the current transition. Note that fbreak causes the target state’s to-state actions to be skipped.

Controlling Nondeterminism

Guarded Operators

Entry-Guarded Concatenation

expr :> expr

Finish-Guarded Concatenation

expr :>> expr

Left-Guarded Concatenation

expr <: expr

Longest-Match Kleene Star

expr**

This version of kleene star puts a higher priority on staying in the machine versus wrapping around and starting over.

Operator Precedence

PrecedenceOperatorComments
1,Join
2| & - --Union, Intersection and Subtraction
3. <: :> :>>Concatenation
4:Label
5->Epsilon Transition
6> @ $ %Transitions Actions and Priorities
">/ $/ %/ </ @/ <>/EOF Actions
">! $! %! <! @! <>!Global Error Actions
">^ $^ %^ <^ @^ <>^Local Error Actions
">~ $~ %~ <~ @~ <>~To-State Actions
">* $* %* <* @* <>*From-State Action
7* ** ? + {n} {,n} {n,} {n,m}Repetition
8! ^Negation and Character-Level Negation
9( <expr> )Grouping

Interface to Host Program

Write

%%{
machine foo;
write data;
}%%

void init(...) {
    ...
    %% write init;
    ...
}

int parse(...) {
    ...
    %% write exec;
    ...
}
Access Statement

access fsm->;

The access statement specifies how the generated code should access the machine data that is persistent across processing buffer blocks. This applies to all variables except p, pe and eof. This includes cs, top, stack, ts, te and act. The access statement is useful if a machine is to be encapsulated inside a structure in C code.

Alphtype Statement

alphtype unsigned char;

The alphtype statement specifies the alphabet data type that the machine operates on. During the compilation of the machine, integer literals are expected to be in the range of possible values of the alphtype. The default is char for all languages except Go where the default is byte.

Code generation

Code Output Style OptionsMeaningLanguages
-T0binary search table-drivenC/D/Java/Ruby/C#/Go
-T1binary search, expanded actionsC/D/Ruby/C#/Go
-F0flat table-drivenC/D/Ruby/C#/Go
-F1flat table, expanded actionsC/D/Ruby/C#/Go
-G0goto-drivenC/D/C#/Go
-G1goto, expanded actionsC/D/C#/Go
-G2goto, in-place actionsC/D/Go

Debugging

The most effective way to debug Ragel grammar is to generate a transition graph from Ragel file using Ragel and Graphviz.

ragel -Vp a.rl -o a.dot
dot a.dot -Tpng -o a.png

-V: to generate a dot file

-p: use character labels instead of number labels

Get better graphs

If the final graph is too large to be meaningful, the use is able to inspect portions of the parser by naming particular regular expression definition with -S and -M options.

Using machine name:

ragel -Vp -S [machine_name] a.rl -o a.dot

Or using rule name:

ragel -Vp -M [rule_name] a.rl -o a.dot

Misc