lug

A C++ embedded domain-specific language for expressing parsers as extended parsing expression grammars (PEGs).

Features

lug is based on research introduced in the following papers:

Bryan Ford, Parsing expression grammars: a recognition-based syntactic foundation, Proceedings of the 31st ACM SIGPLAN-SIGACT symposium on Principles of Programming Languages, p.111-122, January 2004

Sérgio Medeiros et al., A parsing machine for PEGs, Proceedings of the 2008 symposium on Dynamic Languages, p.1-12, July 2008

Kota Mizushima et al., Packrat parsers can handle practical grammars in mostly constant space, Proceedings of the 9th ACM SIGPLAN-SIGSOFT workshop on Program analysis for software tools and engineering, p.29-36, June 2010

Sérgio Medeiros et al., Left recursion in Parsing Expression Grammars, Science of Computer Programming, v.96 n.P2, p.177-190, December 2014

Leonardo Reis et al., The formalization and implementation of Adaptable Parsing Expression Grammars, Science of Computer Programming, v.96 n.P2, p.191-210, December 2014

Tetsuro Matsumura, Kimio Kuramitsu, A Declarative Extension of Parsing Expression Grammars for Recognizing Most Programming Languages, Journal of Information Processing, v.24 i.2, p.256-264, November 2015

Sérgio Medeiros et al., A parsing machine for parsing expression grammars with labeled failures, Proceedings of the 31st Annual ACM symposium on Applied Computing, p.1960-1967, April 2016

Building

As a header-only library, lug itself does not require a build step: add the lug header directory to your project's include path, and you are ready to use lug in your code. Both CMake and make are supported for building the sample programs and unit tests.

As a baseline, the following compiler versions are known to work with lug.

| Compiler | Language Mode |
| --- | --- |
| Clang 14.0.0 (March 2022) | -std=c++17 or -std=gnu++17 |
| Clang 18.1.0 (March 2024) | -std=c++17 or -std=gnu++17 |
| GCC 9.5 (May 2022) | -std=c++17 or -std=gnu++17 |
| GCC 10.5 (July 2023) | -std=c++17 or -std=gnu++17 |
| GCC 11.4 (May 2023) | -std=c++17 or -std=gnu++17 |
| GCC 12.4 (June 2024) | -std=c++17 or -std=gnu++17 |
| GCC 13.3 (May 2024) | -std=c++17 or -std=gnu++17 |
| Microsoft Visual C++ 2017 15.9 (November 2018) | Platform Toolset: Visual Studio 2017 Toolset (v141), Language Standard: ISO C++17 Standard (/std:c++17) |
| Microsoft Visual C++ 2019 16.11 (August 2021) | Platform Toolset: Visual Studio 2019 Toolset (v142), Language Standard: ISO C++17 Standard (/std:c++17) |
| Microsoft Visual C++ 2022 17.10 (May 2024) | Platform Toolset: Visual Studio 2022 Toolset (v143), Language Standard: ISO C++17 Standard (/std:c++17) |

Syntax Reference

| Operator | Syntax | Description |
| --- | --- | --- |
| Ordered Choice | e1 \| e2 | Attempts to match expression e1 first; if that fails, backtracks and then attempts to match e2. |
| Sequence | e1 > e2 | Matches expression e1 followed by expression e2. |
| List | e1 >> e2 | Matches a sequence of one or more e1 expressions delimited by e2. Shorthand for e1 > *(e2 > e1). |
| Zero-or-More | *e | Matches expression e zero or more times. |
| One-or-More | +e | Matches expression e one or more times. |
| Optional | ~e | Matches expression e zero or one times. |
| Positive Lookahead | &e | Succeeds without consuming input if expression e matches the input. |
| Negative Lookahead | !e | Succeeds without consuming input if expression e fails to match the input. |
| Cut Before | --e | Issues a cut instruction before the expression e. |
| Cut After | e-- | Issues a cut instruction after the expression e. |
| Action Scheduling | e < a | Schedules a semantic action a to be evaluated if expression e successfully matches the input. |
| Attribute Binding | v % e | Assigns the return value of the last evaluated semantic action within the expression e to the variable v. |
| Syntactic Capture | capture(v)[e] | Captures the text matching the subexpression e into variable v. |

| Control | Description |
| --- | --- |
| cased[e] | Case-sensitive matching for the subexpression e (the default). |
| caseless[e] | Case-insensitive matching for the subexpression e. |
| skip[e] | Turns on all whitespace skipping for the subexpression e (the default). |
| noskip[e] | Turns off all whitespace skipping for the subexpression e, including preceding whitespace. |
| lexeme[e] | Treats the subexpression e as a lexical token with no internal whitespace skipping. |
| on(C)[e] | Sets the condition C to true for the scope of the subexpression e. |
| off(C)[e] | Sets the condition C to false for the scope of the subexpression e (the default). |
| symbol(S)[e] | Pushes a definition for symbol S whose value is the captured input matching the subexpression e. |
| block[e] | Creates a scope block for the subexpression e: all new symbols defined in e are local to it, and all symbols defined outside the block remain available for reference within e. |
| local[e] | Creates a local scope block for the subexpression e: all new symbols defined in e are local to it, and no external symbol definitions are available for reference. |
| local(S)[e] | Creates a local scope block for the subexpression e: all new symbols defined in e are local to it, and all external symbols remain available for reference within e, except for the symbol named S. |

| Terminal | Description |
| --- | --- |
| nop | No operation; does not emit any instructions. |
| eps | Matches the empty string. |
| eoi | Matches the end of the input sequence. |
| eol | Matches a Unicode line ending. |
| cut | Emits a cut operation into the stream of semantic actions. |
| chr(c) | Matches the UTF-8, UTF-16, or UTF-32 character c. |
| chr(c1, c2) | Matches a UTF-8, UTF-16, or UTF-32 character in the interval [c1-c2]. |
| str(s) | Matches the sequence of characters in the string s. |
| bre(s) | Matches the POSIX Basic Regular Expression (BRE) s. |
| any | Matches any single character. |
| any(flags) | Matches a character exhibiting any of the character properties in flags. |
| all(flags) | Matches a character exhibiting all of the character properties in flags. |
| none(flags) | Matches a character exhibiting none of the character properties in flags. |
| alpha | Matches any alphabetical character. |
| alnum | Matches any alphabetical character or numerical digit. |
| blank | Matches any space or tab character. |
| cntrl | Matches any control character. |
| digit | Matches any decimal digit. |
| graph | Matches any graphical character. |
| lower | Matches any lowercase alphabetical character. |
| print | Matches any printable character. |
| punct | Matches any punctuation character. |
| space | Matches any whitespace character. |
| upper | Matches any uppercase alphabetical character. |
| xdigit | Matches any hexadecimal digit. |
| when(C) | Matches, without consuming input, if the condition named C is true. |
| unless(C) | Matches, without consuming input, if the condition named C is false. |
| exists(S) | Matches if there is a definition for symbol S in the current scope. |
| missing(S) | Matches if there is no definition for symbol S in the current scope. |
| match(S) | Matches the last (most recent) definition of the symbol named S. |
| match_any(S) | Matches against any prior definition of the symbol named S. |
| match_all(S) | Matches against all prior definitions of the symbol named S, in sequence from least to most recent. |
| match_front(S, N=0) | Matches against the N-th least recent definition of the symbol named S. |
| match_back(S, N=0) | Matches against the N-th most recent definition of the symbol named S. |

| Literal | Name | Description |
| --- | --- | --- |
| _cx | Character Expression | Matches the UTF-8, UTF-16, or UTF-32 character literal. |
| _sx | String Expression | Matches the sequence of characters in a string literal. |
| _rx | Regular Expression | Matches a POSIX Basic Regular Expression (BRE). |
| _icx | Case Insensitive Character Expression | Same as _cx, but case insensitive. |
| _isx | Case Insensitive String Expression | Same as _sx, but case insensitive. |
| _irx | Case Insensitive Regular Expression | Same as _rx, but case insensitive. |
| _scx | Case Sensitive Character Expression | Same as _cx, but case sensitive. |
| _ssx | Case Sensitive String Expression | Same as _sx, but case sensitive. |
| _srx | Case Sensitive Regular Expression | Same as _rx, but case sensitive. |