Awesome
<div align='center'> <h1>ParseBox</h1> <p>Parser Combinators in the TypeScript Type System</p> <img src="https://raw.githubusercontent.com/sinclairzx81/parsebox/refs/heads/main/parsebox.png" /> <br /> <br /> </div>Install
$ npm install @sinclair/parsebox
Example
ParseBox provides symmetric parsing in Static and Runtime environments.
import { Runtime, Static } from '@sinclair/parsebox'
// Runtime Parser
const T = Runtime.Tuple([
Runtime.Const('X'),
Runtime.Const('Y'),
Runtime.Const('Z')
])
const R = Runtime.Parse(T, 'X Y Z W') // const R = [['X', 'Y', 'Z'], ' W']
// Static Parser
type T = Static.Tuple<[
Static.Const<'X'>,
Static.Const<'Y'>,
Static.Const<'Z'>
]>
type R = Static.Parse<T, 'X Y Z W'> // type R = [['X', 'Y', 'Z'], ' W']
Which can be used to construct Expression or even Type Syntax parsers at runtime and inside the type system.
Overview
ParseBox is a TypeScript parsing library designed to embed domain-specific languages (DSLs) within the TypeScript type system. It provides a core set of declarative combinators for parsing in Static and Runtime environments. The parsing structures created with this library can be replicated across environments to ensure symmetric parsing logic in each environment (Static or Runtime). Declarative parsing structures help to mitigate many problems that arise when replicating explicit logic in different languages, with the declarative combinators leading to more managable code overall.
This project is written as a parsing infrastructure for the TypeBox and LinqBox projects. It is offered as a standalone package for experimentation, as well as to provide a foundation for advancing string DSL parsing in the TypeScript type system.
License MIT
Contents
Parsers
ParseBox provides three main combinators Const
, Tuple
, and Union
which are named after the types of values they parse. These combinators are used to express BNF (Backus-Naur Form) grammar within the type system. These combinators service as fundamental building blocks for constructing higher order parsers.
Const
The Const combinator parses for the next occurrence of the specified string. Whitespace and newline characters are ignored during parsing, unless the specified string explicitly matches those characters.
BNF
<T> ::= "X"
TypeScript
const T = Runtime.Const('X') // const T = {
// type: 'Const',
// value: 'X'
// }
const R = Runtime.Parse(T, 'X Y Z') // const R = ['X', ' Y Z']
Tuple
The Tuple parser matches a sequence of interior parsers. An empty tuple can be used to represent Epsilon (the empty production).
BNF
<T> ::= "X" "Y" "Z"
TypeScript
const T = Runtime.Tuple([ // const T = {
Runtime.Const('X'), // type: 'Tuple',
Runtime.Const('Y'), // parsers: [
Runtime.Const('Z'), // { type: 'Const', value: 'X' },
]) // { type: 'Const', value: 'Y' },
// { type: 'Const', value: 'Z' },
// ]
// }
const R = Runtime.Parse(P, 'X Y Z W') // const R = [['X', 'Y', 'Z'], ' W']
Union
The Union combinator parses using one of the interior parsers, attempting each in sequence until one matches.
BNF
<T> ::= "X" | "Y" | "Z"
TypeScript
const T = Runtime.Union([ // const T = {
Runtime.Const('X'), // type: 'Union',
Runtime.Const('Y'), // parsers: [
Runtime.Const('Z') // { type: 'Const', value: 'X' },
]) // { type: 'Const', value: 'Y' },
// { type: 'Const', value: 'Z' }
// ]
// }
const R1 = Runtime.Parse(P, 'X E') // const R1 = ['X', ' E']
const R2 = Runtime.Parse(P, 'Y E') // const R2 = ['Y', ' E']
const R3 = Runtime.Parse(P, 'Z E') // const R3 = ['Z', ' E']
Terminals
ParseBox provides combinators that can be used to parse common terminals.
Number
The Number combinator will parse a numeric literal.
const T = Runtime.Number()
// ...
const R1 = Runtime.Parse(T, '1') // const R1 = ['1', '']
const R2 = Runtime.Parse(T, '3.14') // const R2 = ['3.14', '']
const R3 = Runtime.Parse(T, '.1') // const R3 = ['.1', '']
const E = Runtime.Parse(T, '01') // const E = []
String
The String combinator will parse for quoted string literals. This combinator accepts an array of permissable quote characters. The result of this parser is the interior wrapped string.
const T = Runtime.String(['"'])
// ...
const R = Runtime.Parse(T, '"hello"') // const R = ['hello', '']
Ident
The Ident combinator will parse for a valid JavaScript identifiers. The following parses a let statement.
<T> ::= "let" <ident> "=" <number>
const Let = Runtime.Tuple([ // const Let = {
Runtime.Const('let'), // type: 'Tuple',
Runtime.Ident(), // parsers: [
Runtime.Const('='), // { type: 'Const', value: 'let' },
Runtime.Number() // { type: 'Ident' },
]) // { type: 'Const', value: '=' },
// { type: 'Number' },
// ]
// }
const R = Runtime.Parse(Let, 'let n = 10') // const R = [[ 'let', 'n', '=', '10'], '' ]
Mapping
ParseBox supports semantic actions (i.e., mapping) for both Static and Runtime parsing. These actions allow parsed content to be transformed into complex structures, such as abstract syntax trees (ASTs). Below is an explanation of how mapping works in both Runtime and Static environments.
Runtime
In Runtime parsing, combinators can accept an optional callback function as their last argument. This callback receives the parsed elements, which can then be mapped to arbitrary return values. The following example demonstrates how a let statement is parsed and how a mapping function is used to transform the result into a syntax node.
const LetMapping = (_0: 'let', _1: string, _2: '=', _3: string) => {
return {
type: 'Let',
ident: _1,
value: parseFloat(_3)
}
}
const Let = Runtime.Tuple([
Runtime.Const('let'), // _0
Runtime.Ident(), // _1
Runtime.Const('='), // _2
Runtime.Number() // _3
], values => LetMapping(...values))
const R = Runtime.Parse(Let, 'let value = 10') // const R = [{
// type: 'Let',
// ident: 'value',
// value: 10
// }, '' ]
Static
In Static combinators, an optional type of IMapping is provided as the last generic argument. Unlike Runtime callbacks, which receive parsed values directly as parameters, Static actions use the this['input']
property to access input values, and they store the mapped results in the output
property. The following example demonstrates how to implement the Let parser using Static actions.
type ParseFloat<Value extends string> = (
Value extends `${infer Value extends number}` ? Value : never
)
interface LetMapping extends Static.IMapping {
output: this['input'] extends ['let', infer Ident, '=', infer Value extends string] ? {
type: 'Let',
ident: Ident
value: ParseFloat<Value>
} : never
}
type Let = Static.Tuple<[
Static.Const<'let'>,
Static.Ident,
Static.Const<'='>,
Static.Number
], LetMapping>
type R = Static.Parse<Let, 'let value = 10'> // type R = [{
// type: 'Let',
// ident: 'value',
// value: 10
// }, '' ]
Context
ParseBox provides a context mechanism that allows parsed content to interact with the host environment. A context value can be passed as the last argument to the Parse function, and ParseBox will propagate the context to each mapping function, enabling more dynamic parsing behavior.
Runtime
In Runtime parsing, the context is passed as the second argument to the mapping functions. This allows the parser to access external data or state during the parsing process.
import { Runtime } from '@sinclair/parsebox'
// use matched input as indexer on context
const OptionMapping = (input: 'A' | 'B' | 'C', context: Record<string, string>) => {
return input in context
? context[input]
: void 0
}
const Option = Runtime.Union([
Runtime.Const('A'),
Runtime.Const('B'),
Runtime.Const('C')
], OptionMapping)
const R = Runtime.Parse(Option, 'A', { // const R = ['Matched Foo', '']
A: 'Matched Foo',
B: 'Matched Bar',
C: 'Matched Baz',
})
Static
In Static combinators, the context is accessible via the this['context']
property within the mapping action type.
import { Static } from '@sinclair/parsebox'
// use input as indexer on context
interface OptionMapping extends Static.IMapping {
output: this['input'] extends keyof this['context']
? this['context'][this['input']]
: never
}
type Option = Static.Union<[
Static.Const<'A'>,
Static.Const<'B'>,
Static.Const<'C'>
], OptionMapping>
type R = Static.Parse<Option, 'A', { // type R = ['Matched Foo', '']
A: 'Matched Foo',
B: 'Matched Bar',
C: 'Matched Baz',
}>
Modules
ParseBox modules serve as containers for parsers, enabling recursive and mutually recursive parsing. When designing parsers, one common challenge is the order in which parsers are defined—particularly when parsers need to reference each other but haven’t been defined yet. This creates topological ordering issues. Modules resolve this problem by allowing parsers to reference each other within a contained scope, enabling mutual recursion or recursion within a single parser, even when the parsers are defined in an arbitrary order.
Modules are only applicable for Runtime parsers only as Static parsers do not encounter topological ordering issues, this is because TypeScript types are non order dependent.
Recursive List Parsing
In the following example, we define a List parser that can recursively parse sequences of Value elements. The List parser is defined as either a tuple of a Value followed by another List (recursive) or an empty tuple (base case, representing an empty list). The Value parser is a simple union of strings, and the recursive nature of List is achieved by referencing the Value and List parsers within the same module.
import { Runtime } from '@sinclair/parsebox'
const Module = new Runtime.Module({
Value: Runtime.Union([
Runtime.Const('X'),
Runtime.Const('Y'),
Runtime.Const('Z'),
]),
// List: Will match Value then try to match another via a recursive ref to
// List. This will repeat until no Value can be matched. When no Value can
// be matched, the fall through [] (or Epsilon) will match.
List: Runtime.Union([
Runtime.Tuple([Runtime.Ref('Value'), Runtime.Ref('List')]),
Runtime.Tuple([])
], values => values.flat())
})
const R = Module.Parse('List', 'X Y Z Y X E') // const R = [['X', 'Y', 'Z', 'Y', 'X'], ' E']
Advanced
ParseBox is an LL(1) parser. When building parsers for complex grammars, care must be taken to avoid infinite left recursion, which can occur if a recursive grammar refers back to itself in a way that causes the parser to enter an infinite loop. This is particularly common in expression parsers.
Expression Parsing
The following shows a reference expression parser using LL(1) techniques to avoid left recursion. The following parser respects operator precedence and grouping.
import { Static } from '@sinclair/parsebox'
// BinaryMapping: Reduces matched Term and Expr into binary expression node
type BinaryReduce<Left extends unknown, Right extends unknown[]> = (
Right extends [infer Operator, infer Right, infer Rest extends unknown[]]
? { left: Left, operator: Operator, right: BinaryReduce<Right, Rest> }
: Left
)
interface BinaryMapping extends Static.IMapping {
output: this['input'] extends [infer Left, infer Right extends unknown[]]
? BinaryReduce<Left, Right>
: never
}
// FactorMapping: Map either grouped Expr or Operand
interface FactorMapping extends Static.IMapping {
output: (
this['input'] extends ['(', infer Expr, ')'] ? Expr :
this['input'] extends [infer Operand] ? Operand :
never
)
}
// ...
type Operand = Static.Ident
type Factor = Static.Union<[
Static.Tuple<[Static.Const<'('>, Expr, Static.Const<')'>]>,
Static.Tuple<[Operand]>,
], FactorMapping>
type TermTail = Static.Union<[
Static.Tuple<[Static.Const<'*'>, Factor, TermTail]>,
Static.Tuple<[Static.Const<'/'>, Factor, TermTail]>,
Static.Tuple<[]>,
]>
type ExprTail = Static.Union<[
Static.Tuple<[Static.Const<'+'>, Term, ExprTail]>,
Static.Tuple<[Static.Const<'-'>, Term, ExprTail]>,
Static.Tuple<[]>,
]>
type Term = Static.Tuple<[Factor, TermTail], BinaryMapping>
type Expr = Static.Tuple<[Term, ExprTail], BinaryMapping>
// ...
type R = Static.Parse<Expr, 'x * (y + z)'> // type R = [{
// left: "x";
// operator: "*";
// right: {
// left: "y";
// operator: "+";
// right: "z";
// };
// }, ""]
Contribute
ParseBox is open to community contribution. Please ensure you submit an open issue before submitting your pull request. The ParseBox project prefers open community discussion before accepting new features.