Berp
A flexible cross-language parser generator with support for languages without explicit tokenization rules (like Gherkin).
Installation
Berp can be installed from NuGet. The executable is in the tools/net471
folder inside the package.
Features
- generates a parser for its own grammar (the "hello world" for parser generators), see Berp Grammar
- does not generate a lexer/tokenizer, so it is ideal for languages where tokenization is either easy or not really possible
- simple, BNF-like grammar definition
- supports multiple target languages (currently C#, Java, Ruby, JavaScript, Go, Python) with the same grammar (code generation for each language is specified in template files)
- allows building an AST through AST-building hooks
- supports streamed token reading (tokens can be kept attached to the input stream to avoid unnecessary data transfer and object creation)
- supports context-sensitive tokens; the tokenization rules can also be changed during parsing (e.g. when a #language: no is encountered)
- supports a special "other" token that matches the "anything-else" case, when there is no better match
- support for recursive grammar rules is limited (it parses them up to a certain level only)
- simple look-ahead rules can be specified
- rules can be marked as production rules to be represented in the AST
- allows capturing ignored content tokens (e.g. comments)
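Because Berp does not generate a tokenizer, the token matching is supplied separately. The following is a minimal Python sketch of what such a hand-written matcher could look like for a Gherkin-like language; it is not Berp's API and not generated code. The names LineTokenMatcher, Token, tokenize, the token type names and the small KEYWORDS table are all hypothetical and exist only for illustration. It shows three of the ideas listed above: lines are classified lazily as they are read, a #language: header changes the tokenization rules for the lines that follow, and an "Other" token is returned when nothing better matches.

```python
from dataclasses import dataclass
from typing import Iterable, Iterator, Optional

# Hypothetical, heavily reduced keyword tables (illustration only).
KEYWORDS = {
    "en": {"Feature:": "FeatureLine", "Scenario:": "ScenarioLine", "Given ": "StepLine"},
    "no": {"Egenskap:": "FeatureLine", "Scenario:": "ScenarioLine", "Gitt ": "StepLine"},
}


@dataclass
class Token:
    type: str
    text: str
    line: int


class LineTokenMatcher:
    """Classifies whole lines; there is no separate lexer pass."""

    def __init__(self, language: str = "en") -> None:
        self.language = language

    def match(self, line: str, line_no: int) -> Token:
        stripped = line.strip()
        if not stripped:
            return Token("Empty", "", line_no)
        if stripped.startswith("#"):
            body = stripped[1:].strip()
            if body.startswith("language:"):
                language = body[len("language:"):].strip()
                if language not in KEYWORDS:
                    raise ValueError(f"unknown language: {language!r}")
                # Context-sensitive step: later lines use the new keyword table.
                self.language = language
                return Token("Language", language, line_no)
            # Ignored content such as comments is still captured as a token.
            return Token("Comment", body, line_no)
        for keyword, token_type in KEYWORDS[self.language].items():
            if stripped.startswith(keyword):
                return Token(token_type, stripped[len(keyword):].strip(), line_no)
        # Fallback "anything-else" token when there is no better match.
        return Token("Other", stripped, line_no)


def tokenize(lines: Iterable[str],
             matcher: Optional[LineTokenMatcher] = None) -> Iterator[Token]:
    """Yields tokens one at a time, so the input can be consumed as a stream."""
    matcher = matcher or LineTokenMatcher()
    for line_no, line in enumerate(lines, start=1):
        yield matcher.match(line, line_no)
```

In this sketch, a line such as Feature: x that follows a #language: no header is reported as "Other", because the English keywords no longer apply once the language has been switched.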
Samples
Supported target languages
- C# - CSharp.razor
- Java - Java.razor
- Ruby - Ruby.razor
- JavaScript (TypeScript) - TypeScript.razor
- Go - Go.razor
- Python - Python.razor
TODO
- Go: separate line from token
- Go: support for SimpleTokenMatcher
- Go: support for MaxCollectedError
- Import Perl from Gherkin
- Import Objective-C from Gherkin header & implementation
- Import C from Gherkin header & implementation