Native V-lang implementation of Rosie-RPL
Rosie is a pattern language (RPL for short). It is a little bit like regex, but aims to solve several of regex's issues. All credits to Jamie A. Jennings and her friends for this job.
This project (native V-lang implementation of RPL) is a work in progress (beta), but ready to be tested in the field. APIs may still change, a CLI is available, and a REPL is on the todo list. The current version is fully functional: it parses and compiles all files in Jamie's RPL library (./rpl), including the rpl files to parse RPL code, and it successfully executes all (inline) unittests in this folder.
Very similar to a compiler, the project consists of the following modules:
- A core-0 parser, written in V, which is able to parse RPL input into an AST
- An RPL parser which uses the core-0 parser to create the byte code for the RPL parser
- An Expander (and optimizer) that expands macros and aliases
- A compiler backend, which converts the AST into virtual machine byte code
- A virtual machine runtime, able to execute the byte code instructions and match input against the pattern
- A CLI module (command line interface)
- A unittest module, which supports RPL inline tests to validate patterns in '*.rpl' files
- A disassembler that prints the byte code instructions generated for a specific pattern
- A tracer utility (via CLI) that greatly helps with debugging input against a pattern
Not yet available:
- Possibly an additional compiler backend that generates native V code
- A shared library and language integration, e.g. Python
Project objectives
- Be compliant with the RPL Language Reference
- Easy and intuitive to use in V-lang projects
- A REPL to test and debug RPL patterns easily
- Jamie's implementation has nice support for grep-like search, colored output, and also tree-like output to review details of the AST. I'd like to reach at least a similar level of user support.
- Integration with other languages such as Python, Julia, C/C++, Rust, Java, JavaScript, etc. The more popular languages are supported, the better. (I wish V-lang had a Python integration module)
- A Visual Studio Code plugin would be nice: syntax highlighting for rpl files, a read-only view of disassembled rplx files, compiling rpl files upon save or when manually triggered, automatically running unittests, etc.
A bit of history
The project started with a tiny virtual machine (v1), able to load and execute '*.rplx' files (compiled RPL code), generated by Rosie's original compiler. It is working, but is not battle tested. By now, the virtual machine has evolved (v2) and is no longer backwards compatible. We are still able to read and execute '*.rplx' files, but we'll not put more effort into it.
Please note that neither the '*.rplx' file structure nor the byte codes of the virtual machine are part of Rosie's specification and thus are subject to change without formal notice from the Rosie team.
Originally the project was a proof-of-concept aiming at getting practical experience with V and validating its promises. I decided to use Rosie because I like many of its ideas, and thought it would be a good contribution to V as well.
Obviously I had to start somewhere, and I decided to start with the RPL runtime. The original RPL runtime is written in C, whereas the compiler and frontend are a mixture of C and Lua. The V implementation started as a copy of the C code, gradually introducing more and more V constructs, and also replacing 'unsafe' pointer arithmetic. V's C-to-V translator was not yet available, hence I translated and reengineered the code manually.
Next I added an RPL parser written in V, able to read and parse RPL source code into an AST (intermediate representation). It successfully reads all '*.rpl' files provided in Rosie's library, including the rpl files implementing the RPL language specification itself.
And then a compiler that generates RPL-VM byte code instructions (v2). Now I had all required core components available and fully implemented in V-lang.
Performance tests and optimisations, a proper CLI, and the ability to easily plug in additional parsers, optimizers and compilers were now at the top of my priority list.
First I added a benchmark module for the runtime (matching input against a pattern), which led to greatly improved runtime performance. It was a good learning exercise for me on how certain V features affect performance, but also which ones are left to the C compiler and how the CPU architecture affects the results. But I'm certainly not an X86 or SIMD assembler expert, nor a CPU profiling expert.
Slowly the project is moving into a more stable mode, as evidenced by the enhancements that followed:
- A CLI (with colored output)
- A tracer (debugger) to more easily analyse what is happening when matching input against a pattern
- An Engine that makes it easier to plug in different versions of parsers, optimizers and compilers.
- Tons of entries in todo.md and TODO comments in the source code
As mentioned, this project started as a PoC to practically test and gain some experience with V-lang. Despite some rough edges here and there, so far I'm mostly pleased. See here for my very own FAQ and "things to remember" list. I find the V code much easier to read and maintain than comparable C code. Compiler speed is definitely a plus as well, allowing for quick code-test cycles. Occasionally I wish a V interpreter or debugger were already available, to help me find and fix issues. For now, adding and removing debug messages is what I do (and why V build time is so important).
CMD, PS, bash etc. and the problem with quotes
This project can be embedded in other V projects, but it also comes with a CLI. The CLI has subcommands such as 'grep' and 'match', which expect a pattern argument such as "a" ~ "b". The pattern contains double quotes and spaces, and both are treated differently depending on your shell (bash, CMD, PS, ...). Because I stumbled upon it more than once, I've collected links to blogs that helped me understand it here.
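To make the quoting problem concrete, here is a minimal sketch of how a POSIX shell such as bash splits a command line into arguments. It uses Python's standard `shlex` module, which follows POSIX tokenization rules, rather than a real shell. The executable name `rosie` is only a placeholder and is not necessarily what this project's CLI is called. CMD and PowerShell follow different rules and are not covered by this sketch.

```python
import shlex


def shell_args(command_line):
    """Split a command line roughly the way a POSIX shell (e.g. bash) would."""
    return shlex.split(command_line)


# NOTE: 'rosie' is a hypothetical executable name, used for illustration only.

# Unquoted: the shell strips the double quotes and splits on spaces, so the
# pattern arrives as three separate arguments. (A real shell would also
# expand the bare ~ to the home directory, which shlex does not emulate.)
unquoted = shell_args('rosie match "a" ~ "b"')

# Wrapped in single quotes: the pattern arrives as one argument, with its
# double quotes intact.
quoted = shell_args("""rosie match '"a" ~ "b"'""")

# Wrapped in double quotes, with the inner double quotes backslash-escaped.
escaped = shell_args(r'rosie match "\"a\" ~ \"b\""')

print(unquoted)  # ['rosie', 'match', 'a', '~', 'b']
print(quoted)    # ['rosie', 'match', '"a" ~ "b"']
print(escaped)   # ['rosie', 'match', '"a" ~ "b"']
```

In a POSIX shell, wrapping the whole pattern in single quotes is usually the least error-prone form, because nothing inside single quotes is interpreted.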
Differences with Jamie's implementation
I've tried to limit differences as much as possible, but occasionally and very consciously, I've decided to differ.
- built-in overrides: I've added a 'builtin' binding attribute in addition to 'alias' and 'local', so that e.g.
    builtin alias ~ = [:space:]+
  will override the builtin implementation, and it'll be applied to all patterns, including the imported packages and their imports.
- I did not implement '#'. It's not used in any of Jamie's rpl files, which is a good indication that it is not needed.
- The cli commands and outputs are slightly different
- Added support to print the byte code (disassembler)
- The tracing output looks completely different, IMHO more concise and more readable.
- The supported rcfile variables are different; also, 'add_xxx' allows adding a libpath or color.
- Performance analysis revealed that captures occasionally take up a significant share (%) of the end-to-end processing time. The user function that executes a match therefore accepts a list of the bindings that are really needed, superseding what is defined in the rpl files.
- As alluded to above, my byte codes have evolved quite a bit, significantly contributing to the runtime performance.
- In RPL 1.x the &-operator is equivalent to {>p q}, which IMHO is misleading: everybody believes it's concatenation, and I've not seen it being used anywhere in the lib files. Hence, we do not support it. I think Jamie plans to remove it in RPL 2.x as well.
- Several improvements on Jamie's todo lists have been implemented in my project, such as
  - multiple byte-code entrypoints (per file)
  - re-usable code-blocks (aka functions; but w/o parameters) reducing the file size many times
- Some commonly used (and complex) builtin patterns, e.g. "." and "~", have their own byte-code instructions with performance-optimized V-code implementations.
- Based on the experience gained throughout the project, I made a couple of suggestions to Jamie on how to evolve the Rosie Pattern Language in a future version. I've started with my own version in "./rpl/rosie/rpl_3_0_jdo.rpl". So far, these are only thoughts and not yet implemented anywhere.
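
As an aside on the '&' operator mentioned in the list above, a rough regex analogue may help show why reading it as concatenation is misleading. The sketch below uses Python's `re` module, where `(?=p)q` plays the role of `{>p q}`. The sub-patterns `P` and `Q` are made up for illustration, and regex backtracking differs from how RPL matches, so treat this as an analogy rather than a description of Rosie's behavior.

```python
import re

# Hypothetical sub-patterns, chosen only for illustration.
P = r"[0-9]+"        # p: one or more digits
Q = r"[0-9a-f]{4}"   # q: exactly four hex characters

# Analogue of RPL 1.x `p & q`, i.e. {>p q}: p must match at the current
# position (look-ahead, consumes nothing), then q is matched starting from
# that same position.
AND = re.compile(rf"(?={P}){Q}")

# Plain concatenation: q starts where p ended.
CONCAT = re.compile(rf"{P}{Q}")


def matched(regex, text):
    """Return the text matched at position 0, or None if there is no match."""
    m = regex.match(text)
    return m.group(0) if m else None


print(matched(AND, "12ab"))       # 12ab   (p and q both match at position 0)
print(matched(CONCAT, "12ab"))    # None   (nothing left for q after the digits)
print(matched(CONCAT, "12abcd"))  # 12abcd
print(matched(AND, "abcd"))       # None   (the look-ahead for p fails)
```

With the same input "12ab", the two readings give different results, which is the confusion the bullet above refers to.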