Home

Awesome

Starlark-re

Starlark-re is an implementation of Python's re module for Starlark. Its interface is almost entirely compatible with the Python module, so please refer to the Python documentation to learn how to use Starlark-re.

Getting started

The re.NewModule() function returns a new Starlark value, that represents the re module:

import (
    "go.starlark.net/starlark"
    re "github.com/magnetde/starlark-re"
)

// Add the re module to the globals dict.
globals := starlark.StringDict{
    "re": re.NewModule(),
}

// Execute a Starlark program using the re module.
opts := &syntax.FileOptions{GlobalReassign:  true}
thread := &starlark.Thread{Name: "re thread"}
globals, err := starlark.ExecFileOptions(opts, thread, "example.star", nil, globals)
if err != nil { ... }

example.star:

p = re.compile('(a(b)c)d')
m = p.match('abcd')
print(m.group(2, 1, 0))  # prints: ("b", "abc", "abcd")

m = re.match(r'(?P<first>\w+) (?P<last>\w+)', 'Jane Doe')
print(m.groupdict())  # prints: {"first": "Jane", "last": "Doe"}

s = re.split(r'[\W]+', 'Words, words, words.', 1)
print(s)  # prints: ["Words", "words, words."]

p = re.compile('(blue|white|red)')
s = p.subn('colour', 'blue socks and red shoes')
print(s)  # prints: ("colour socks and colour shoes", 2)

p = re.compile('section{ ( [^}]* ) }', re.VERBOSE)
s = p.sub(r'subsection{\1}','section{First} section{second}')
print(s)  # prints: subsection{First} subsection{second}

p = re.compile(r'\d+')
s = p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping')
print(s)  # prints: ["12", "11", "10"]

s = [m.span() for m in p.finditer('12 drummers drumming, 11 ... 10 ...')]
print(s)  # prints: [(0, 2), (22, 24), (29, 31)]

plusone = lambda m: str(int(m.group(0)) + 1)
s = p.sub(plusone, '4 + 7 = 13', 2)
print(s)  # prints: 5 + 8 = 13

re.purge()

Alternatively, the module can be initialized with other parameters:

options := &ModuleOptions{
    DisableCache:    false,
    MaxCacheSize:    128,
    DisableFallback: true,
}

m := re.NewModuleOptions(options)

How it works

When compiling a regular expression pattern, it is first parsed using a Go implementation of the Python regex parser. This allows to raise the same error messages as the Python module does. The parser yields a tree representation of the pattern, which is then checked for any elements that are currently not supported by the default regex engine (regexp.Regexp). These unsupported elements include:

If the regular expression pattern does not include any unsupported elements, it is preprocessed and then compiled with the default regex engine. The preprocessor will make necessary modifications to literals, ranges and character classes in the pattern so matching with bytes or using flags such as re.UNICODE, re.IGNORECASE or re.ASCII works exactly like expected.

In case that the regex pattern includes unsupported elements, the regex engine regexp2.Regexp, that supports all of these elements except for possessive repeat, is used instead. However, it should be noted that the using regexp2 may result in higher runtimes, so this engine is only used as a fallback when dealing with regex patterns that contain unsupported elements. Compiled patterns are stored in an LRU cache.

The module was tested against all supported Python tests for the re module (see test_re.py).

Limitations

Currently, there are some differences to the Python re module: