SuperExpressive
This package is the Python port of the following JavaScript library: https://github.com/francisrstokes/super-expressive
Installation
pip install super_expressive
Example
The following example recognises and captures the value of a 16-bit hexadecimal number such as 0xC0D3.
from super_expressive import SuperExpressive
my_regex = (
SuperExpressive()
.start_of_input
.optional.string('0x')
.capture
.exactly(4).any_of
.range('A', 'F')
.range('a', 'f')
.range('0', '9')
.end()
.end()
.end_of_input
.to_regex()
)
# Produces the following regular expression:
re.compile('^(?:0x)?([A-Fa-f0-9]{4})$')
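To see how the resulting pattern behaves, here is a minimal sketch that uses only the standard `re` module and the regular expression shown above, written out by hand rather than built through `SuperExpressive`. The sample inputs are made up for illustration.

```python
import re

# Same pattern as the one produced above, written out by hand.
hex16 = re.compile(r'^(?:0x)?([A-Fa-f0-9]{4})$')

with_prefix = hex16.match('0xC0D3')    # optional "0x" prefix present
without_prefix = hex16.match('beef')   # prefix omitted
too_short = hex16.match('0xC0D')       # only three hex digits

print(with_prefix.group(1))     # C0D3
print(without_prefix.group(1))  # beef
print(too_short)                # None
```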
API
Legend:
[–] original, not supported
[=] original, supported
[≈] original, supported (slightly different syntax)
[+] new, added
[–] .allow_multiple_matches
API compatibility stub.
In the original library, this uses the g flag on the regular expression, which indicates that it should match multiple values when run on a string.
Python has no g flag; it implements this behavior at the level of the pattern object's methods, so this property does not change the generated pattern.
Example:
pattern = (
SuperExpressive()
.allow_multiple_matches
.string("hello")
.to_regex_string()
)
# 'hello'
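Because Python handles multiple matches at the method level, the closest equivalent of the JavaScript g flag is to call `findall` or `finditer` on the compiled pattern. A small sketch with the standard `re` module only (the sample text is an assumption):

```python
import re

pattern = re.compile('hello')
text = 'hello, hello and hello again'

first_only = pattern.search(text)                         # stops at the first match
all_matches = pattern.findall(text)                       # every non-overlapping match
positions = [m.start() for m in pattern.finditer(text)]   # start offset of each match

print(first_only.start())  # 0
print(all_matches)         # ['hello', 'hello', 'hello']
print(positions)           # [0, 7, 17]
```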
[–] .sticky
API compatibility stub.
In the original library, this uses the y flag on the regular expression, which indicates that it should create a stateful regular expression that can be resumed from the last match.
Python has no y flag, so this property does not change the generated pattern.
Example:
pattern = (
SuperExpressive()
.sticky
.string("hello")
.to_regex_string()
)
# 'hello'
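Although Python has no y flag, a similar "resume from the last match" effect can be approximated with the `pos` argument of `Pattern.match`, which only succeeds if the match starts exactly at that position. This is a sketch using the standard `re` module; it is not something the `.sticky` stub does for you, and the token pattern is a made-up example.

```python
import re

token = re.compile(r'\d+|[a-z]+')  # hypothetical tokenizer pattern
text = 'abc123def'

pos = 0
tokens = []
while pos < len(text):
    m = token.match(text, pos)  # must match exactly at pos, like a sticky regex
    if m is None:
        break
    tokens.append(m.group())
    pos = m.end()               # resume where the previous match ended

print(tokens)  # ['abc', '123', 'def']
```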
[+] .ascii
Assumes the ASCII 'locale'.
Uses the a flag on the regular expression, which indicates that it should perform ASCII-only matching.
Use this flag when necessary, bearing in mind that Unicode mode is the default in Python 3.
Example:
pattern = (
SuperExpressive()
.ascii
.string("hello")
.to_regex_string()
)
# '(?a)hello'
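The difference between the default Unicode mode and the a flag is easiest to see with a character class such as `\w`. The following sketch uses the standard `re` module directly; the sample word is an assumption chosen to contain one non-ASCII letter.

```python
import re

unicode_word = re.compile(r'\w+')     # default Unicode mode in Python 3
ascii_word = re.compile(r'(?a)\w+')   # same pattern with the inline a flag

text = 'na\u00efve'  # "naïve", with a non-ASCII letter in the middle

unicode_matches = unicode_word.findall(text)
ascii_matches = ascii_word.findall(text)

print(unicode_matches)  # ['naïve']
print(ascii_matches)    # ['na', 've']
```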
[=] .case_insensitive
.caseInsensitive
.ignore_case
.ignoreCase
Ignores case.
Uses the i flag on the regular expression, which indicates that it should ignore the uppercase/lowercase distinction when matching.
Warning: this produces a different regex syntax than the original one (Python, not JS).
Example:
pattern = (
SuperExpressive()
.case_insensitive
.string("hello")
.to_regex_string()
)
# '(?i)hello'
[=] .line_by_line
.lineByLine
.multiline
Makes anchors look for newline.
Uses the m
flag on the regular expression, which indicates that it should treat the .start_of_input
and .end_of_input
markers as the start and end of lines.
Warning: this produces a different regex syntax than the original one (Python, not JS).
Example:
pattern = (
SuperExpressive()
.line_by_line
.string("hello")
.to_regex_string()
)
# '(?m)hello'
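As an illustration of what the m flag changes, the sketch below compares `^hello$` with and without it, using the standard `re` module only. The three-line sample text is an assumption.

```python
import re

text = 'hello\nworld\nhello'

# Without the m flag, ^ and $ only anchor to the whole input.
default_mode = re.findall(r'^hello$', text)
# With the m flag, ^ and $ also anchor to each line.
multiline_mode = re.findall(r'(?m)^hello$', text)

print(default_mode)    # []
print(multiline_mode)  # ['hello', 'hello']
```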
[=] .single_line
.singleLine
.dotall
Makes dot match newline.
Uses the s
flag on the regular expression, which indicates
that the input should be treated as a single line, where the .start_of_input
and .end_of_input
markers explicitly mark the start and end of input, and .any_char
also matches newlines.
Warning: this produces a different regex syntax than the original one (Python, not JS).
Example:
pattern = (
SuperExpressive()
.single_line
.string("hello")
.to_regex_string()
)
# '(?s)hello'
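The effect on `.any_char` can be checked with the standard `re` module: in Python's `re`, the s flag changes what `.` matches. The input string below is an assumption.

```python
import re

text = 'a\nb'

without_flag = re.fullmatch(r'a.b', text)    # "." does not match "\n" by default
with_flag = re.fullmatch(r'(?s)a.b', text)   # "." matches "\n" with the s flag

print(without_flag)           # None
print(with_flag is not None)  # True
```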
[=] .unicode
Assumes the Unicode 'locale'.
Uses the u flag on the regular expression, which indicates that it should use full Unicode matching.
Since Unicode mode is the default in Python 3, there is no need to use this flag (but you can use .ascii instead when necessary).
Warning: this produces a different regex syntax than the original one (Python, not JS).
Example:
pattern = (
SuperExpressive()
.unicode
.string("hello")
.to_regex_string()
)
# '(?u)hello'
[=] .any_char
.anyChar
Matches any single character.
When combined with .single_line
(aka .dotall
), it also matches newlines.
Example:
pattern = (
SuperExpressive()
.any_char
.to_regex_string()
)
# '.'
[=] .whitespace_char
.whitespaceChar
.whitespace
Matches any whitespace character, including the special whitespace characters: \r
, \n
, \t
, \f
, \v
.
Example:
pattern = (
SuperExpressive()
.whitespace_char
.to_regex_string()
)
# '\\s'
[=] .non_whitespace_char
.nonWhitespaceChar
.non_whitespace
.nonWhitespace
Matches any non-whitespace character, that is, any character other than whitespace, including the special whitespace characters \r, \n, \t, \f, \v.
Example:
pattern = (
SuperExpressive()
.non_whitespace_char
.to_regex_string()
)
# '\\S'
[=] .digit
Matches any digit from 0-9
.
Example:
pattern = (
SuperExpressive()
.digit
.to_regex_string()
)
# '\\d'
[=] .non_digit
.nonDigit
Matches any non-digit.
Example:
pattern = (
SuperExpressive()
.non_digit
.to_regex_string()
)
# '\\D'
[=] .word
.word_char
.wordChar
Matches any alpha-numeric (a-z
, A-Z
, 0-9
) characters, as well as _
.
Example:
pattern = (
SuperExpressive()
.word
.to_regex_string()
)
# '\\w'
[=] .non_word
.nonWord
.non_word_char
.nonWordChar
Matches any character that is neither alphanumeric (a-z, A-Z, 0-9) nor _.
Example:
pattern = (
SuperExpressive()
.non_word
.to_regex_string()
)
# '\\W'
[=] .word_boundary
.wordBoundary
Matches (without consuming any characters) immediately between a character matched by .word
and a character not matched by .word
(in either order).
Example:
pattern = (
SuperExpressive()
.word_boundary
.to_regex_string()
)
# '\\b'
[=] .non_word_boundary
.nonWordBoundary
Matches (without consuming any characters) at any position that is not a word boundary, for example between two characters matched by .word.
Example:
pattern = (
SuperExpressive()
.non_word_boundary
.to_regex_string()
)
# '\\B'
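To make the two boundary assertions concrete, here is a short sketch with the standard `re` module that records where `cat` is found as a whole word (`\b`) and where it is found in the middle of a word (`\B`). The sample sentence is an assumption.

```python
import re

text = 'cat concat cats'

# "cat" delimited by word boundaries on both sides.
whole_word = [m.start() for m in re.finditer(r'\bcat\b', text)]
# "cat" that does NOT start at a word boundary (i.e. inside a longer word).
inside_word = [m.start() for m in re.finditer(r'\Bcat', text)]

print(whole_word)   # [0]
print(inside_word)  # [7]
```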
[=] .new_line
.newLine
Matches a \n
character.
Example:
pattern = (
SuperExpressive()
.new_line
.to_regex_string()
)
# '\\n'
[=] .carriage_return
.carriageReturn
Matches a \r
character.
Example:
pattern = (
SuperExpressive()
.carriage_return
.to_regex_string()
)
# '\\r'
[=] .tab
Matches a \t
character.
Example:
pattern = (
SuperExpressive()
.tab
.to_regex_string()
)
# '\\t'
[=] .null_byte
.nullByte
Matches a \\u0000
character (ASCII 0).
Example:
pattern = (
SuperExpressive()
.null_byte
.to_regex_string()
)
# '\\0'
[=] .char(c: str)
Matches the exact (single) character c
.
The c
parameter must be a single character string.
Raises a RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.char('.')
.to_regex_string()
)
# '\\.'
[=] .string(s: str)
Matches the exact string (the sequential characters) s
.
The s
parameter must be a non-empty string.
Raises a RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.string("1+1")
.to_regex_string()
)
# '1\\+1'
[=] .range(a: str|int, b: str|int)
Matches any character that falls between a
and b
.
Ordering is defined by a character's ASCII or Unicode value.
Both a
and b
parameters must be a single character string or a single digit integer.
The a
character must precede the b
character alphabetically.
Otherwise raises RegexError
.
Example:
pattern = (
SuperExpressive()
.range(0, 9)
.range('a', 'f')
.to_regex_string()
)
# '[0-9][a-f]'
[=] .any_of
.anyOf
Matches a choice between specified elements.
Needs to be finalised with .end()
or .over
.
Example:
pattern = (
SuperExpressive()
.any_of
.char('-')
.range(0, 9)
.string("no")
.end()
.to_regex_string()
)
# '(?:no|[\\-0-9])'
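The generated alternation can be tried out with the standard `re` module. The sketch below uses the pattern string from the example above; the input string is an assumption.

```python
import re

# Pattern string copied from the example above.
choice = re.compile(r'(?:no|[\-0-9])')

matches = choice.findall('no-7x')

print(matches)  # ['no', '-', '7']
```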
[=] .group
Creates a non-capturing group of the following elements.
Needs to be finalised with .end()
or .over
.
Example:
pattern = (
SuperExpressive()
.optional.group
.char('-')
.range(0, 9)
.string("no")
.end()
.to_regex_string()
)
# '(?:\\-[0-9]no)?'
[=] .assert_ahead
.assertAhead
Asserts that the following elements are found, without consuming them.
Needs to be finalised with .end()
or .over
.
Example:
pattern = (
SuperExpressive()
.assert_ahead
.range('a', 'f')
.end()
.range('a', 'z')
.to_regex_string()
)
# '(?=[a-f])[a-z]'
[=] .assert_behind
.assertBehind
Assert that the elements contained within are found immediately before this point in the string.
Needs to be finalised with .end()
or .over
.
Example:
pattern = (
SuperExpressive()
.assert_behind
.range('a', 'f')
.end()
.range('a', 'z')
.to_regex_string()
)
# '(?<=[a-f])[a-z]'
[=] .assert_not_ahead
.assertNotAhead
Asserts that the following elements are not found, without consuming them.
Needs to be finalised with .end()
or .over
.
Example:
pattern = (
SuperExpressive()
.assert_not_ahead
.range('a', 'f')
.end()
.range('a', 'z')
.to_regex_string()
)
# '(?![a-f])[a-z]'
[=] .assert_not_behind
.assertNotBehind
Assert that the elements contained within are not found immediately before this point in the string.
Needs to be finalised with .end()
or .over
.
Example:
pattern = (
SuperExpressive()
.assert_not_behind
.range('a', 'f')
.end()
.range('a', 'z')
.to_regex_string()
)
# '(?<![a-f])[a-z]'
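The four assertions can be compared side by side by running the pattern strings from the examples above through the standard `re` module. This is only a sketch; the sample text is an assumption.

```python
import re

text = 'ax by'

ahead = re.findall(r'(?=[a-f])[a-z]', text)        # letter that is itself in a-f
not_ahead = re.findall(r'(?![a-f])[a-z]', text)    # letter that is not in a-f
behind = re.findall(r'(?<=[a-f])[a-z]', text)      # letter preceded by a-f
not_behind = re.findall(r'(?<![a-f])[a-z]', text)  # letter not preceded by a-f

print(ahead)       # ['a', 'b']
print(not_ahead)   # ['x', 'y']
print(behind)      # ['x', 'y']
print(not_behind)  # ['a', 'b']
```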
[=] .any_of_chars(chars: str)
.anyOfChars(chars: str)
Matches any of the characters in the provided string chars
.
The chars
parameter must be a non-empty string.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.any_of_chars("aeiou")
.any_of_chars("+-*/=")
.to_regex_string()
)
# '[aeiou][\\+\\-\\*/=]'
[=] .anything_but_chars(chars: str)
.anythingButChars(chars: str)
Matches any character, except any of those in the provided string chars
.
The chars
parameter must be a non-empty string.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.anything_but_chars("aeiou")
.anything_but_chars("+-*/=")
.to_regex_string()
)
# '[^aeiou][^\\+\\-\\*/=]'
[=] .anything_but_range(a: str|int, b: str|int)
.anythingButRange(a: str|int, b: str|int)
Matches any character, except those that would be captured by the range specified by a
and b
.
Both a
and b
parameters must be a single character string or a single digit integer.
The a
character must precede the b
character alphabetically.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.anything_but_range(0, 9)
.anything_but_range('a', 'f')
.to_regex_string()
)
# '[^0-9][^a-f]'
[=] .anything_but_string(s: str)
.anythingButString(s: str)
Matches any string the same length as s
, except the s
itself (the sequential characters in s
).
The s
parameter must be a non-empty string.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.anything_but_string("aeiou")
.anything_but_string("+-*/=")
.to_regex_string()
)
# '(?:(?!aeiou).{5})(?:(?!\\+\\-\\*/=).{5})'
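The generated pattern combines a negative lookahead with a fixed-length wildcard. A minimal sketch with the standard `re` module, using the first half of the pattern above and made-up inputs:

```python
import re

# First group of the pattern string shown above.
not_vowels = re.compile(r'(?:(?!aeiou).{5})')

exact_string = not_vowels.fullmatch('aeiou')   # the excluded string itself
same_length = not_vowels.fullmatch('aeiox')    # any other 5-character string
wrong_length = not_vowels.fullmatch('aei')     # too short

print(exact_string)         # None
print(same_length.group())  # aeiox
print(wrong_length)         # None
```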
[=] .capture
Creates a capture group for the following elements.
Needs to be finalised with .end()
or .over
.
Can be later referenced with .backreference(index)
.
Example:
pattern = (
SuperExpressive()
.capture
.string("prefix:")
.range(0, 9)
.char("-")
.range('a', 'f')
.end()
.to_regex_string()
)
# '(prefix:[0-9]\\-[a-f])'
[=] .named_capture(name: str)
.namedCapture(name: str)
Creates a named capture group for the following elements.
Needs to be finalised with .end()
or .over
.
Can be later referenced with .named_backreference(name)
or .backreference(index)
.
The name parameter must be a non-empty string consisting of Latin letters, digits, and underscores only, and must not coincide with the name of a previously defined capture group.
Raises RegexError
otherwise.
Warning: this produces a different regex syntax than the original one (Python, not JS).
Example:
pattern = (
SuperExpressive()
.named_capture("some_stuff")
.string("prefix:")
.range(0, 9)
.char("-")
.range('a', 'f')
.end()
.to_regex_string()
)
# '(?P<some_stuff>prefix:[0-9]\\-[a-f])'
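Once compiled, a named group can be read back by name or by position. The sketch below uses the standard `re` module with the pattern string from the example above; the input text is an assumption.

```python
import re

# Pattern string copied from the example above.
pattern = re.compile(r'(?P<some_stuff>prefix:[0-9]\-[a-f])')

m = pattern.search('see prefix:7-c here')

print(m.group('some_stuff'))  # prefix:7-c
print(m.group(1))             # prefix:7-c
print(m.groupdict())          # {'some_stuff': 'prefix:7-c'}
```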
[=] .backreference(index: int)
.backref(index: int)
Matches exactly what was previously matched by a .capture
or .named_capture
using a positional index.
Note that regex indices start at 1, so the first capture group has index 1.
The index parameter must be an integer between 1 and the number of capture groups defined so far.
Raises RegexError
otherwise.
Warning: this produces a different regex syntax than the original one (Python, not JS).
Example:
pattern = (
SuperExpressive()
.capture
.string("prefix:")
.range(0, 9)
.char("-")
.range('a', 'f')
.end()
.string("something else")
.backreference(1)
.to_regex_string()
)
# '(prefix:[0-9]\\-[a-f])something else\\1'
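A backreference requires the same text to appear again, not merely text of the same shape. The following sketch checks this with the standard `re` module and the pattern string from the example above; both inputs are made up.

```python
import re

# Pattern string copied from the example above.
pattern = re.compile(r'(prefix:[0-9]\-[a-f])something else\1')

same = pattern.fullmatch('prefix:1-asomething elseprefix:1-a')
different = pattern.fullmatch('prefix:1-asomething elseprefix:2-b')

print(same is not None)  # True
print(different)         # None
```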
[=] .named_backreference(name: str)
.namedBackreference(name: str)
.named_backref(name: str)
.namedBackref(name: str)
Matches exactly what was previously matched by a .named_capture
.
The name
parameter must be one of the names of existing capture groups.
Raises RegexError
otherwise.
Warning: this produces a different regex syntax than the original one (Python, not JS).
Example:
pattern = (
SuperExpressive()
.named_capture("some_stuff")
.string("prefix:")
.range(0, 9)
.char("-")
.range('a', 'f')
.end()
.string("something else")
.named_backreference("some_stuff")
.to_regex_string()
)
# '(?P<some_stuff>prefix:[0-9]\\-[a-f])something else(?P=some_stuff)'
[=] .optional
Asserts that the following element may or may not be matched.
Example:
pattern = (
SuperExpressive()
.optional.digit
.to_regex_string()
)
# '\d?'
[=] .zero_or_more
.zeroOrMore
Asserts that the following element may be matched zero or more times.
Example:
pattern = (
SuperExpressive()
.zero_or_more.digit
.to_regex_string()
)
# '\d*'
[=] .zero_or_more_lazy
.zeroOrMoreLazy
Asserts that the following element may be matched zero or more times, but as few times as possible.
Example:
pattern = (
SuperExpressive()
.zero_or_more_lazy.digit
.to_regex_string()
)
# '\d*?'
[=] .one_or_more
.oneOrMore
Asserts that the following element must be matched one or more times.
Example:
pattern = (
SuperExpressive()
.one_or_more.digit
.to_regex_string()
)
# '\d+'
[=] .one_or_more_lazy
.oneOrMoreLazy
Asserts that the following element must be matched one or more times, but as few times as possible.
Example:
pattern = (
SuperExpressive()
.one_or_more_lazy.digit
.to_regex_string()
)
# '\d+?'
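The difference between the greedy and lazy quantifiers shows up in how much of the input each one consumes. A minimal sketch with the standard `re` module, using the pattern strings from the examples above on a made-up input:

```python
import re

text = '12345'

greedy_plus = re.match(r'\d+', text).group()   # as many digits as possible
lazy_plus = re.match(r'\d+?', text).group()    # as few as possible, at least one
lazy_star = re.match(r'\d*?', text).group()    # as few as possible, may be empty

print(repr(greedy_plus))  # '12345'
print(repr(lazy_plus))    # '1'
print(repr(lazy_star))    # ''
```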
[=] .exactly(n: int)
Asserts that the following element will be matched exactly n times.
The n
parameter must be a positive integer.
The application of the method must not conflict with previously applied quantifiers.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.exactly(5).digit
.to_regex_string()
)
# '\d{5}'
[=] .at_least(n: int)
.atLeast(n: int)
Asserts that the following element will be matched at least n times.
The n
parameter must be a positive integer.
The application of the method must not conflict with previously applied quantifiers.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.at_least(5).digit
.to_regex_string()
)
# '\d{5,}'
[=] .between(x: int, y: int)
Asserts that the following element will be matched between x and y times.
Both x and y parameters must be non-negative integers.
The x parameter must be less than the y parameter.
The application of the method must not conflict with previously applied quantifiers.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.between(3, 5).digit
.to_regex_string()
)
# '\d{3,5}'
[=] .between_lazy(x: int, y: int)
.betweenLazy(x: int, y: int)
Asserts that the following element will be matched between x and y times, but as few times as possible.
Both x and y parameters must be non-negative integers.
The x parameter must be less than the y parameter.
The application of the method must not conflict with previously applied quantifiers.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.between_lazy(3, 5).digit
.to_regex_string()
)
# '\d{3,5}?'
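For comparison, the sketch below runs the greedy and lazy bounded quantifiers from the two examples above against the same made-up input, using the standard `re` module only.

```python
import re

text = '1234567'

greedy = re.match(r'\d{3,5}', text).group()   # takes up to five digits
lazy = re.match(r'\d{3,5}?', text).group()    # stops after the minimum of three

print(greedy)  # 12345
print(lazy)    # 123
```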
[+] .start_of_string
.startOfString
Always asserts the start of input string, regardless of using multiline mode (aka .line_by_line
).
Example:
pattern = (
SuperExpressive()
.start_of_string
.string("hello")
.to_regex_string()
)
# '\Ahello'
[+] .end_of_string
.endOfString
Always asserts the end of input string, regardless of using multiline mode (aka .line_by_line
).
Example:
pattern = (
SuperExpressive()
.string("hello")
.end_of_string
.to_regex_string()
)
# 'hello\Z'
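The difference from the line-aware anchors only appears in multiline mode. The sketch below counts matches for each anchor with the m flag enabled, using the standard `re` module; the two-line input is an assumption.

```python
import re

text = 'hello\nhello'

line_starts = len(re.findall(r'(?m)^hello', text))     # start of every line
string_starts = len(re.findall(r'(?m)\Ahello', text))  # start of the input only
line_ends = len(re.findall(r'(?m)hello$', text))       # end of every line
string_ends = len(re.findall(r'(?m)hello\Z', text))    # end of the input only

print(line_starts, string_starts)  # 2 1
print(line_ends, string_ends)      # 2 1
```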
[=] .start_of_input
.startOfInput
Asserts the start of input string, or the start of a line when multiline mode (aka .line_by_line
) is used.
The application of the method must not conflict with previously applied start-of-input or end-of-input methods.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.start_of_input
.string("hello")
.to_regex_string()
)
# '^hello'
[=] .end_of_input
.endOfInput
Asserts the end of input string, or the end of a line when multiline mode (aka .line_by_line
) is used.
The application of the method must not conflict with previously applied end-of-input method.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.string("hello")
.end_of_input
.to_regex_string()
)
# 'hello$'
[=] .end()
Closes the context of .any_of
, .group
, .capture
, or .assert_*
.
Requires parentheses when invoked (see also .over
).
The method must not be applied out of the context mentioned above.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.string("prefix:")
.capture
.anyOf
.range(0, 9)
.char("-")
.range('a', 'f')
.string("something else")
.end()
.end()
.to_regex_string()
)
# 'prefix:((?:something else|[0-9\\-a-f]))'
[+] .over
Closes the context of .any_of
, .group
, .capture
or .assert_*
.
Alias for .end()
, but doesn't require parentheses.
The method must not be applied out of the context mentioned above.
Raises RegexError
otherwise.
Example:
pattern = (
SuperExpressive()
.string("prefix:")
.capture
.anyOf
.range(0, 9)
.char("-")
.range('a', 'f')
.string("something else")
.over
.over
.to_regex_string()
)
# 'prefix:((?:something else|[0-9\\-a-f]))'
[≈] .subexpression(expr: SuperExpressive, *, namespace: str = "", ignore_flags: bool = True, ignore_start_and_end: bool = True)
.sub(expr, *, namespace="", ignore_flags=True, ignore_start_and_end=True)
Matches another SuperExpressive
instance inline.
Can be used to create libraries, or to modularise your code.
The expr
parameter must be a correctly defined SuperExpressive
object and must not conflict with start-of-input or end-of-input markers defined in the caller object (see also ignore_start_and_end
parameter description below).
Raises RegexError
otherwise.
Example:
hex_number = SuperExpressive().one_or_more.any_of.range(0, 9).range('A', 'F').end()
pattern = (
SuperExpressive()
.subexpression(hex_number)
.one_or_more.whitespace
.optional.subexpression(hex_number)
.to_regex_string()
)
# '[0-9A-F]+\\s+(?:[0-9A-F]+)?'
By default, flags and start/end of input markers are ignored, but can be explicitly turned on in the keyword parameters.
ignore_flags: If set to True, any flags this subexpression specifies are disregarded (default is True).
Example:
hex_number = (
SuperExpressive()
.case_insensitive
.one_or_more.any_of
.range(0, 9)
.range('A', 'F')
.end()
)
pattern1 = (
SuperExpressive()
.subexpression(hex_number)
.one_or_more.whitespace
.optional.subexpression(hex_number)
.to_regex_string()
)
# '[0-9A-F]+\\s+(?:[0-9A-F]+)?'
pattern2 = (
SuperExpressive()
.subexpression(hex_number, ignore_flags=False)
.one_or_more.whitespace
.optional.subexpression(hex_number)
.to_regex_string()
)
# '(?i)[0-9A-F]+\\s+(?:[0-9A-F]+)?'
ignore_start_and_end: If set to True, any .start_of_input / .end_of_input assertions in this subexpression are disregarded (default is True).
Example:
hex_number = (
SuperExpressive()
.start_of_input
.one_or_more.any_of
.range(0, 9)
.range('A', 'F')
.end()
.end_of_input
)
pattern1 = (
SuperExpressive()
.subexpression(hex_number)
.one_or_more.whitespace
.optional.subexpression(hex_number)
.to_regex_string()
)
# '[0-9A-F]+\\s+(?:[0-9A-F]+)?'
pattern2 = (
SuperExpressive()
.subexpression(hex_number)
.one_or_more.whitespace
.optional.subexpression(hex_number, ignore_start_and_end=False)
.to_regex_string()
)
# '[0-9A-F]+\\s+(?:^[0-9A-F]+$)?'
namespace: A string namespace to use on all named capture groups in the subexpression, to avoid naming collisions with your own named groups (default is "").
Example:
hex_number = (
SuperExpressive()
.named_capture("hex")
.one_or_more.any_of
.range(0, 9)
.range('A', 'F')
.end()
.end()
.named_backreference("hex")
)
# '(?P<hex>[0-9A-F]+)(?P=hex)'
pattern1 = (
SuperExpressive()
.subexpression(hex_number)
.one_or_more.whitespace
.optional.subexpression(hex_number, namespace="snd_")
.to_regex_string()
)
# '(?P<hex>[0-9A-F]+)(?P=hex)\\s+(?:(?P<snd_hex>[0-9A-F]+)(?P=snd_hex))?'
pattern2 = (
SuperExpressive()
.named_capture("hex")
.subexpression(hex_number, namespace="sub1_")
.one_or_more.whitespace
.optional.subexpression(hex_number, namespace="sub2_")
.end()
.named_backreference("hex")
.to_regex_string()
)
# '(?P<hex>(?P<sub1_hex>[0-9A-F]+)(?P=sub1_hex)\\s+(?:(?P<sub2_hex>[0-9A-F]+)(?P=sub2_hex))?)(?P=hex)'
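The reason a namespace matters is that Python's `re` module rejects a pattern that defines the same group name twice. The sketch below demonstrates this with hand-written pattern strings and the standard `re` module only; the simplified patterns and the input are assumptions, not output of `SuperExpressive`.

```python
import re

# Two groups with the same name: re.compile refuses this pattern.
try:
    re.compile(r'(?P<hex>[0-9A-F]+)\s+(?P<hex>[0-9A-F]+)')
    collided = False
except re.error:
    collided = True

# Prefixing the second group name (as namespace="snd_" would) avoids the clash.
namespaced = re.compile(r'(?P<hex>[0-9A-F]+)\s+(?P<snd_hex>[0-9A-F]+)')
m = namespaced.match('FF 10')

print(collided)                            # True
print(m.group('hex'), m.group('snd_hex'))  # FF 10
```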
[=] .to_regex()
.toRegex()
Returns the compiled regular expression pattern object that this SuperExpressive instance models.
[=] .to_regex_string()
.toRegexString()
.to_string()
.toString()
Outputs a string representation of the regular expression that this SuperExpressive instance models.