Regal

Royally reified regular expressions

<!-- badges -->

<!-- /badges -->

tl;dr

Regal lets you manipulate regular expressions as data by providing a Hiccup-like regex syntax (Regal syntax) and ways to convert between this syntax, compiled regex patterns, and test.check generators. It also helps with writing cross-platform code by providing consistent semantics across JS/Java runtimes, and it can convert a JavaScript regex to a semantically equivalent Java regex (useful, for example, when dealing with JSON Schema in Clojure).

The slightly longer version

Regal provides a syntax for writing regular expressions using plain Clojure data: vectors, keywords, strings. This is known as Regal notation.

Once you have a Regal form you can either compile it to a regex object (java.util.regex.Pattern or JavaScript RegExp), or you can use it to create a Generator (see test.check) for generating values that conform to the given pattern.

It is also possible to parse regular expression patterns back to Regal forms.

Regal is Clojure and ClojureScript compatible, and has fixed semantics across platforms. Write your forms once and run them anywhere! It also allows manipulating multiple regex flavors regardless of the current platform, so you can do things like converting a JavaScript regex pattern to one that is suitable for Java's regex engine.

<!-- opencollective -->

Support Lambda Island Open Source

Regal is part of a growing collection of quality Clojure libraries and tools released on the Lambda Island label. If you find value in our work, please consider becoming a backer on Open Collective.

<!-- /opencollective -->

Project status

Regal is alpha-level software. This does not mean it is of low quality or not fit for use; it does mean that future breakage of the API is still possible.

The following aspects of the library are generally well tested and developed, and we intend to retain compatibility as much as practically possible.

The following aspects have known issues or are otherwise untested or incomplete, and you can expect them to change significantly as we further develop them:

Installation

deps.edn

lambdaisland/regal {:mvn/version "0.1.175"}

project.clj

[lambdaisland/regal "0.1.175"]

An example

(require '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.generator :as regal-gen])

;; Regal expression, like Hiccup but for Regex
(def r [:cat
        [:+ [:class [\a \z]]]
        "="
        [:+ [:not \=]]])

;; Convert to host-specific regex
(regal/regex r)
;;=> #"[a-z]+\Q=\E[^=]+"

;; Match strings
(re-matches (regal/regex r) "foo=bar")
;;=> "foo=bar"

;; ... And generate them
(regal-gen/gen r)
;;=> #clojure.test.check.generators.Generator{...}

(regal-gen/sample r)
;;=> ("t=‘" "d=5Ë" "zja=·" "uatt=ß¾" "lqyk=É" "xkj=q\f’" "gxupw=æ" "pkadbgmc=¯²" "f=ËJ" "d=ç")

A swiss army knife

Regal can convert between three different representations of regular expressions: Regal forms, patterns (i.e. strings), and regex objects. Here is an overview of how to get from one to the other.

| ↓From / To→ | Form | Pattern | Regex |
|---|---|---|---|
| Form | identity | lambdaisland.regal/pattern | lambdaisland.regal/regex |
| Pattern | lambdaisland.regal.parse/parse-pattern | identity | lambdaisland.regal/compile |
| Regex | lambdaisland.regal.parse/parse | lambdaisland.regal/regex-pattern | identity |
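
The conversions in the table above can be chained into a round trip. The following is a minimal sketch that uses only the functions named in the table; the example form and the var names (`form`, `pat`, `re`) are made up for illustration, and the exact strings and forms returned may differ between Regal versions and platforms, so no output is shown for those steps.

```clojure
(require '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.parse :as parse])

;; An illustrative form: one or more lowercase letters, then one or more digits
(def form [:cat [:+ [:class [\a \z]]] [:+ [:class [\0 \9]]]])

;; Form -> Pattern (a plain string)
(def pat (regal/pattern form))

;; Pattern -> Regex (a host regex object)
(def re (regal/compile pat))

;; Regex -> Pattern (back to a string)
(regal/regex-pattern re)

;; Pattern -> Form and Regex -> Form
(parse/parse-pattern pat)
(parse/parse re)

;; The compiled regex works with the usual core functions
(re-matches re "abc123")
;;=> "abc123"
```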

Regal forms

Forms consist of vectors, keywords, strings, character literals, and in some cases integers. For example:

[:cat [:alt [:char 11] [:char 13]] \J [:repeat "hello" 2 3]]

Forms have platform-independent semantics. The same regal form will match the same strings both in Clojure and ClojureScript, even though Java and JavaScript (and even different versions of Java or JavaScript) have different regex "flavors". In other words, we generate the regex that is right for the target platform.

;; Clojure
(regal/regex :vertical-whitespace) ;;=> #"\v"

;; ClojureScript
(regal/regex :vertical-whitespace) ;;=> #"[\n\x0B\f\r\x85\u2028\u2029]"

Regal currently knows about three "flavors"

By default it takes the flavor that is best suited for the platform, but you can override that with lambdaisland.regal/with-flavor:

(regal/with-flavor :ecma
  (regal/pattern ...))

Note that using regal/regex with a flavor that does not correspond to the flavor of the platform may yield unexpected results. When dealing with "foreign" regex flavors, always stick to string representations (i.e. patterns).
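
As an illustration of working with a foreign flavor through strings only, here is a hedged sketch of turning a JavaScript pattern into one for the host platform. It assumes that `parse/parse-pattern` honors the flavor set by `with-flavor`; the var names (`js-pattern`, `form`, `host-pattern`) are made up for this example.

```clojure
(require '[lambdaisland.regal :as regal]
         '[lambdaisland.regal.parse :as parse])

;; A JavaScript pattern, kept as a string. In JavaScript, \v is a vertical tab.
(def js-pattern "\\v")

;; Parse it under the :ecma flavor
;; (assumption: parse-pattern respects the flavor bound by with-flavor)
(def form
  (regal/with-flavor :ecma
    (parse/parse-pattern js-pattern)))

;; Emit a pattern string for the default flavor of the current platform
(def host-pattern (regal/pattern form))

;; Only now compile it, since the pattern targets the host's regex engine
(re-matches (regal/compile host-pattern) "\u000B")
```

The foreign pattern never becomes a regex object; only the pattern emitted for the host flavor is compiled.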

Pattern

The second regex representation Regal knows about is the pattern, i.e. the regex pattern in string form.

(regal/regex-pattern #"\u000B\v") ;; => "\\u000B\\v"

Depending on the situation there are several reasons why you might want to use this pattern representation over the compiled regex object.

Note that in Clojure the syntax available in regex patterns differs from the syntax available in strings, in particular when it comes to notations starting with a backslash. For example, #"\xFF" is a valid regex, while "\xFF" is not a valid string. We encode regex patterns in strings, which in practice means that backslashes are escaped (doubled).

(regal/regex-pattern #"\xFF") ;;=> "\\xFF"
(regal/compile "\\xFF")       ;;=> #"\xFF"

Regex

To use the regex engine provided by the runtime (e.g. through re-find or re-seq) you need a platform-specific regex object. This is what lambdaisland.regal/regex gives you.
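
A small sketch of using such a regex object with the core functions mentioned above; the var name `digits` and the input string are made up for illustration.

```clojure
(require '[lambdaisland.regal :as regal])

;; One or more ASCII digits, compiled to a host regex object
(def digits (regal/regex [:+ [:class [\0 \9]]]))

;; First match only
(re-find digits "order 66, aisle 7")
;;=> "66"

;; All matches, as a lazy sequence
(re-seq digits "order 66, aisle 7")
;;=> ("66" "7")
```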

Grammar

You can add your own extensions (custom tokens) by providing a :registry option mapping namespaced keywords to Regal expressions.
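
A minimal sketch of what this could look like, assuming the options map is passed as a second argument to `regal/regex`. The `::word` token and the `opts` and `re` names are hypothetical and exist only for this example.

```clojure
(require '[lambdaisland.regal :as regal])

;; ::word is a hypothetical custom token: one or more lowercase letters
(def opts {:registry {::word [:+ [:class [\a \z]]]}})

;; Use the custom token like any built-in token
;; (assumption: regal/regex accepts the options map as its second argument)
(def re (regal/regex [:cat ::word "-" ::word] opts))

(re-matches re "foo-bar")
;;=> "foo-bar"
```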

Use with spec.alpha

(require '[lambdaisland.regal.spec-alpha :as regal-spec]
         '[clojure.spec.alpha :as s]
         '[clojure.spec.gen.alpha :as gen])

(s/def ::x-then-y (regal-spec/spec [:cat [:+ "x"] "-" [:+ "y"]]))

(s/def ::xy-with-stars (regal-spec/spec [:cat "*" ::x-then-y "*"]))

(s/valid? ::xy-with-stars "*xxx-yy*")
;; => true

(gen/sample (s/gen ::xy-with-stars))
;; => ("*x-y*"
;;     "*xx-y*"
;;     "*x-y*"
;;     "*xxxx-y*"
;;     "*xxx-yyyy*"
;;     "*xxxx-yyy*"
;;     "*xxxxxxx-yyyyy*"
;;     "*xx-yyy*"
;;     "*xxxxx-y*"
;;     "*xxx-yyyy*")

Use with Malli

The ::rm/regal schema provides a wrapper for regal checks.

(require '[malli.core :as m]
         '[malli.error :as me]
         '[malli.generator :as mg]
         '[lambdaisland.regal.malli :as rm]
         '[lambdaisland.regal.malli.generator :as rmg])

(def malli-opts {:registry {::rm/regal rm/rm-regal-schema}})

(def schema (m/schema [::rm/regal [:+ "y"]] malli-opts))

(m/form schema)
;; => [::rm/regal [:+ "y"]]

(m/type schema)
;; => ::rm/regal

(m/validate schema "yyy")
;; => true

(me/humanize (m/explain schema "xxx"))
;; => ["should match regex"]

(rmg/register-regal-generator) ;; register generator for ::rm/regal schema
(mg/sample schema)
;; => ("y" "y" "y" "y" "yy" "yy" "yyyyy" "yyyyy" "yyyyy" "yyyy")

BYO test.check / spec-alpha

Regal does not declare any dependencies. This lets people who only want Regal expressions as a replacement for normal regexes require lambdaisland.regal without extra dependencies being imposed on them.

If you want to use lambdaisland.regal.generator you will need org.clojure/test.check. For lambdaisland.regal.spec-alpha you will additionally need org.clojure/spec.alpha.
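
For example, a deps.edn that pulls in test.check alongside Regal might look like the sketch below. The test.check version shown is only an assumption; use whichever version fits your project.

```clojure
;; deps.edn (sketch; the test.check version is an assumption, not a requirement)
{:deps {lambdaisland/regal     {:mvn/version "0.1.175"}
        org.clojure/test.check {:mvn/version "1.1.1"}}}
```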

<!-- contributing -->

Contributing

Everyone has a right to submit patches to this project, and thus become a contributor.

Contributors MUST

Contributors SHOULD

If you submit a pull request that adheres to these rules, then it will almost certainly be merged immediately. However some things may require more consideration. If you add new dependencies, or significantly increase the API surface, then we need to decide if these changes are in line with the project's goals. In this case you can start by writing a pitch, and collecting feedback on it.

* This goes for features too: a feature needs to solve a problem. State the problem it solves, then supply a minimal solution.

** As long as this project has not seen a public release (i.e. is not on Clojars) we may still consider making breaking changes, if there is consensus that the changes are justified.

<!-- /contributing -->

Prior Art

License

Copyright © 2020 Arne Brasseur

Licensed under the terms of the Mozilla Public License 2.0, see LICENSE.