Oniguruma-To-ES (鬼車➡️ES)

An Oniguruma to JavaScript regex transpiler that runs in the browser and on your server.

Compared to running the Oniguruma C library via WASM bindings using vscode-oniguruma, this library is less than 4% of the size and its regexes often run much faster since they run as native JavaScript.

Try the demo REPL

Oniguruma-To-ES deeply understands the hundreds of large and small differences between Oniguruma and JavaScript regex syntax and behavior, across multiple JavaScript version targets. It's obsessive about ensuring that the emulated features it supports have exactly the same behavior, even in extreme edge cases. And it's been battle-tested on thousands of real-world Oniguruma regexes used in TextMate grammars (via the Shiki library). A few uncommon features can't be perfectly emulated and allow rare differences, but if you don't want to allow this, you can set the accuracy option to throw for such patterns (see details below).

πŸ•ΉοΈ Install and use

npm install oniguruma-to-es
import {toRegExp} from 'oniguruma-to-es';

const str = '…';
const pattern = '…';
// Works with all string/regexp methods since it returns a native regexp
str.match(toRegExp(pattern));
<details> <summary>Using a global name (no import)</summary>
<script src="https://cdn.jsdelivr.net/npm/oniguruma-to-es/dist/index.min.js"></script>
<script>
  const {toRegExp} = OnigurumaToES;
</script>
</details>

🔑 API

toRegExp

Accepts an Oniguruma pattern and returns an equivalent JavaScript RegExp.

[!TIP] Try it in the demo REPL.

function toRegExp(
  pattern: string,
  options?: OnigurumaToEsOptions
): RegExp | EmulatedRegExp;

Type OnigurumaToEsOptions

type OnigurumaToEsOptions = {
  accuracy?: 'default' | 'strict';
  avoidSubclass?: boolean;
  flags?: string;
  global?: boolean;
  hasIndices?: boolean;
  maxRecursionDepth?: number | null;
  rules?: {
    allowOrphanBackrefs?: boolean;
    allowUnhandledGAnchors?: boolean;
    asciiWordBoundaries?: boolean;
    captureGroup?: boolean;
  };
  target?: 'auto' | 'ES2025' | 'ES2024' | 'ES2018';
  verbose?: boolean;
};

See Options for more details.

toDetails

Accepts an Oniguruma pattern and returns the details needed to construct an equivalent JavaScript RegExp.

function toDetails(
  pattern: string,
  options?: OnigurumaToEsOptions
): {
  pattern: string;
  flags: string;
  subclass?: EmulatedRegExpOptions;
};

Note that the returned flags might differ from those provided, as a result of the emulation process. The returned pattern, flags, and subclass properties can be provided as arguments to the EmulatedRegExp constructor to produce the same result as toRegExp.

If the only keys returned are pattern and flags, they can optionally be provided to JavaScript's RegExp constructor instead. Setting option avoidSubclass to true ensures that this is always the case, by throwing an error for any patterns that rely on EmulatedRegExp's additional handling.
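The check described above can be sketched as follows. This is a minimal illustration, not library code: the `details` object is a hand-written stand-in with the documented shape of a `toDetails` result (it is not real output), and `buildNative` is a hypothetical helper name.

```javascript
// Hand-written stand-in shaped like a toDetails() result (NOT real library output).
const details = {pattern: '^\\p{L}+$', flags: 'u'};

// Hypothetical helper: use the native constructor only when no subclass data is present.
function buildNative({pattern, flags, subclass}) {
  if (subclass !== undefined) {
    throw new Error('This result needs EmulatedRegExp; a native RegExp would lose emulation');
  }
  return new RegExp(pattern, flags);
}

const nativeRe = buildNative(details);
console.log(nativeRe.test('h\u00e9llo')); // true
console.log(nativeRe.test('123')); // false
```

When the `subclass` key is present, the three values would instead go to the `EmulatedRegExp` constructor, as the text above describes.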

toOnigurumaAst

Returns an Oniguruma AST generated from an Oniguruma pattern.

function toOnigurumaAst(
  pattern: string,
  options?: {
    flags?: string;
    rules?: {
      captureGroup?: boolean;
    };
  }
): OnigurumaAst;

EmulatedRegExp

Works the same as JavaScript's native RegExp constructor in all contexts, but can be given results from toDetails to produce the same result as toRegExp.

class EmulatedRegExp extends RegExp {
  constructor(
    pattern: string | EmulatedRegExp,
    flags?: string,
    options?: EmulatedRegExpOptions
  );
};
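As background on why a RegExp subclass can work "in all contexts", the sketch below uses a hypothetical `TaggedRegExp` class. It is not `EmulatedRegExp` and makes no claim about how that class is implemented. It only shows the standard JavaScript behavior that string methods such as `match` and `replace` call the regex's own `exec` method, so a subclass can adjust results while remaining a drop-in RegExp.

```javascript
// Hypothetical subclass for illustration only; it is not EmulatedRegExp.
class TaggedRegExp extends RegExp {
  exec(str) {
    const match = super.exec(str);
    // Mark results so we can see that the overridden exec was used
    if (match) match.viaSubclass = true;
    return match;
  }
}

const tagged = new TaggedRegExp('b+', 'i');
const tagResult = 'aBBc'.match(tagged);
console.log(tagResult[0], tagResult.viaSubclass); // BB true
console.log(tagged instanceof RegExp); // true
console.log('aBBc'.replace(tagged, '-')); // a-c
```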

🔩 Options

The following options are shared by functions toRegExp and toDetails.

accuracy

One of 'default' (default) or 'strict'.

Sets the level of emulation rigor/strictness.

<details> <summary>More details</summary>

Using default accuracy adds support for the following features, depending on target:

</details>

avoidSubclass

Default: false.

Disables advanced emulation that relies on returning a RegExp subclass. In cases when a subclass would otherwise have been used, this results in one of the following:

flags

Oniguruma flags; a string with i, m, x, D, S, W in any order (all optional).

Flags can also be specified via modifiers in the pattern.

[!IMPORTANT] Oniguruma and JavaScript both have an m flag but with different meanings. Oniguruma's m is equivalent to JavaScript's s (dotAll).
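The JavaScript side of this difference can be seen with native regexes alone (no library calls are made here). JavaScript's `m` changes only how `^` and `$` behave, while `s` is the flag that lets the dot match a newline.

```javascript
const text = 'a\nb';

// JS flag m: ^ and $ also match at line breaks
console.log(/^b/.test(text)); // false
console.log(/^b/m.test(text)); // true

// JS flag s (dotAll): dot also matches newlines
console.log(/a.b/.test(text)); // false
console.log(/a.b/s.test(text)); // true

// JS flag m has no effect on the dot
console.log(/a.b/m.test(text)); // false
```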

global

Default: false.

Include JavaScript flag g (global) in the result.

hasIndices

Default: false.

Include JavaScript flag d (hasIndices) in the result.
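For reference, this is what the JavaScript flags `g` and `d` do on a native regex. The sketch uses plain regex literals rather than library output, so it only illustrates the flags that the `global` and `hasIndices` options add to the result.

```javascript
const sample = 'x1 y22';

// Flag g: iterate over every match in the string
const allMatches = [...sample.matchAll(/\d+/g)].map(m => m[0]);
console.log(allMatches); // ['1', '22']

// Flag d: each match carries start/end indices for the match and its groups
const indexed = /(?<num>\d+)/d.exec(sample);
console.log(indexed.indices[0]); // [1, 2]
console.log(indexed.indices.groups.num); // [1, 2]
```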

maxRecursionDepth

Default: 5.

Specifies the recursion depth limit. Supported values are integers 2–100 and null. If null, any use of recursion results in an error.

Since recursion isn't infinite-depth like in Oniguruma, use of recursion also results in an error if using strict accuracy.

<details> <summary>More details</summary>

Using a high limit has a small impact on performance. Generally, this is only a problem if the regex has an existing issue with runaway backtracking that recursion exacerbates. Higher limits have no effect on regexes that don't use recursion, so you should feel free to increase this if helpful.

</details>
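To give a feel for what depth-limited recursion means, here is a conceptual sketch that expands a balanced-parentheses pattern to a fixed nesting depth using native JavaScript. The helper `balancedParens` is hypothetical, and the sketch does not claim to mirror how the library expands recursion or how it counts depth, so its `depth` argument should not be read as equivalent to `maxRecursionDepth`.

```javascript
// Hypothetical helper: matches balanced parentheses up to a fixed nesting depth.
function balancedParens(depth) {
  // Innermost level: a parenthesized run with no nested parentheses
  let source = '\\([^()]*\\)';
  for (let level = 1; level < depth; level++) {
    // Each additional level allows the previous level to appear inside it
    source = `\\((?:[^()]|${source})*\\)`;
  }
  return new RegExp(`^${source}$`);
}

console.log(balancedParens(2).test('(a(b))')); // true
console.log(balancedParens(2).test('(a(b(c)))')); // false: nesting exceeds the expanded depth
console.log(balancedParens(3).test('(a(b(c)))')); // true
```

The generated source grows with each level, which is consistent with the note above that higher limits matter only for patterns that actually use recursion.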

rules

Advanced pattern options that override standard error checking and flags when enabled.

target

One of 'auto' (default), 'ES2025', 'ES2024', or 'ES2018'.

JavaScript version used for generated regexes. Using auto detects the best value based on your environment. Later targets allow faster processing, simpler generated source, and support for additional features.
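Environment detection of this kind is typically done by trying to compile a regex that needs a newer feature. The sketch below is an assumption about one workable approach, not the library's actual detection logic. The helpers `compiles` and `guessTarget` are hypothetical, and the sketch assumes that pattern modifiers (ES2025) and flag `v` (ES2024) are representative of each version.

```javascript
// Hypothetical helper: true if the environment can compile the given regex.
function compiles(source, flags) {
  try {
    new RegExp(source, flags);
    return true;
  } catch {
    return false;
  }
}

// Hypothetical helper; the library's own 'auto' detection may differ.
function guessTarget() {
  if (compiles('(?i:a)', '')) return 'ES2025'; // pattern modifiers
  if (compiles('', 'v')) return 'ES2024'; // flag v
  return 'ES2018';
}

console.log(guessTarget());
```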

verbose

Default: false.

Disables optimizations that simplify the pattern without changing its meaning.

✅ Supported features

Following are the supported features by target. The official Oniguruma syntax doc doesn't cover many of the finer details described here.

[!NOTE] Targets ES2024 and ES2025 have the same emulation capabilities. Resulting regexes might have different source and flags, but they match the same strings. See target.

Notice that nearly every feature below has at least subtle differences from JavaScript. Some features listed as unsupported are not emulatable using native JavaScript regexes, but support for others might be added in future versions of this library. Unsupported features throw an error.

<table> <tr> <th colspan="2">Feature</th> <th>Example</th> <th>ES2018</th> <th>ES2024+</th> <th>Subfeatures &amp; JS differences</th> </tr> <tr valign="top"> <th align="left" rowspan="8">Flags</th> <td colspan="5"><i>Supported in top-level flags and pattern modifiers</i></td> </tr> <tr valign="top"> <td>Ignore case</td> <td><code>i</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unicode case folding (same as JS with flag <code>u</code>, <code>v</code>)<br> </td> </tr> <tr valign="top"> <td>Dot all</td> <td><code>m</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Equivalent to JS flag <code>s</code><br> </td> </tr> <tr valign="top"> <td>Extended</td> <td><code>x</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unicode whitespace ignored<br> ✔ Line comments with <code>#</code><br> ✔ Whitespace/comments allowed between a token and its quantifier<br> ✔ Whitespace/comments between a quantifier and the <code>?</code>/<code>+</code> that makes it lazy/possessive changes it to a quantifier chain<br> ✔ Whitespace/comments separate tokens (ex: <code>\1 0</code>)<br> ✔ Whitespace and <code>#</code> not ignored in char classes<br> </td> </tr> <tr valign="top"> <td colspan="5"><i>Currently supported only in top-level flags</i></td> </tr> <tr valign="top"> <td>Digit is ASCII</td> <td><code>D</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ ASCII <code>\d</code>, <code>\p{Digit}</code>, <code>[[:digit:]]</code><br> </td> </tr> <tr valign="top"> <td>Space is ASCII</td> <td><code>S</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ ASCII <code>\s</code>, <code>\p{Space}</code>, <code>[[:space:]]</code><br> </td> </tr> <tr valign="top"> <td>Word is ASCII</td> <td><code>W</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ ASCII <code>\b</code>, <code>\w</code>, <code>\p{Word}</code>, <code>[[:word:]]</code><br> 
</td> </tr> <tr valign="top"> <th align="left" rowspan="2" valign="top">Pattern modifiers</th> <td>Group</td> <td><code>(?im-x:…)</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unicode case folding for <code>i</code><br> ✔ Allows enabling and disabling the same flag (priority: disable)<br> ✔ Allows lone or multiple <code>-</code><br> </td> </tr> <tr valign="top"> <td>Directive</td> <td><code>(?im-x)</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Continues until end of pattern or group (spanning alternatives)<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="9">Characters</th> <td>Literal</td> <td><code>E</code>, <code>!</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Code point based matching (same as JS with flag <code>u</code>, <code>v</code>)<br> ✔ Standalone <code>]</code>, <code>{</code>, <code>}</code> don't require escaping<br> </td> </tr> <tr valign="top"> <td>Identity escape</td> <td><code>\E</code>, <code>\!</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Different set than JS<br> ✔ Allows multibyte chars<br> </td> </tr> <tr valign="top"> <td>Escaped metachar</td> <td><code>\\</code>, <code>\.</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Same as JS<br> </td> </tr> <tr valign="top"> <td>Control code escape</td> <td><code>\t</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ The JS set plus <code>\a</code>, <code>\e</code><br> </td> </tr> <tr valign="top"> <td><code>\xNN</code></td> <td><code>\x7F</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Allows 1 hex digit<br> ✔ Above <code>7F</code>, is UTF-8 encoded byte (≠ JS)<br> ✔ Error for invalid encoded bytes<br> </td> </tr> <tr valign="top"> <td><code>\uNNNN</code></td> <td><code>\uFFFF</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Same as JS with flag 
<code>u</code>, <code>v</code><br> </td> </tr> <tr valign="top"> <td><code>\x{…}</code></td> <td><code>\x{A}</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Allows leading 0s up to 8 total hex digits<br> </td> </tr> <tr valign="top"> <td>Escaped num</td> <td><code>\20</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Can be backref, error, null, octal, identity escape, or any of these combined with literal digits, based on complex rules that differ from JS<br> ✔ Always handles escaped single digit 1-9 outside char class as backref<br> ✔ Allows null with 1-3 0s<br> ✔ Error for octal > <code>177</code><br> </td> </tr> <tr valign="top"> <td>Caret notation</td> <td><code>\cA</code>, <code>\C-A</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ With A-Za-z (JS: only <code>\c</code> form)<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="8">Character sets</th> <td>Digit</td> <td><code>\d</code>, <code>\D</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unicode by default (≠ JS)<br> </td> </tr> <tr valign="top"> <td>Hex digit</td> <td><code>\h</code>, <code>\H</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ ASCII<br> </td> </tr> <tr valign="top"> <td>Whitespace</td> <td><code>\s</code>, <code>\S</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unicode by default<br> ✔ No JS adjustments to Unicode set (−<code>\uFEFF</code>, +<code>\x85</code>)<br> </td> </tr> <tr valign="top"> <td>Word</td> <td><code>\w</code>, <code>\W</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unicode by default (≠ JS)<br> </td> </tr> <tr valign="top"> <td>Dot</td> <td><code>.</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Excludes only <code>\n</code> (≠ JS)<br> </td> </tr> <tr valign="top"> <td>Any</td> <td><code>\O</code></td> <td 
align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Any char (with any flags)<br> ✔ Identity escape in char class<br> </td> </tr> <tr valign="top"> <td>Not newline</td> <td><code>\N</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Identity escape in char class<br> </td> </tr> <tr valign="top"> <td>Unicode property</td> <td> <code>\p{L}</code>,<br> <code>\P{L}</code> </td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Binary properties<br> ✔ Categories<br> ✔ Scripts<br> ✔ Aliases<br> ✔ POSIX properties<br> ✔ Invert with <code>\p{^…}</code>, <code>\P{^…}</code><br> ✔ Insignificant spaces, underscores, and casing in names<br> ✔ <code>\p</code>, <code>\P</code> without <code>{</code> is an identity escape<br> ✔ Error for key prefixes<br> ✔ Error for props of strings<br> ❌ Blocks (wontfix<sup>[1]</sup>)<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="2">Variable-length sets</th> <td>Newline</td> <td><code>\R</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Matched atomically<br> </td> </tr> <tr valign="top"> <td>Grapheme</td> <td><code>\X</code></td> <td align="middle">☑️</td> <td align="middle">☑️</td> <td> ● Uses a close approximation<br> ✔ Matched atomically<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="6">Character classes</th> <td>Base</td> <td><code>[…]</code>, <code>[^…]</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unescaped <code>-</code> outside of range is literal in some contexts (different than JS rules in any mode)<br> ✔ Error for unescaped <code>[</code> that doesn't form nested class<br> ✔ Leading unescaped <code>]</code> OK<br> ✔ Fewer chars require escaping than JS<br> </td> </tr> <tr valign="top"> <td>Empty</td> <td><code>[]</code>, <code>[^]</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Error<br> </td> </tr> <tr valign="top"> <td>Range</td> 
<td><code>[a-z]</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Same as JS with flag <code>u</code>, <code>v</code><br> </td> </tr> <tr valign="top"> <td>POSIX class</td> <td> <code>[[:word:]]</code>,<br> <code>[[:^word:]]</code> </td> <td align="middle">☑️<sup>[2]</sup></td> <td align="middle">✅</td> <td> ✔ All use Unicode definitions<br> </td> </tr> <tr valign="top"> <td>Nested class</td> <td><code>[…[…]]</code></td> <td align="middle">☑️<sup>[3]</sup></td> <td align="middle">✅</td> <td> ✔ Same as JS with flag <code>v</code><br> </td> </tr> <tr valign="top"> <td>Intersection</td> <td><code>[…&amp;&amp;…]</code></td> <td align="middle">❌</td> <td align="middle">✅</td> <td> ✔ Doesn't require nested classes for intersection of union and ranges<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="6">Assertions</th> <td>Line start, end</td> <td><code>^</code>, <code>$</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Always "multiline"<br> ✔ Only <code>\n</code> as newline<br> ✔ <code>^</code> doesn't match after string-terminating <code>\n</code><br> </td> </tr> <tr valign="top"> <td>String start, end</td> <td><code>\A</code>, <code>\z</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Same as JS <code>^</code> <code>$</code> without JS flag <code>m</code><br> </td> </tr> <tr valign="top"> <td>String end or before terminating newline</td> <td><code>\Z</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Only <code>\n</code> as newline<br> </td> </tr> <tr valign="top"> <td>Search start</td> <td><code>\G</code></td> <td align="middle">☑️</td> <td align="middle">☑️</td> <td> ● Common uses supported<br> </td> </tr> <tr valign="top"> <td>Lookaround</td> <td> <code>(?=…)</code>,<br> <code>(?!…)</code>,<br> <code>(?&lt;=…)</code>,<br> <code>(?&lt;!…)</code> </td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Same 
as JS<br> ✔ Allows variable-length quantifiers and alternation within lookbehind<br> </td> </tr> <tr valign="top"> <td>Word boundary</td> <td><code>\b</code>, <code>\B</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unicode based (≠ JS)<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="3">Quantifiers</th> <td>Greedy, lazy</td> <td><code>*</code>, <code>+?</code>, <code>{2,}</code>, etc.</td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Includes all JS forms<br> ✔ Adds <code>{,n}</code> for min 0<br> ✔ Explicit bounds have upper limit of 100,000 (unlimited in JS)<br> ✔ Error with assertions (same as JS with flag <code>u</code>, <code>v</code>)<br> </td> </tr> <tr valign="top"> <td>Possessive</td> <td><code>?+</code>, <code>*+</code>, <code>++</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ <code>+</code> suffix doesn't make <code>{…}</code> interval quantifiers possessive (creates a quantifier chain)<br> </td> </tr> <tr valign="top"> <td>Chained</td> <td><code>**</code>, <code>??+*</code>, <code>{2,3}+</code>, etc.</td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Further repeats the preceding repetition<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="4">Groups</th> <td>Noncapturing</td> <td><code>(?:…)</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Same as JS<br> </td> </tr> <tr valign="top"> <td>Atomic</td> <td><code>(?>…)</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Supported<br> </td> </tr> <tr valign="top"> <td>Capturing</td> <td><code>(…)</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Is noncapturing if named capture present<br> </td> </tr> <tr valign="top"> <td>Named capturing</td> <td> <code>(?&lt;a>…)</code>,<br> <code>(?'a'…)</code> </td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Duplicate names allowed (including 
within the same alternation path) unless directly referenced by a subroutine<br> ✔ Error for names invalid in Oniguruma or JS<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="4">Backreferences</th> <td>Numbered</td> <td><code>\1</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Error if named capture used<br> ✔ Refs the most recent of a capture/subroutine set<br> </td> </tr> <tr valign="top"> <td>Enclosed numbered, relative</td> <td> <code>\k&lt;1></code>,<br> <code>\k'1'</code>,<br> <code>\k&lt;-1></code>,<br> <code>\k'-1'</code> </td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Error if named capture used<br> ✔ Allows leading 0s<br> ✔ Refs the most recent of a capture/subroutine set<br> ✔ <code>\k</code> without <code>&lt;</code> <code>'</code> is an identity escape<br> </td> </tr> <tr valign="top"> <td>Named</td> <td> <code>\k&lt;a></code>,<br> <code>\k'a'</code> </td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ For duplicate group names, rematch any of their matches (multiplex)<br> ✔ Refs the most recent of a capture/subroutine set (no multiplex)<br> ✔ Combination of multiplex and most recent of capture/subroutine set if duplicate name is indirectly created by a subroutine<br> </td> </tr> <tr valign="top"> <td colspan="2">To nonparticipating groups</td> <td align="middle">☑️</td> <td align="middle">☑️</td> <td> ✔ Error if group to the right<sup>[4]</sup><br> ✔ Duplicate names (and subroutines) to the right not included in multiplex<br> ✔ Fail to match (or don't include in multiplex) ancestor groups and groups in preceding alternation paths<br> ❌ Some rare cases are indeterminable at compile time and use the JS behavior of matching an empty string<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="2">Subroutines</th> <td>Numbered, relative</td> <td> <code>\g&lt;1></code>,<br> <code>\g'1'</code>,<br> <code>\g&lt;-1></code>,<br> 
<code>\g'-1'</code>,<br> <code>\g&lt;+1></code>,<br> <code>\g'+1'</code> </td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Allowed before reffed group<br> ✔ Can be nested (any depth)<br> ✔ Doesn't alter backref nums<br> ✔ Reuses flags from the reffed group (ignores local flags)<br> ✔ Replaces most recent captured values (for backrefs)<br> ✔ <code>\g</code> without <code>&lt;</code> <code>'</code> is an identity escape<br> ✔ Error if named capture used<br> </td> </tr> <tr valign="top"> <td>Named</td> <td> <code>\g&lt;a></code>,<br> <code>\g'a'</code> </td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ● Same behavior as numbered<br> ✔ Error if reffed group uses duplicate name<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="2">Recursion</th> <td>Full pattern</td> <td> <code>\g&lt;0></code>,<br> <code>\g'0'</code> </td> <td align="middle">☑️</td> <td align="middle">☑️</td> <td> ● Has depth limit<sup>[5]</sup><br> </td> </tr> <tr valign="top"> <td>Numbered, relative, named</td> <td> <code>(…\g&lt;1>?…)</code>,<br> <code>(…\g&lt;-1>?…)</code>,<br> <code>(?&lt;a>…\g&lt;a>?…)</code>, etc. 
</td> <td align="middle">☑️</td> <td align="middle">☑️</td> <td> ● Has depth limit<sup>[5]</sup><br> </td> </tr> <tr valign="top"> <th align="left" rowspan="5">Other</th> <td>Comment group</td> <td><code>(?#…)</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Allows escaping <code>\)</code>, <code>\\</code><br> ✔ Comments allowed between a token and its quantifier<br> ✔ Comments between a quantifier and the <code>?</code>/<code>+</code> that makes it lazy/possessive changes it to a quantifier chain<br> </td> </tr> <tr valign="top"> <td>Alternation</td> <td><code>…|…</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Same as JS<br> </td> </tr> <tr valign="top"> <td>Keep</td> <td><code>\K</code></td> <td align="middle">☑️</td> <td align="middle">☑️</td> <td> ● Supported if at top level and no top-level alternation is used<br> </td> </tr> <tr valign="top"> <td colspan="2">JS features unknown to Oniguruma are handled using Oniguruma syntax</td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ <code>\u{…}</code> is an error<br> ✔ <code>[\q{…}]</code> matches <code>q</code>, etc.<br> ✔ <code>[a--b]</code> includes the invalid reversed range <code>a</code> to <code>-</code><br> </td> </tr> <tr valign="top"> <td colspan="2">Invalid Oniguruma syntax</td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Error<br> </td> </tr> <tr valign="top"> <th align="left" rowspan="1">Compile-time options</th> <td colspan="2"><code>ONIG_OPTION_CAPTURE_GROUP</code></td> <td align="middle">✅</td> <td align="middle">✅</td> <td> ✔ Unnamed captures and numbered calls allowed when using named capture<br> </td> </tr> </table>

The table above doesn't include all aspects that Oniguruma-To-ES emulates (including error handling, most aspects that work the same as in JavaScript, and many aspects of non-JavaScript features that work the same in the other regex flavors that support them).
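To make two of the "≠ JS" entries in the table concrete, the following sketch shows the native JavaScript behavior that differs from what the table describes for Oniguruma. It uses only built-in regexes and makes no library calls. The Unicode-aware alternatives shown are ordinary JavaScript equivalents chosen for illustration, not necessarily what the library generates.

```javascript
// U+0663 ARABIC-INDIC DIGIT THREE (general category Nd)
const arabicIndicThree = '\u0663';

// JS \d is ASCII-only, even with flag u; the table lists \d as Unicode by default
console.log(/\d/u.test(arabicIndicThree)); // false
console.log(/\p{Nd}/u.test(arabicIndicThree)); // true

// JS dot excludes \r (and \u2028, \u2029) as well as \n;
// the table lists dot as excluding only \n
console.log(/./.test('\r')); // false
console.log(/[^\n]/.test('\r')); // true
```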

Footnotes

  1. Unicode blocks (which in Oniguruma are used with an In… prefix) are easily emulatable but their character data would significantly increase library weight. They're also a flawed and arguably unuseful feature, given the ability to use Unicode scripts and other properties.
  2. With target ES2018, the specific POSIX classes [:graph:] and [:print:] use ASCII-based versions rather than the Unicode versions available for target ES2024 and later, and they result in an error if using strict accuracy.
  3. Target ES2018 doesn't support nested negated character classes.
  4. It's not an error for numbered backreferences to come before their referenced group in Oniguruma, but an error is the best path for Oniguruma-To-ES because (1) most placements are mistakes and can never match (based on the Oniguruma behavior for backreferences to nonparticipating groups), (2) erroring matches the behavior of named backreferences, and (3) the edge cases where they're matchable rely on rules for backreference resetting within quantified groups that are different in JavaScript and aren't emulatable. Note that it's not a backreference in the first place if using \10 or higher and not as many capturing groups are defined to the left (it's an octal or identity escape).
  5. The recursion depth limit is specified by option maxRecursionDepth. Overlapping recursions and the use of backreferences when the recursed subpattern contains captures aren't yet supported. Patterns that would error in Oniguruma due to triggering infinite recursion might find a match in Oniguruma-To-ES since recursion is bounded (future versions will detect this and error at transpilation time).
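Footnote 4 and the table row on nonparticipating groups refer to the "JS behavior of matching an empty string". The sketch below shows that native behavior using a plain JavaScript regex, with no library involved: a backreference to a group that did not take part in the match succeeds by matching nothing.

```javascript
// Group 1 is optional; when it doesn't participate, \1 matches the empty string in JS
const nonparticipating = /(a)?\1b/;

console.log(nonparticipating.test('b')); // true
console.log(nonparticipating.exec('b')[1]); // undefined

// When the group does participate, \1 must rematch its captured text
console.log(nonparticipating.exec('aab')[0]); // aab
```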

❌ Unsupported features

The following don't yet have any support, and throw errors. They're all infrequently-used features, with most being extremely rare. Note that Oniguruma-To-ES can handle 99.9% of real-world Oniguruma regexes, based on patterns used in a large collection of TextMate grammars.

γŠ—οΈ Unicode / mixed case-sensitivity

Oniguruma-To-ES fully supports mixed case-sensitivity (and handles the Unicode edge cases) regardless of JavaScript target. It also restricts Unicode properties to those supported by Oniguruma and the target JavaScript version.
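As a rough illustration of what mixed case-sensitivity involves on engines without native pattern modifiers, the sketch below expands the case-insensitive part of a pattern into character classes. This is a simple, well-known technique shown under stated assumptions: it handles ASCII letters in literal text only, the helper name `insensitiveFragment` is hypothetical, and it is not presented as the library's approach (which, per the text above, also handles Unicode edge cases).

```javascript
// Hypothetical helper: ASCII-only, literal text only, for illustration.
function insensitiveFragment(literal) {
  return [...literal].map(ch => {
    if (/^[A-Za-z]$/.test(ch)) {
      // Match either case of this letter
      return `[${ch.toLowerCase()}${ch.toUpperCase()}]`;
    }
    // Escape regex metacharacters so other chars match literally
    return ch.replace(/[\\^$.*+?()[\]{}|]/g, '\\$&');
  }).join('');
}

// Roughly the idea behind a(?i:b.c): "a" is case-sensitive, "b.c" is not
const mixed = new RegExp(`^a${insensitiveFragment('b.c')}$`);
console.log(mixed.source); // ^a[bB]\.[cC]$
console.log(mixed.test('aB.C')); // true
console.log(mixed.test('Ab.c')); // false
```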

Oniguruma-To-ES focuses on being lightweight to make it better for use in browsers. This is partly achieved by not including heavyweight Unicode character data, which imposes a couple of minor/rare restrictions:

👀 Similar projects

JsRegex transpiles Onigmo regexes to JavaScript (Onigmo is a fork of Oniguruma with mostly shared syntax and behavior). It's written in Ruby and relies on the Regexp::Parser Ruby gem, which means regexes must be pre-transpiled on the server to use them in JavaScript. Note that JsRegex doesn't always translate edge case behavior differences.

🏷️ About

Oniguruma-To-ES was created by Steven Levithan.

If you want to support this project, I'd love your help by contributing improvements, sharing it with others, or sponsoring ongoing development.

© 2024–present. MIT License.