Awesome
ejpet
Matching JSON nodes in Erlang.
What for ?
Kind of regular expression applied to JSON documents.
- Find if a JSON document has some structural properties, and possibly extract some information.
- Useful to extract small data pieces from large JSON documents.
- Efficient filtering of JSON nodes in real time.
Backends for jsone, jsx, jiffy and mochijson2.
Quick start
Obtain ejpet
Add it to your project
Add a dependency to ejpet
and possibly to a supported JSON
codec in your project dependency set.
- With
rebar3
, inrebar.config
file
{deps, [
%% ...
{ejpet, ".*", {git, "git://github.com/nmichel/ejpet.git", {tag, "0.7.0"}},
{jsx, ".*", {git, "https://github.com/talentdeficit/jsx.git", {tag, "v2.8.3"}},
%% ...
]}.
- With
mix
, inmix.exs
file
defmodule MyProject.Mixfile do
use Mix.Project
def project do
[
# ...
deps: deps()
# ...
]
end
defp deps() do
[
# ...
{:ejpet, "~> 0.7.0"},
{:jsx, "~> 2.8"},
# ...
]
end
end
From source
Clone
$ git clone git@github.com:nmichel/ejpet.git
Build
$ cd ejpet
$ ./rebar get-deps
$ make && make test
Start Erlang shell
erl -pz ./ebin ./deps/*/ebin
Start (m)using
Read some JSON data
1> {ok, Data} = file:read_file("./test/channels_list.json").
{ok,<<239,187,191,91,13,10,32,32,32,32,123,13,10,32,32,
32,32,32,32,32,32,34,110,117,109,98,101,...>>}
Decode JSON using, say, jsx (provided you have jsx
in your load path)
2> Node = jsx:decode(Data).
[[{<<"number">>,1},
{<<"lcn">>,2},
{<<"name">>,<<"France 2">>},
{<<"sap_group">>,<<>>},
{<<"ip_multicast">>,<<"239.100.10.1">>},
{<<"port_multicast">>,1234},
{<<"num_clients">>,0},
{<<"scrambling_ratio">>,0},
{<<"is_up">>,1},
{<<"pcr_pid">>,120},
{<<"pmt_version">>,4},
{<<"unicast_port">>,0},
{<<"service_id">>,257},
{<<"service_type">>,
<<"Please report : Unknown service type doc : EN 30"...>>},
{<<"pids_num">>,7},
{<<"pids">>,
...
Ok. Now define what we are looking for, and what we want to get
Find somewhere in a list, an object with
* a {"ip_multicast", "239.100.10.4"} pair
* a key "pcr_pid", whatever value captured in variable "pcr",
* a key "pids", which value is either a list or an object into which there are
* an object with
* a key "language" which value matches regex "^fr",
* a key "number", whatever value captured in variable "apid"
* a key "type", whatever value captured in variable "acodec"
* an object with
* a key "type", which value matches regex "Video" captured in variable "vcodec"
* a key "number", whatever value captured in variable "vpid"
3> O = ejpet:compile("[*, {\"ip_multicast\":\"239.100.10.4\",
\"pcr_pid\":(?<pcr>_),
\"pids\":<{\"language\": #\"^fr\",
\"number\": (?<apid>_),
\"type\": (?<acodec>_)},
{\"type\": (?<vcodec>#\"Video\"),
\"number\": (?<vpid>_)}>}, *]", jsx).
{ejpet,jsx,#Fun<ejpet_jsx_generators.9.11467207>}
Run and seek ...
4> ejpet:run(Node, O).
Here you are !
{true,[{"vpid",520},
{"vcodec",[<<"Video (MPEG2)">>]},
{"acodec",[<<"Audio (MPEG1)">>]},
{"apid",530},
{"pcr",520}]}
How ?
Express what you want to match using a simple expression language.
Expression syntax
pattern | match ? | Notes |
---|---|---|
true | true | |
false | false | |
null | null | |
"string" | the string "string" | UTF-8 encoded string (with escaping) |
#"regex" | any string matching regex "regex" | UTF-8 encoded string (no escaping) |
number | the number number e.g. (42 , 3.14159 , -3395.1264e-22 ) | |
{ kv* } | object for which all kv (key/value) patterns are matched | Order does not matter |
[ item* (, *)?] | list for which all item patterns are matched | Order DOES matter |
< value* > | value set (list, or object values) for which all value patterns are matched | Order does not matter |
< value* >/g | same as previous but search for ALL matches. Useful only when capturing | Order does not matter |
<! value* !> | same as < value* > but search deep. | |
<! value* !>/g | same as previous but search for ALL matches. Useful only when capturing | |
(?<name>expr) | capture expression expr in return value name | Every JSON expression may be captured |
(!<name>type) | match json object of type type against parameter named name |
kv
may be one of the form
- _:pattern
"key"
:_
"key"
:pattern
item
may be one of the form
*,
pattern- pattern
value
is a pattern
kv
, item
and value
are separated by ,
.
In parameter injection type
may be
number
boolean
string
regex
Notes
Numbers
number
matching may be strict or loose, depending on an option passed are compile-time.
1> ejpet:match(<<"42.0">>, "42").
{true,<<"{}">>}
2> ejpet:match(<<"42.0">>, "42", [{number_strict_match, true}]).
{false,<<"{}">>}
Strings and Regex
string
and regex
are UTF-8 encoded byte streams.
They may contain escaping sequences, as in "\\b"
, or "\u00E9"
. When found in a string
these sequences are interpreted by default (but they may be left as-is with option string_apply_escape_sequence
set to false
). Found in regex
they are not interpreted.
3> ejpet:match(<<"\"\x{00E9}\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, true}]).
{true,<<"{}">>}
4> ejpet:match(<<"\"\x{00E9}\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, false}]).
{false,<<"{}">>}
5> ejpet:match(<<"\"\\\\u00E9\""/utf8>>, <<"\"\\u00E9\""/utf8>>, [{string_apply_escape_sequence, false}]).
{true,<<"{}">>}
Codepoint produced by evaluating an escape sequence of the form \uABCD
is NOT checked. One can insert any codepoint, valid or not, in a string or regex.
Captures
Every pattern p
can be captured by simply substituing it by (?<variable_name>p)
. Captures are returned as a JSON object, where each variable_name
ìs a key, and the list if captures found for that variable is the value.
This JSON object is build with repect to the backend indicated when compiling the pattern.
Warning : if there is no captures to return, the empty JSON object {}
will be returned. But its actual form depends on the backend.
- jsx:
[{}]
- jiffy:
{[]}
- mochijson:
{struct, []}
- jsone:
#{}
One may wonder why return captures as a encoded JSON object. There is 2 reasons :
- captures objects are captured "as is" in the parsed document, i.e. in their encoded form. Using the backend encoding for the result is more coherent;
- capture JSON object can itself be pattern matched.
Parameters Injection
It is possible to provide some matching values at match-time, through parameter injection forms like (!<param_name>param_type)
, where param_type
may be number
, string
, boolean
and regex
.
At match-time, produced matching functions will look for an entry named param_name
in the provided parameters list. See ejpet:run/3
and ejpet:match/4
.
Note that string
values should be binaries, and regex
values MUST be mp()
opaque objects returned by re:compile/2
.
API
backend() = jsx | jiffy | mochijson2 | jsone
epm() = {ejpet, term(), term()}
expr_src() = string()
compile_option() = {string_apply_escape_sequence, boolean()}
| {number_strict_match, boolean()}
json_input() = string() | binary()
json_src() = binary()
json_term() = jsx_term() | jiffy_term() | mochijson2_term()
run_param_name = binary()
run_param_value = boolean() | number() | binary() | re::mp()
run_param = {run_param_name(), run_param_value()}
run_res() = {match_stat(), json_term()}
match_res() = {match_stat(), json_src()}
match_stat() = true | false
ejpet:decode(JSONText, Backend) -> json_term()
JSONText = json_input()
Backend = backend()
ejpet:encode(JSONTerm, Backend) -> json_term()
JSONTerm = json_term()
Backend = backend()
ejpet:compile(Expr, Backend, Options) -> epm()
Expr = expr_src()
Backend = backend()
Options = [Option]
Option = compile_option()
ejpet:compile(Expr, Backend) -> epm()
Same as ejpet:compile(Expr, Backend, [])
ejpet:compile(Expr) -> epm()
Same as ejpet:compile(Expr, jsx, [])
ejpet:backend(EPM) -> backend()
EPM = epm()
ejpet:run(JSONTerm, EPM, Params) -> run_res()
EPM = epm()
JSONTerm = json_term()
Params = [Param]
Param = run_param()
ejpet:run(JSONTerm, EPM) -> run_res()
Same pas ejpet:run(JSONTerm, EPM, [])
ejpet:match(JSONText, Expr, Options, Params) -> match_res()
JSONText = json_input()
Expr = expr_src() | epm()
Options = [Option]
Option = compile_option()
Params = [Param]
Param = run_param()
ejpet:match(JSONText, Expr, Options) -> match_res()
Same as ejpet:match(JSONText, Expr, Options, [])
ejpet:match(JSONText, Expr) -> match_res()
Same as ejpet:match(JSONText, Expr, [], [])
ejpet:get_status(Res) -> match_stat()
Res = run_res() | match_res()
get_captures(Res) -> json_term()
Res = run_res() | match_res()
get_capture(Res, Name) -> {ok, json_term()} | not_found
Same as get_captures(Res, Name, jsx)
get_capture(Res, Name, Backend) -> {ok, json_term()} | not_found
Res = run_res()
Name = string() | binary()
Backend = backend()
empty_capture_set() -> json_term()
Same as empty_capture_set(jsx)
empty_capture_set(Backend) -> json_term()
Backend = backend()
Examples
Basics
Expression | Match | No match | Code snippet |
---|---|---|---|
42 | 42 | "42" , [42] , {"key": 42} | ejpet:match(<<"42">>, "42"). |
"42" | "42" | 42 , ["42"] , {"key": "42"} | ejpet:match(<<"\"42\"">>, "\"42\""). |
true | true | "true" , [true] | ejpet:match(<<"true">>, "true"). |
false | false | "false" , [false] | ejpet:match(<<"false">>, "false"). |
null | null | "null" , [null] | ejpet:match(<<"null">>, "null"). |
#"foo" | "foobar" , "barfoo" | "barfo" | ejpet:match(<<"\"foobar\"">>, "#\"foo\""). |
#"^foo" | "foobar" | "barfoo" | ejpet:match(<<"\"foobar\"">>, "#\"^foo\""). |
#"bar$" | "foobar" | "barfoo" | ejpet:match(<<"\"foobar\"">>, "#\"bar$\""). |
Objects
Expression | Match | No match | Code snippet |
---|---|---|---|
{_:42} | {"bar": 42} , {"bar": 47, "foo": 42} | {"bar": 47} , {"foo": "42"} | ejpet:match(<<"{\"foo\": 42}">>, "{_:42}"). |
{"foo":_} | {"foo": 42} , {"bar": 42, "foo": {}} | {"bar": "foo"} | ejpet:match(<<"{\"foo\": 42}">>, "{\"foo\":_}"). |
{"foo":42} | {"foo": 42} , {"bar": "42", "foo": 42} | {"bar": 42, "foo": "42"} | ejpet:match(<<"{\"foo\": 42}">>, "{\"foo\":42}"). |
{_:{"foo": 42}, "bar": {_:#"bar"}} | {"neh": {"foo": 42}, "bar": {"nimp": "foobar"}} | {"neh": {"notfoo": 42}, "bar": {"nimp": "foobar"}} | ejpet:match(<<"{\"neh\": {\"foo\": 42}, \"bar\": {\"nimp\": \"foobar\"}}">>, "{_:{\"foo\": 42}, \"bar\": {_:#\"bar\"}}"). |
Lists
Expression | Match | No match | Code snippet |
---|---|---|---|
["42"] | ["42"] | {"bar": "42"} , {"foo": 42} , [42] , ["42", "42"] | ejpet:match(<<"[\"42\"]">>, "[\"42\"]"). |
[*, "42"] | ["42"] , ["42", "42"] , [true, "42"] | {"bar": "42"} , {"foo": 42} , [42] , ["42", true] | ejpet:match(<<"[true, \"42\"]">>, "[*, \"42\"]"). |
[*, "42", *] | ["42"] , ["42", "42"] , [true, "42"] , ["42", true] , [{}, "42", true] | {"bar": "42"} , {"foo": 42} , [42] | ejpet:match(<<"[true, \"42\", {}]">>, "[*, \"42\", *]"). |
[[42]] | [[42]] | [42] , [[42], 42] | ejpet:match(<<"[[42]]">>, "[[42]]"). |
[*, [42]] | [[42]] , ["42", [42]] | [[42], 42] | ejpet:match(<<"[\"42\", [42]]">>, "[*, [42]]"). |
[[42], *] | [[42]] , [[42], 42] | ["42", [42]] | ejpet:match(<<"[[42], \"42\"]">>, "[[42], *]"). |
Value sets (lists or object value set)
Expression | Match | No match | Code snippet |
---|---|---|---|
<42> | [42] , {"key": 42} | 42 , "42" | ejpet:match(<<"{\"key\": 42}">>, "<42>"). |
<"42"> | ["42"] , {"bar": "42"} , [42, "42"] , ["42", 42] | [42] , {"bar": 47} , {"foo": 42} | ejpet:match(<<"{\"bar\": \"42\"}">>, "<\"42\">"). |
<!"42"!> | ["42"] , [true, "42"] , ["foo", ["42", true], {}] , [{}, {"foo": "42"}, true] , {"bar": "42"} , {"bar": {"foo": "42"}} | "42" , {"foo": 42} , [42] | ejpet:match(<<"[true, [null, {\"foo\": \"42\"}, \"bar\"], {}]">>, "<!\"42\"!>"). |
<!<!"42"!>!> | [["42"]] , [{}, {"foo": "42"}, true] , {"bar": {"foo": "42"}} | ["42"] , {"bar": "42"} | ejpet:match(<<"[{\"foo\":\"42\"}]">>, "<!<!\"42\"!>!>"). |
Captures
Expression | Test | Capture(s) | Code snippet |
---|---|---|---|
<!(?<subnode>{_:42})!> | [{"foo": null}, {"foo": 42, "bar": {}}] | subnode: [{"foo":42,"bar":{}}] | ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "<!(?<subnode>{_:42})!>"). |
(?<all><!(?<subnode>{_:42})!>) | [{"foo": null}, {"foo": 42, "bar": {}}] | all: [[{"foo":null},{"foo":42,"bar":{}}]] ,subnode: [{"foo":42,"bar":{}}] | ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "(?<all><!(?<subnode>{_:42})!>)"). |
Global captures
Expression | Test | Capture(s) | Code snippet |
---|---|---|---|
<(?<node>{\"codec\":_, \"lang\":(?<lang>_)})>/g | [{"codec": "audio", "lang": "fr"}, {"codec": "video", "lang": "en"}, {"codec": "foo", "lang": "it"}] | node: [{"codec":"audio","lang":"fr"}, {"codec":"video","lang":"en"}, {"codec":"foo","lang":"it"}] lang: ["fr", "en", "it"] | ejpet:match(<<"[{\"codec\": \"audio\", \"lang\": \"fr\"}, {\"codec\":\"video\", \"lang\": \"en\"}, {\"codec\": \"foo\", \"lang\": \"it\"}]">>, <<"<(?<node>{\"codec\":_, \"lang\":(?<lang>_)})>/g">>) |
Injections
Expression | Test | parameters | Capture(s) | Code snippet |
---|---|---|---|---|
<(?<subnode>(!<what>number))> | [41, 42, 43] | [{<<"what">>, 42}] | subnode: [42] | ejpet:match(<<"[41, 42, 43]">>, "<(?<subnode>(!<what>number))>", [], [{<<"what">>, 42}]). |
Notes
In arrays above, captured values are expressed as "abstract JSON node", for illustration purpose. As explained previously, actual capture result depends on the API function used, and may be:
- serialized JSON nodes (as in the "Code snippet" column), with
ejpet:match()
1> ejpet:match(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>, "(?<all><!(?<subnode>{_:42})!>)").
{true,<<"{\"all\":[[{\"foo\":null},{\"foo\":42,\"bar\":{}}]],\"subnode\":[{\"foo\":42,\"bar\":{}}]}">>}
- (jsx | jiffy | mochijson2) JSON value, depending on the backend, for easier further processing, with
ejpet:run()
1> JSX = ejpet:compile("(?<all><!(?<subnode>{_:42})!>)", jsx, []).
{ejpet,jsx,#Fun<ejpet_jsx_generators.19.98422695>}
2> ejpet:run((ejpet:backend(JSX)):decode(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>), JSX).
{true,[{"all",
[[[{<<"foo">>,null}],[{<<"foo">>,42},{<<"bar">>,[{}]}]]]},
{"subnode",[[{<<"foo">>,42},{<<"bar">>,[{}]}]]}]}
39> Mochi = ejpet:compile("(?<all><!(?<subnode>{_:42})!>)", mochijson2, []).
{ejpet,mochijson2,
#Fun<ejpet_mochijson2_generators.19.110863078>}
40> ejpet:run((ejpet:backend(Mochi)):decode(<<"[{\"foo\": null}, {\"foo\": 42, \"bar\": {}}]">>), Mochi).
{true,{struct,[{<<"all">>,
[[{struct,[{<<"foo">>,null}]},
{struct,[{<<"foo">>,42},{<<"bar">>,{struct,[]}}]}]]},
{<<"subnode">>,
[{struct,[{<<"foo">>,42},{<<"bar">>,{struct,[]}}]}]}]}}