# Literate jq+shell Programming with jqmd

`jqmd` is a tool for writing well-documented, complex manipulations of YAML or JSON data structures using bash scripting and jq. It allows you to mix both kinds of code -- plus snippets of YAML or JSON data! -- within one or more markdown documents, making it easier to write scripts that do complex things like generate docker-compose configurations or manipulate serialized Wordpress options.

`jqmd` is implemented as an extension of `mdsh`, which means you can extend it to process additional kinds of code blocks by defining functions inside your `shell @mdsh` blocks. But you do not need to install mdsh, and you can use `jqmd --compile` to make distributable scripts that don't require jqmd or mdsh.
## Contents

<!-- toc -->
<!-- tocstop -->

## Installation

If you have `basher` on your system, you can install jqmd with `basher install bashup/jqmd`; otherwise, just download the jqmd executable, `chmod +x` it, and put it in a directory on your `PATH`.
## Usage

Running `jqmd some-document.md args...` will read and interpret unindented, triple-backquote fenced code blocks from `some-document.md`, according to the language listed on the block:
- `shell` -- interpreted as bash code, executed immediately. Shell blocks can invoke various jqmd functions as described later in this document.

- `jq` -- jq code, which is added to a jq filter pipeline for execution at the end of the file, or to be run explicitly with the `RUN_JQ` function. Blocks written in jq can also be tagged with `@func` to turn them into shell functions instead of executing them immediately; see the section below on reusable blocks for more details.

- `jq defs` -- jq function definitions, which are accumulated over the course of the program run and included at the start of any executed filter pipelines.

- `jq imports` -- jq module includes or imports, which are accumulated over the course of the program run and included at the start of any executed filter pipelines (before the current set of `jq defs`).

- `yaml`, `json` -- YAML data or JSON expressions, which are added to the jq filter pipeline as `jqmd_data(data)`. (This turns the given data into a jq filter that modifies an existing data structure; see "Data Merging", below, for more details.) Data blocks can also be tagged with `@func` or `!const` to turn them into shell functions or jq constants instead of executing them immediately; see the sections below on reusable blocks and named constants for more details.

  (Note: YAML data can only be processed if there is a `yaml2json` executable on `PATH`, the system `python` interpreter has PyYAML installed, or yaml2json.php is installed; otherwise an error will occur. For best performance, we recommend installing a yaml2json tool written in Go, as its process startup time alone is considerably smaller than that of Python or PHP.)

  Both YAML and JSON blocks can contain jq string interpolation expressions, denoted by `\( )`. For example, a JSON block containing `{ "foo": "\(env.BAR)" }` will cause jq to insert the contents of the environment variable `BAR` into the data structure at the appropriate point. (Note that this means that if you have a backslash before a `(` in your YAML blocks and you don't want it to be treated as interpolation, you will need to add an extra backslash in front of it.)

  (In addition, `json` blocks do not have to be valid JSON: they can actually contain arbitrary jq expressions. The only real difference between a `json` block and a `jq` block is that a JSON block is automatically wrapped in a call to `jqmd_data()`.)
(As with `mdsh`, you can extend the above list by defining appropriate hook functions in `shell @mdsh` blocks; see the section below on "Supporting Additional Languages" for more info.)
Once all blocks have been executed or added to the filter pipeline, jq is run on standard input with the built-up filter pipeline, if any. (If the filter pipeline is empty, jq is not run.) Filter pipeline elements are automatically separated with `|`, so you should not include a `|` at the beginning or end of your `jq` blocks or `APPLY`/`FILTER` code.
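For example, a minimal jqmd filter document might contain just the two blocks below (the data and filter are purely illustrative). Run as `jqmd the-doc.md <config.json`, it would merge the YAML into the input and then stamp the result:

```yaml
defaults:
  log_level: info
```

```jq
.updated_at = (now | todate)
```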
As with `mdsh`, you can optionally make a markdown file directly executable by giving it a shebang line such as `#!/usr/bin/env jqmd`, or use a shelldown header to make it executable, sourceable, and pretty. :) A sample shelldown header for jqmd might look like:

```
#!/usr/bin/env bash
: '
<!-- ex: set ft=markdown : '; eval "$(jqmd --eval "$BASH_SOURCE")" # -->

# My Awesome Script

...markdown and code start here...
```
Also as with `mdsh`, you can run `jqmd --compile` to output a bash version of your script, with no external dependencies (other than jq, and possibly `yaml2json` or PyYAML). `jqmd --compile` and `jqmd --eval` both inject the necessary jqmd runtime functions into the script, so that it will work on systems without jqmd installed. (Note that unless your script uses the `YAML` or `yaml2json` functions at runtime, your script's users will not need a YAML-to-JSON converter installed.)
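For instance, assuming `--compile` accepts the markdown file as an argument (as `mdsh --compile` does), compiling and running a script might look like this hypothetical sketch:

```shell
jqmd --compile my-script.md > my-script    # emit a standalone bash script
chmod +x my-script
./my-script <input.json >output.json       # no jqmd/mdsh needed on this machine
```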
(If you'd like more information on compiling, sourcing, and shelldown headers, feel free to have a look at the mdsh docs!)
## Data Merging

In a jqmd program, one is often incrementally defining some sort of data structure (such as a docker-compose project specification, or a set of Wordpress options). While jq expressions can be used directly to manipulate such a data structure, a more intuitive way to express it is as a series of JSON or YAML blocks that are combined in some way. For this reason, jqmd defines an intuitive merging function for applying such data blocks to an existing data structure. This merging function is exposed to jqmd programs as `jqmd::data($data)`, and is used by default to merge JSON and YAML data. The merge algorithm is as follows:
- If `.` is an array, add `$data` to it (concatenating if `$data` is also an array, otherwise appending)
- If `.` and `$data` are both objects, recursively merge their values using this same algorithm
- In all other cases, return `$data`
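As a purely illustrative sketch of how this plays out: if the current value of `.` is `{"services": {"web": {"image": "nginx"}}, "volumes": ["config"]}`, then applying this data block:

```yaml
services:
  web:
    ports: ["80:80"]
volumes: ["data"]
```

...would yield `{"services": {"web": {"image": "nginx", "ports": ["80:80"]}}, "volumes": ["config", "data"]}`: the nested objects are merged recursively, the two `volumes` arrays are concatenated, and the new `ports` value simply replaces the (nonexistent) old one.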
For most programs, this algorithm is sufficient for incremental data structure creation. If you have different needs, however, you can define a `jqmd_data` function of your own: JSON and YAML data are wrapped with a call to `jqmd_data`, but the default `jqmd_data` just calls `jqmd::data`.

If you want to override the data merging for all data as of the start of the filter chain, define a `jqmd_data` function in a `DEFINE` call or a `jq defs` block. Or, you can override it for just a few filters or blocks by defining it in an `APPLY` or `FILTER` call or a `jq` block. Afterwards, you can restore the original data merging algorithm like this:

```shell
FILTER 'def jqmd_data($data): jqmd::data($data) ; .'
```
## Reusable Blocks

Normally, code or data blocks are executed immediately, at the point where they appear in the document. But for more complex scripts or libraries, this is a bit limiting. So jqmd allows you to turn blocks into shell functions, so they can be called more than once (or not at all), possibly with parameters. For example, the following markdown:

```jq @func setElement key="$1" @val="$2"
.[$key] = $val
```

```yaml @func mksite SITE WP_HOME
services:
  \($SITE):
    environment:
      WP_HOME: \($WP_HOME)
```

...expands into the following two shell functions:

```shell
function setElement() {
    APPLY $'.[$key] = $val\n' \
        key="$1" @val="$2"
}

function mksite() {
    APPLY $'jqmd_data({"services":{"\\($SITE)":{"environment":{"WP_HOME":"\\($WP_HOME)"}}}})\n' \
        SITE WP_HOME
}
```
Everything after the `@func name` part of the block opener becomes arguments to `APPLY`, which maps shell variables or other values to jq variables with the specified names. An `@` before an argument name means "this variable or value is already JSON-encoded", and the absence of an `=` means "create a jq variable with the same name and value as this shell or environment variable". (Note: values after `=` should be quoted as shown above if they contain variables or shell parameters like `$1`.)

So, our example `setElement` function takes two positional arguments and sets a key (given as a string) to a value (given as JSON data). So e.g. `setElement foo 42` would be equivalent to the jq expression `.foo = 42`.
The second example function, `mksite`, sets the `WP_HOME` for a docker-compose service named `$SITE`, using the current contents of the `$SITE` and `$WP_HOME` shell variables. (Unlike normal docker-compose string interpolation -- which can only use one value for an environment variable -- this function can be called several times with different `SITE` and `WP_HOME` values to build up configuration for multiple containers.)
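A hypothetical shell block using the two functions defined above might look like this (the values are made up; each call adds another filter to the pipeline):

```shell
setElement foo 42                                    # jq: .foo = 42
SITE=blog WP_HOME=https://example.com/blog mksite    # adds the blog service
SITE=shop WP_HOME=https://example.com/shop mksite    # adds the shop service
```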
These are just a few examples of what you can do with reusable `@func` blocks. `@func` can only be used with `json`, `yaml`, or `jq` blocks. `jq` and `json` blocks can refer directly to parameter variables, while `yaml` blocks can only use string interpolation (`\( $var )`) to insert string keys or values. `jq` blocks are applied as-is, while `json` and `yaml` blocks are wrapped in a call to `jqmd_data()` (as described in "Data Merging", above).
## Named Constants

Data blocks can also be tagged as "named constants": a code block starting with e.g. ```` ```yaml !const foo ```` will have its contents defined as a zero-argument jq function named `foo`. That is, the following two code blocks do the exact same thing:

```jq defs
def pi: 3.14159;
```

```json !const pi
3.14159
```
## Programming Models

jqmd supports developing three types of programs: filters, scripts, and extensions. The main differences are that:

- Filters typically run jq once, implicitly, at the end of the document, sending the output to stdout;
- Scripts explicitly run jq (possibly multiple times), or don't run it at all; and
- Extensions are shell scripts written using jqmd functions to create different markdown processing and/or jq support tools.
### Filters

Filters are programs that build up a single giant jq pipeline, and then act as a filter, typically taking JSON input from stdin and sending the result to stdout. If your markdown document defines at least one filter, and doesn't use `RUN_JQ` or `CLEAR_FILTERS` to reset the pipeline, it's a filter: jqmd will automatically run `jq` to do the filtering from stdin to stdout after the entire markdown document has been processed. If you don't want jq to read from stdin, you can use `JQ_OPTS -n` within your script to start the filter pipeline without any file input. (Similarly, you can use `JQ_OPTS -- somefile` to force jq to read input from a specific file instead of stdin.)
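For example, a filter document that generates its output from scratch (rather than transforming stdin) might include a shell block like this hypothetical sketch, followed by ordinary `yaml` or `json` blocks that build up the result:

```shell
# Run jq with --null-input: the pipeline starts from `null` instead of
# reading JSON from stdin, and later data blocks build the output from scratch.
JQ_OPTS -n
```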
### Scripts

If your program isn't a filter, it's probably a script. Scripts can run jq with shared imports, functions, and arguments, using the `RUN_JQ` function. (They must not add anything to the filter pipeline after the last `RUN_JQ` or `CLEAR_FILTERS` call, though, or jqmd will think the program's a filter!)

You'll generally use this approach if your script needs to run jq multiple times with different inputs and filters. Each time a script uses the `CLEAR_FILTERS` or `RUN_JQ` functions, the filter pipeline is reset to empty and can then be built up again to run different operations.
(Note: unlike the filter pipeline, jq options, arguments, imports, and definitions are cumulative. They can only be added to as the program executes, and cannot be reset. Thus, they are shared across all invocations of `RUN_JQ`, so anything specific to a given run of jq should be specified as a filter, or passed as an explicit command-line argument to `RUN_JQ`.)
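A hypothetical script body illustrating this pattern (the file names, definition, and filters are purely illustrative):

```shell
DEFINE 'def tally: group_by(.type) | map({key: .[0].type, value: length}) | from_entries;'

FILTER 'tally'
RUN_JQ -- events.json            # first run; the pipeline is cleared afterwards

FILTER 'map(select(.active))'
RUN_JQ -- users.json             # second run reuses the DEFINE, but not the old filter
```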
### Extensions

jqmd itself can be extended by other shell scripts, to make more-specialized tools or custom interpreters. Sourcing jqmd from a bash script will define all of its functions, but not actually run a program. In this way, you can use all of the available functions described below (plus any of mdsh's underlying API) in a shell script, rather than a markdown file. (You can also use or redefine jqmd's and mdsh's internal functions, but those not documented here or in the mdsh documentation are subject to change without notice!)

If you are sourcing jqmd (whether to write an extension or to reuse its functions), you should also read the mdsh docs, since jqmd is an extension of mdsh.
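A minimal sketch of such an extension, assuming `jqmd` is on `PATH` (everything here is illustrative, not a prescribed API):

```shell
#!/usr/bin/env bash
source "$(command -v jqmd)"      # define jqmd's functions without running a program

# The jqmd API is now available directly from shell:
DEFINE 'def greet($name): "Hello, \($name)!";'
FILTER 'greet($who)'
RUN_JQ -n --arg who "world"      # prints "Hello, world!"
```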
## Available Functions

Within `shell` blocks, many functions are available for your use. When passing jq code to them, it's best to use single quotes, to avoid unwanted interpretation of `$` variables or other quoting issues, e.g.:

```shell
DEFINE '
def recursive_add($other): . as $original |
    reduce paths(type=="array") as $path (
        (. // {}) * $other; setpath( $path; ($original | getpath($path)) + ($other | getpath($path)) )
    );
'

DEFINE 'def jqmd_data($arg): recursive_add($arg);'
```
### Adding jq Code and Data

- `APPLY expr [@]name[=value]...` -- add expr to the jq filter pipeline, with the named jq variables bound to the specified values or to the value of the corresponding shell variable. If expr is the empty string or `.`, the variables can be used by the entire filter chain past this point; otherwise they are only visible within expr.

  Each name must be a valid jq variable name (minus the leading `$`). If the `=`value is omitted, the value of the shell variable name is used. By default, the value is received by jq as a string, but if name is prefixed with `@`, then the value is interpreted as JSON. So, if you need to pass in a number, boolean, or other value already in JSON format (even a complex data structure), you can use `@` to pass it in -- even if it's untrusted user-supplied data. e.g.:

  ```shell
  APPLY 'some_func($foo; $bar)' @foo=42 @bar="$untrusted_json"
  ```

  This code will call `some_func(42; $bar)` with jq's `$bar` variable set to the arbitrary JSON value from `$untrusted_json`, or else abort with an error during the jq run if `$untrusted_json` contains invalid JSON.
- `IMPORTS arg` -- add the given jq `import` or `include` statements to a block that will appear at the very beginning of the jq "program". (Each statement must be terminated with `;`, as is standard for jq.) Imports are accumulated in the order they are processed, but all imports active as of a given jq run will be placed at the beginning of the overall program, as required by jq syntax.

  (This function is the programmatic equivalent of including a `jq imports` code block at the current point of execution.)
- `DEFINE arg` -- add the given jq `def` statements to a block that will appear after the `IMPORTS`, but before any filters. (Each statement must be terminated with `;`, as is standard for jq.)

  This function is the programmatic equivalent of including a `jq defs` code block at the current point of execution.

  Note: you do not have to define all your functions this way. Functions can also be defined at the beginning of `FILTER` blocks or `jq`-tagged code blocks. The main benefits of using `DEFINE` or `jq defs` blocks are that:

  - They can be done "out of order" within a document: you can use a function in a `jq` or `FILTER` block before its `DEFINE` block appears, as long as the `DEFINE` happens before jq is actually run.

  - In a script that runs jq more than once, `IMPORTS` and `DEFINE` blocks persist across jq runs, while `jq` and `FILTER` blocks reset after every `RUN_JQ`.

  - While a `jq` or `FILTER` block has to include a filter expression of some kind (even if it's just `.`), `DEFINE` blocks can only contain definitions and comments.

    (Well, technically, you can include filtering expressions in a `DEFINE` block, but it's not recommended, and you would then have to end the block with a `|` to get a syntactically-correct jq program.)
- `FILTER expr [args...]` -- add the given jq expression to the jq filter pipeline. The expression is automatically prefixed with `|` if any filter expressions have already been added to the pipeline. (This function is the programmatic equivalent of including a `jq` code block at the current point of execution. For a combined example using several of these functions, see the sketch after this list.)

  If any arguments are supplied after expr, they are inserted as JSON-quoted strings wherever `%s` appears in it. (So `FILTER "foo(%s; %s)" bar baz` will expand to `foo("bar"; "baz")`.) In this way, you can insert arbitrary strings into a jq expression, even if they contain characters that must be escaped in JSON.

  If you are using arguments, expr is interpreted as a bash `printf` format string, which means that you must escape any actual `%` signs as `%%`, and should be careful with backslashes in it. (If you don't pass any args after the expr, these issues don't apply, as the string is used as-is.)

  Every `jq`-tagged code block or `FILTER` argument must contain a jq expression. Since jq expressions can begin with function definitions, this means that you can begin a filter with function definitions. This can be useful for redefining `jqmd_data` or other functions at various points within your filter pipeline, or to define functions that will only be used for one `RUN_JQ` pipeline.

  Bear in mind, however, that because a filter block must contain a valid jq expression, you may need to terminate your filter with a `.` if it contains only functions. For example, this bit of jq code is a valid filter, because it ends with a `.`:

  ```jq
  # Add as many functions as you like
  def f1($other): something;
  def f2: another(thing);

  # but finish with a '.' to create a no-op filtering expression
  .
  ```

  This "end function-only filters with a `.`" rule applies whether you're using `jq`-tagged code blocks or the `FILTER` function.
- `JSON data [args...]` -- a shortcut for `FILTER "jqmd_data(data)" args...`. This function is the programmatic equivalent of including a `json` code block at the current point of execution, but it can also include interpolated args, as with `FILTER` (and the same rules for `%s` and escaping `%` apply if you supply any args).
- `YAML data` -- a shortcut for `FILTER "jqmd_data(data-converted-to-json)"`. This function is the programmatic equivalent of including a `yaml` code block at the current point of execution, and only works if there is a `yaml2json` converter on `PATH`, the system default `python` has PyYAML installed, or yaml2json.php is on the system `PATH`.
- `yaml2json` -- a filter that takes YAML or JSON input and produces JSON output. The actual implementation is system-dependent, using either a `yaml2json` command-line tool, Python, or PHP, depending on what's available. This can be used to convert data, validate it, or to remove jq expressions from untrusted input.
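Putting several of these together, a hypothetical shell block might look like the following sketch (the module name, function, and data are all made up for illustration):

```shell
IMPORTS 'include "mylib";'        # hypothetical module on jq's search path
DEFINE  'def slugify: ascii_downcase | gsub("[^a-z0-9]+"; "-");'

YAML '
site:
  title: My Site
'
JSON '{ "slug": ("My Site" | slugify) }'
FILTER '.generated = now'

# The accumulated pipeline runs at the end of the document (or on RUN_JQ).
```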
Notice that JSON and YAML blocks are always filtered through a `jqmd_data()` function, which by default does data merging, but you can always redefine that function to do something different, even as part of a `FILTER` or `jq` block. (Just remember that while filters can begin with function definitions, they must each end with an expression, even if it's only a `.`.)
Also note that data passed to the `JSON` and `YAML` functions can contain jq interpolation expressions, which means that you must not pass untrusted data to them. If you need to process a user-supplied JSON string, the simplest way is to use `JSON "( %s | fromjson)" "$untrusted_json"`. Alternately, you can call `ARGJSON someJQvarname "$untrusted_json"` to create the jq variable `$someJQvarname`, and then use it with e.g. `JSON '$someJQvarname'`. (Note the single quotes!)
(If your user-supplied data is in YAML form, you can use the same approaches, but must convert it to JSON first.)
### JSON Escaping and Data Structures

These functions don't do anything to jq or the filter pipeline; they simply escape, quote, or otherwise format values into JSON, returning the result(s) via `REPLY`. You can then use them to build up `FILTER` strings, or pipe them to jq as input.
- `JSON-QUOTE strings...` -- set `REPLY` to an array containing the JSON-quoted versions of strings. Each element in the resulting array will begin and end with double quotes, and have proper backslash escapes for contained control characters, double quotes, and backslashes.

- `JSON-LIST strings...` -- set `REPLY` to a string representing a JSON list of the given strings.

- `JSON-KV "key=val"...` -- set `REPLY` to a string representing a JSON object mapping from each given key to a string value. Keys cannot contain `=`. If an argument doesn't contain an `=`, its value is equal to its key.

- `JSON-MAP assoc-array` -- (bash 4+ only) set `REPLY` to a string representing a JSON object containing the contents of the named assoc-array.

- `escape-ctrl-characters strings...` -- set `REPLY` to an array containing strings with control characters escaped as `\n`, `\t`, `\r`, or `\uXXXX`. This function is used internally by the other `JSON-X` functions when their argument(s) contain control characters.
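A hypothetical sketch of how these might be combined with `FILTER` (exact output formatting may differ):

```shell
JSON-LIST alpha beta gamma
FILTER ".names = $REPLY"        # double quotes on purpose, so the shell expands $REPLY
                                # e.g. .names = ["alpha","beta","gamma"]

JSON-KV db=postgres cache=redis
FILTER ".services = $REPLY"     # e.g. .services = {"db":"postgres","cache":"redis"}
```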
### Adding jq Options and Arguments

- `JQ_OPTS opts...` -- add opts to the jq command line being built up. Whenever jq is run (either explicitly using `RUN_JQ` or `CALL_JQ`, or implicitly at the end of the document), the given options will be part of the command line.

- `ARG name value` -- define a jq variable named `$name`, with the supplied string value. (Shortcut for `JQ_OPTS --arg name value`.)

- `ARGJSON name json-value` -- define a jq variable named `$name`, with the supplied JSON value. (Shortcut for `JQ_OPTS --argjson name json`.) This is especially useful for passing the output of other programs or data files as arguments to your jq code, e.g. `ARGJSON something "$(wp option get something --format=json)"`.

- `ARGSTR string` and `ARGVAL json-value` -- these functions work like `ARG` and `ARGJSON`, but instead of you passing in an argument name, a unique argument name is automatically generated and returned in `$REPLY`. The returned string will expand to the passed-in value in any jq expression.

(Note: the added options will reset to empty again after `RUN_JQ`, `CALL_JQ`, or `CLEAR_FILTERS`.)
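A hypothetical illustration of these (all values are made up):

```shell
ARG     env_name production                          # jq sees $env_name == "production"
ARGJSON settings '{"retries": 3, "debug": false}'    # jq sees $settings as parsed JSON
FILTER  '.environment = $env_name | .config = $settings'

ARGVAL '[1, 2, 3]'            # generates a variable and returns its reference in $REPLY
FILTER ".extras = $REPLY"     # double quotes on purpose, so the shell expands $REPLY
```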
### Controlling jq Execution

- `RUN_JQ args...` -- invoke `$JQ_CMD` (`jq` by default) with the current `JQ_OPTS` and the given args. If a "program" is given in `JQ_OPTS` (i.e., a non-option argument other than `--`), it's added to the filter pipeline, after any `IMPORTS` and `DEFINE` blocks established so far. Any `-f` or `--fromfile` options are similarly added to the filter pipeline, and multiple such files are allowed (unlike plain jq, which doesn't work properly with multiple `-f` options).

  After jq is run, the filter pipeline is emptied with `CLEAR_FILTERS`.

- `CALL_JQ args...` -- exactly like `RUN_JQ`, except that the output of `jq` is captured into `$REPLY`. You should use this instead of shell substitution to capture jq's output.
- `CLEAR_FILTERS` -- reset the current filter pipeline and `JQ_OPTS` to empty. This can be used at the end of a script to keep jqmd from running jq on stdin/stdout.

- `HAVE_FILTERS` -- succeeds if there is anything in the filter pipeline at the time of execution, fails otherwise. (I.e., you can use `if HAVE_FILTERS; then ...` to take action in a script based on the current filter state.)
Note: piping into `RUN_JQ` or `CALL_JQ`, or invoking them in a subshell or shell substitution, will not reset the current filter pipeline. To capture jq's output, use `CALL_JQ` instead of shell substitution. To pipe input into jq, pass it as an argument after `--` to `RUN_JQ` or `CALL_JQ`, e.g.:

```shell
$ echo '"something"' | RUN_JQ .         # WRONG: CLEAR_FILTERS won't run
$ RUN_JQ . -- <(echo '"something"')     # RIGHT: use process substitution instead of piping

$ foo bar "$(RUN_JQ)"                   # WRONG: CLEAR_FILTERS won't run
$ CALL_JQ; foo bar "$REPLY"             # RIGHT
```
## Command-line Arguments

You can pass additional arguments to `jqmd`, after the path to the markdown file. These additional arguments are available as `$1`, `$2`, etc. within any top-level `shell` code in the markdown file.
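For example, a document run as `jqmd my-script.md mysite` (a hypothetical invocation) could pick up the extra argument in a shell block like this:

```shell
# "$1" is the first argument after the markdown file name ("mysite" above)
APPLY '.site = $site' site="$1"
```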
## Supporting Additional Languages

By default, jqmd only interprets unindented, triple-backquoted markdown blocks tagged as `shell`, `jq`, `jq defs`, `jq imports`, `yaml`, `yml`, or `json`. Unindented, triple-backquoted blocks with any other tags are interpreted as data and assigned to shell variables, as described in the mdsh docs on data blocks.
As with `mdsh`, however, you can define interpreters for other block types by defining `mdsh-lang-X` or `mdsh-compile-X` functions in `shell @mdsh` blocks, via a wrapper script, or as exported functions in your bash environment. (You can also override these functions to change jqmd's default interpretation of jq, YAML, or JSON blocks.)
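For instance, a sketch of such a hook, placed inside a `shell @mdsh` block (assuming mdsh's convention that an `mdsh-lang-X` function receives the block contents on its standard input):

```shell
# Hypothetical: make `python`-tagged blocks run via the system python interpreter
mdsh-lang-python() { python; }
```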
For more information on how to do this, see the mdsh docs on processing non-shell languages, or consult the mdsh docs in general for more info on what you can do with jqmd.