jtc - cli tool to extract, manipulate and transform source JSON

jtc stands for: JSON transformational chains (used to be JSON test console).

jtc offers a powerful way to select one or multiple elements from a source JSON and apply various actions on the selected elements at once (wrap selected elements into a new JSON, filter in/out, sort elements, update elements, insert new elements, remove, copy, move, compare, transform, swap around and many other operations).

Enhancement requests and/or questions are more than welcome: ldn.softdev@gmail.com

Content:

  1. Short description

  2. Compilation and installation options

  3. Quick-start guide

  4. Complete User Guide

  5. C++ class and interface usage primer

  6. jtc vs jq

Short description

- jtc is a simple yet very powerful and efficient cli utility to process and manipulate JSON data

jtc offers the following features (a short list of the main ones):

The walk-path feature is easy to understand: it's made of only 2 kinds of lexemes traversing the JSON tree, which can be mixed in any order: subscripts, enclosed in [ ] (e.g. [children], [-1], [:]), and search lexemes, such as <Work> or <url>l:.

There's also a 3rd kind of lexeme, directives: they typically facilitate other functions like working with namespaces, controlling walk-path execution, etc.; directives are syntactically similar to the search lexemes.

All lexemes can be iterable: e.g., the subscript [:] selects each child node in turn, and the trailing : in <url>l: makes the search return all matches.

A walk-path may have an arbitrary number of lexemes, and the tool accepts a virtually unlimited number of walk-paths. See below for a more detailed explanation with examples.
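To make the two kinds of lexemes concrete, here is a toy tokenizer. It is a minimal sketch in Python, assuming a simplified grammar inferred only from the examples in this README: [...] subscripts, and <...> searches followed by an optional one-letter suffix (like l) and an optional quantifier (like :). jtc's real grammar is richer (directives, escaping, other search forms), and the names LEXEME and tokenize are hypothetical.

```python
import re

# Simplified grammar (an assumption drawn from this README's examples, not jtc's parser):
#   [text]                  subscript, e.g. [children], [-1], [:]
#   <text>suffix quantifier search,    e.g. <Work>, <url>l:
LEXEME = re.compile(
    r"\[(?P<subscript>[^\]]*)\]"
    r"|<(?P<search>[^>]*)>(?P<suffix>[A-Za-z]?)(?P<quantifier>[^\[<]*)"
)

def tokenize(walk_path):
    """Split a walk-path into ("subscript", text) and ("search", text, suffix, quantifier) tuples."""
    lexemes, pos = [], 0
    while pos < len(walk_path):
        match = LEXEME.match(walk_path, pos)
        if match is None:
            raise ValueError(f"unexpected character at offset {pos}: {walk_path[pos]!r}")
        if match.group("subscript") is not None:
            lexemes.append(("subscript", match.group("subscript")))
        else:
            lexemes.append(("search", match.group("search"),
                            match.group("suffix"), match.group("quantifier")))
        pos = match.end()  # every lexeme consumes at least 2 characters, so this always advances
    return lexemes
```

E.g., tokenize("<url>l:[-1][name]") yields one search lexeme (text "url", suffix "l", quantifier ":") followed by the subscripts "-1" and "name".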

Compilation and installation options

For compiling, c++14 (or later) is required. To compile under different platforms:

The following debug-related flags can be passed to jtc when compiling:

Linux and MacOS precompiled binaries are available for download

Choose the latest precompiled binary:

Rename the downloaded file and give it the proper permissions, e.g., for the latest macOS binary:

mv jtc-macos-64.latest jtc
chmod 754 jtc
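For reference, the octal mode 754 grants the owner read, write and execute (7), the group read and execute (5), and others read only (4). A tiny illustration using Python's standard stat module (not part of jtc; describe_mode is a hypothetical helper) that decodes such a mode the way ls -l displays it:

```python
import stat

def describe_mode(permissions):
    """Render octal permission bits (e.g. 0o754) as ls -l shows them for a regular file."""
    return stat.filemode(stat.S_IFREG | permissions)

print(describe_mode(0o754))   # -rwxr-xr--
```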

Packaged installations:

Installing via MacPorts

On MacOS, you can install jtc via the MacPorts package manager:

$ sudo port selfupdate
$ sudo port install jtc

Installation on Linux distributions

jtc is packaged in several Linux distributions and can be installed via the package manager, e.g. with dnf (Fedora) or zypper (openSUSE):

$ dnf install jtc
$ zypper in jtc

On openSUSE Leap 15.0 and later, add the utilities repository and install jtc via zypper.

Manual installation:

Download jtc-master.zip, unzip it, descend into the unzipped folder, compile using an appropriate command, and move the compiled file into an install location.

Here are the example steps for macOS:

Release Notes

See the latest Release Notes

Quick-start guide:

Consider the following JSON (a mockup of a bookmark container), stored in a file Bookmarks:

{
   "Bookmarks": [
      {
         "children": [
            {
               "children": [
                  { "name": "The New York Times", "stamp": "2017-10-03, 12:05:19", "url": "https://www.nytimes.com/" },
                  { "name": "HuffPost UK", "stamp": "2017-11-23, 12:05:19", "url": "https://www.huffingtonpost.co.uk/" }
               ],
               "name": "News",
               "stamp": "2017-10-02, 12:05:19"
            },
            {
               "children": [
                  { "name": "Digital Photography Review", "stamp": "2017-02-27, 12:05:19", "url": "https://www.dpreview.com/" }
               ],
               "name": "Photography",
               "stamp": "2017-02-27, 12:05:19"
            }
         ],
         "name": "Personal",
         "stamp": "2017-01-22, 12:05:19"
      },
      {
         "children": [
            { "name": "Stack Overflow", "stamp": "2018-05-01, 12:05:19", "url": "https://stackoverflow.com/" },
            { "name": "C++ reference", "stamp": "2018-06-21, 12:05:19", "url": "https://en.cppreference.com/" }
         ],
         "name": "Work",
         "stamp": "2018-03-06, 12:07:29"
      }
   ]
}

1. let's start with a simple thing - list all URLs:

bash $ jtc -w'<url>l:' Bookmarks
"https://www.nytimes.com/"
"https://www.huffingtonpost.co.uk/"
"https://www.dpreview.com/"
"https://stackoverflow.com/"
"https://en.cppreference.com/"

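Let's take a look at the walk-path <url>l::
a. <url>: a search lexeme; it looks for "url" recursively throughout the whole JSON tree
b. l: a suffix making the search match labels rather than JSON values (by default a search matches values, as with <Work> in the next example)
c. : (the trailing colon): makes the search select all matches rather than only the first occurrence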
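A rough Python analogue of <url>l: (a recursive search among labels returning every match) may help to picture what the walk does. This is an illustration, not jtc's implementation: it assumes the JSON is already loaded with json.load, walks the tree depth-first in document order, and uses the hypothetical helpers find_by_label and first_match.

```python
def find_by_label(node, label):
    """Depth-first search yielding every value stored under `label` (cf. <url>l:)."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == label:
                yield value
            yield from find_by_label(value, label)   # keep searching below this member
    elif isinstance(node, list):
        for item in node:
            yield from find_by_label(item, label)

def first_match(node, label):
    """Without the trailing ':' only the first occurrence is selected (cf. <url>l)."""
    return next(find_by_label(node, label), None)
```

E.g., iterating find_by_label(json.load(f), "url") over the Bookmarks file visits the five URLs in the order shown above.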

2. dump all bookmark names from the Work folder:

bash $ jtc -w'<Work>[-1][children][:][name]' Bookmarks
"Stack Overflow"
"C++ reference"

Here the walk-path <Work>[-1][children][:][name] is made of following lexemes:

a. <Work>: find within a JSON tree the first occurrence where the JSON string value is matching "Work" exactly
b. [-1]: step up one tier in the JSON tree structure (i.e., address an immediate parent of the found JSON element)
c. [children]: select/address a node whose label is "children" (it'll be a JSON array, at the same tier as Work)
d. [:]: select each node in the array
e. [name]: select/address a node with the label "name"
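Steps a. to e. can be mimicked in Python to make the tree navigation concrete. This is a minimal sketch, assuming the JSON is already loaded with json.load and that a position is represented as a tuple of labels/indices, so that stepping up with [-1] simply drops the last element of that tuple. first_value_path, get and names_in_folder are hypothetical helpers, not part of jtc.

```python
def first_value_path(node, target, path=()):
    """Path (tuple of labels/indices) to the first string value equal to target (cf. <Work>)."""
    if isinstance(node, str):
        return path if node == target else None
    if isinstance(node, dict):
        members = node.items()
    elif isinstance(node, list):
        members = enumerate(node)
    else:
        return None
    for key, child in members:
        found = first_value_path(child, target, path + (key,))
        if found is not None:
            return found
    return None

def get(node, path):
    """Follow a path of labels/indices down from the root."""
    for key in path:
        node = node[key]
    return node

def names_in_folder(doc, folder):
    """Rough analogue of <folder>[-1][children][:][name]."""
    path = first_value_path(doc, folder)     # a. <Work>
    if not path:                             # not found, or the root itself matched
        return []
    parent = get(doc, path[:-1])             # b. [-1]: one tier up
    children = parent.get("children", []) if isinstance(parent, dict) else []   # c. [children]
    return [child["name"] for child in children          # d. [:] each node, e. [name]
            if isinstance(child, dict) and "name" in child]
```

For the Bookmarks file, names_in_folder(doc, "Work") gives the same two names as the jtc command above.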

To better understand how the walk-path works, let's run that series of commands in slow motion, gradually adding lexemes to the path one by one, with the option -l to also see the labels (if any) of the selected elements:

bash $ jtc -w'<Work>' -l Bookmarks
"name": "Work"
bash $ jtc -w'<Work>[-1]' -l Bookmarks
{
   "children": [
      {
         "name": "Stack Overflow",
         "stamp": "2018-05-01, 12:05:19",
         "url": "https://stackoverflow.com/"
      },
      {
         "name": "C++ reference",
         "stamp": "2018-06-21, 12:05:19",
         "url": "https://en.cppreference.com/"
      }
   ],
   "name": "Work",
   "stamp": "2018-03-06, 12:07:29"
}
bash $ jtc -w'<Work>[-1][children]' -l Bookmarks
"children": [
   {
      "name": "Stack Overflow",
      "stamp": "2018-05-01, 12:05:19",
      "url": "https://stackoverflow.com/"
   },
   {
      "name": "C++ reference",
      "stamp": "2018-06-21, 12:05:19",
      "url": "https://en.cppreference.com/"
   }
]
bash $ jtc -w'<Work>[-1][children][:]' -l Bookmarks
{
   "name": "Stack Overflow",
   "stamp": "2018-05-01, 12:05:19",
   "url": "https://stackoverflow.com/"
}
{
   "name": "C++ reference",
   "stamp": "2018-06-21, 12:05:19",
   "url": "https://en.cppreference.com/"
}
bash $ jtc -w'<Work>[-1][children][:][name]' -l Bookmarks
"name": "Stack Overflow"
"name": "C++ reference"

B.t.w., a better (a bit faster and more efficient) walk-path achieving the same query would be this:

3. dump the names of all URLs:

bash $ jtc -w'<url>l:[-1][name]' Bookmarks
"The New York Times"
"HuffPost UK"
"Digital Photography Review"
"Stack Overflow"
"C++ reference"

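This walk-path <url>l:[-1][name]:
a. <url>l:: finds every JSON element whose label is "url" (as in example 1)
b. [-1]: steps up one tier to the parent of each found element, i.e., the bookmark object holding that URL
c. [name]: selects the node whose label is "name" within that object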

4. dump all the URLs and their corresponding names, preferably wrapping the found pairs into a JSON array:

bash $ jtc -w'<url>l:' -w'<url>l:[-1][name]' -jl Bookmarks
[
   {
      "name": "The New York Times",
      "url": "https://www.nytimes.com/"
   },
   {
      "name": "HuffPost UK",
      "url": "https://www.huffingtonpost.co.uk/"
   },
   {
      "name": "Digital Photography Review",
      "url": "https://www.dpreview.com/"
   },
   {
      "name": "Stack Overflow",
      "url": "https://stackoverflow.com/"
   },
   {
      "name": "C++ reference",
      "url": "https://en.cppreference.com/"
   }
]
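The pairing above can be approximated in Python. This is a minimal sketch under two assumptions inferred from the output, not taken from jtc's documentation: the results of the two -w walks are interleaved (first url, first name, second url, ...), and with -j plus -l consecutive labeled results are packed into one object until a label would repeat. interleave and group_labeled are hypothetical names.

```python
import json
from itertools import zip_longest

def interleave(*walks):
    """Round-robin the results of several walks: w1[0], w2[0], w1[1], w2[1], ..."""
    missing = object()   # filler for walks that ran out of results
    return [item for group in zip_longest(*walks, fillvalue=missing)
            for item in group if item is not missing]

def group_labeled(pairs):
    """Pack (label, value) pairs into objects, opening a new object when a label repeats."""
    objects = []
    for label, value in pairs:
        if not objects or label in objects[-1]:
            objects.append({})
        objects[-1][label] = value
    return objects

urls = [("url", "https://www.nytimes.com/"), ("url", "https://www.huffingtonpost.co.uk/")]
names = [("name", "The New York Times"), ("name", "HuffPost UK")]
print(json.dumps(group_labeled(interleave(urls, names)), indent=3, sort_keys=True))
```

sort_keys=True is used because the objects printed above list their labels in sorted order.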

5. Debugging and validating JSON

jtc is extensively debuggable: the more times option -d is passed, the more debug output is produced. Enabling too many debugs might be overwhelming, though one specific case many would find extremely useful is validating a failing JSON:

bash $ <addressbook-sample.json jtc 
jtc json exception: expected_json_value

If the JSON is big, it's desirable to locate the parsing failure point. Passing just one -d lets you easily spot the parsing failure point and its locus:

bash $ <addressbook-sample.json jtc -d
.display_opts(), option set[0]: -d (internally imposed: )
.init_inputs(), reading json from <stdin>
.exception_locus_(), ...         }|       ],|       "children": [,],|       "spouse": null|    },|    {|  ...
.exception_spot_(), -------------------------------------------->| (offset: 967)
jtc json parsing exception (<stdin>:967): expected_json_value
bash $ 
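The same locus idea can be reproduced with Python's standard json module, whose JSONDecodeError carries the failure offset in .pos (along with .msg, .lineno and .colno). A minimal sketch for comparison, not jtc's implementation; locate_parse_error is a hypothetical helper which, like jtc's locus output, shows newlines as |. The offsets reported by Python and by jtc are not guaranteed to agree.

```python
import json

def locate_parse_error(text, context=20):
    """Return None if text is valid JSON, else a report pointing at the failure offset."""
    try:
        json.loads(text)
        return None
    except json.JSONDecodeError as err:
        start = max(err.pos - context, 0)
        locus = text[start:err.pos + context].replace("\n", "|")   # same length, newlines as '|'
        spot = "-" * (err.pos - start) + "^"                        # '^' sits under the failing char
        return f"{locus}\n{spot} (offset: {err.pos})\n{err.msg}"

print(locate_parse_error('{"children": [,]}'))
```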

Complete User Guide

there's a lot more under the hood of jtc:

Refer to a complete User Guide for further examples and guidelines.

C++ class and interface usage primer

Refer to a Class usage primer document.

jtc vs jq:

jtc was inspired by the complexity of the jq interface (and its DSL), aiming to provide users with a tool that lets them attain the desired JSON queries in an easier, more feasible and more succinct way.

utility ideology:

jq is not idiomatic in the unix sense, e.g.: one can write a program in the jq language that has nothing to do with JSON at all. Most (if not all) requests to manipulate JSON are ad hoc tasks, and learning jq's DSL for ad hoc tasks is overkill (that purpose is best served by a general-purpose language, e.g.: Python).
The number of questions on Stack Overflow asking how to do even simple queries with jq is huge, which in itself shows that for many people the feasibility of attaining their asks with jq is way too low, hence they resort to posting their questions on the forum.

jtc, on the other hand, is a utility (not a language) which employs a novel but powerful concept: it "embeds" the ask right into the walk-path. That makes a desired result much more feasible to attain: building a walk-path lexeme by lexeme, one at a time, provides immediate visual feedback and lets you come up with the desired result rather quickly.

learning curve:

handling irregular JSONs:

solutions input invariance

- most jtc solutions are input invariant (hardly the same could be said of jq). It's not impossible to come up with invariant solutions in jq, it's just a lot harder, while jtc, with its walk-path model, prompts for invariant solutions. I.e., an invariant solution keeps working even when the outer format of the JSON changes (it stops working only once the relationship between the walked JSON elements changes).
E.g.: consider the following query: extract [ "name", "surname" ] pairs from 2 types of JSON:

bash $ case1='{"Name":"Patrick", "Surname":"Lynch", "gender":"male", "age":29}'
bash $ case2='[{"Name":"Patrick", "Surname":"Lynch", "gender":"male", "age":29},{"Name":"Alice", "Surname":"Price", "gender":"female", "age":27}]'

a natural, idiomatic jtc solution would be:

bash $ <<<$case1 jtc -w'<Name>l:[-1]' -rT'[{{$a}},{{$b}}]'
[ "Patrick", "Lynch" ]
bash $ <<<$case2 jtc -w'<Name>l:[-1]' -rT'[{{$a}},{{$b}}]'
[ "Patrick", "Lynch" ]
[ "Alice", "Price" ]

While one of the most probable jq solutions would be:

bash $ <<<$case1 jq -c 'if type == "array" then .[] else . end | [.Name, .Surname]'
["Patrick","Lynch"]
bash $ <<<$case2 jq -c 'if type == "array" then .[] else . end | [.Name, .Surname]'
["Patrick","Lynch"]
["Alice","Price"]

Both solutions work correctly; however, any change in the outer encapsulation will break jq's solution, while jtc's will keep working even if the JSON is reshaped into an irregular structure, e.g.:

#jtc:
bash $ case3='{"root":[{"Name":"Patrick", "Surname":"Lynch", "gender":"male", "age":29}, {"closed circle":[{"Name":"Alice", "Surname":"Price", "gender":"female", "age":27}, {"Name":"Rebecca", "Surname":"Hernandez", "gender":"female", "age":28}]}]}'
bash $ 
bash $ <<<$case3 jtc -w'<Name>l:[-1]' -rT'[{{$a}},{{$b}}]'
[ "Patrick", "Lynch" ]
[ "Alice", "Price" ]
[ "Rebecca", "Hernandez" ]

#jq:
bash $ <<<$case3 jq -c 'if type == "array" then .[] else . end | [.Name, .Surname]'
[null,null]

The same property makes jtc solutions resistant to incomplete data, e.g.: if we drop the "Name" entry from one of the records in case2, the jtc solution still works correctly:

#jtc:
bash $ case2='[{"Surname":"Lynch", "gender":"male", "age":29},{"Name":"Alice", "Surname":"Price", "gender":"female", "age":27}]'
bash $ 
bash $ <<<$case2 jtc -w'<Name>l:[-1]' -rT'[{{$a}},{{$b}}]'
[ "Alice", "Price" ]

#jq:
bash $ <<<$case2 jq -c 'if type == "array" then .[] else . end | [.Name, .Surname]'
[null,"Lynch"]
["Alice","Price"]

- i.e., jtc does not assume that the user requires some default substitution in case of incomplete data (but if such handling is required, the walk-path can easily be enhanced)
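The same input-invariant idea, sketched in Python for comparison: rather than assuming where the records sit, search recursively for any object carrying a "Name" label (the counterpart of <Name>l:[-1]) and emit its Name and Surname. Unlike the jtc template, the sketch names the "Surname" label explicitly instead of using template tokens; name_surname and CASE3 are hypothetical names, and the data is case3 from above.

```python
import json

def name_surname(node):
    """Yield [Name, Surname] for every object holding a "Name" label, at any depth."""
    if isinstance(node, dict):
        if "Name" in node:
            yield [node["Name"], node.get("Surname")]   # None if "Surname" is absent
        children = node.values()
    elif isinstance(node, list):
        children = node
    else:
        return
    for child in children:
        yield from name_surname(child)

CASE3 = ('{"root":[{"Name":"Patrick", "Surname":"Lynch", "gender":"male", "age":29}, '
         '{"closed circle":[{"Name":"Alice", "Surname":"Price", "gender":"female", "age":27}, '
         '{"Name":"Rebecca", "Surname":"Hernandez", "gender":"female", "age":28}]}]}')

for pair in name_surname(json.loads(CASE3)):
    print(json.dumps(pair))
```

Records without a "Name" label are simply not visited, which mirrors the jtc behaviour on the incomplete case2 above.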

programming model

JSON numerical fidelity:

Invalid JSON: [00]
  jtc:     <<<'[00]' jtc
           jtc json parsing exception (<stdin>:3): missed_prior_enumeration
  jq 1.6:  <<<'[00]' jq -c .
           [0]

Precision test:
  jtc:     <<<'[0.99999999999999999]' jtc -r
           [ 0.99999999999999999 ]
  jq 1.6:  <<<'[0.99999999999999999]' jq -c .
           [1]

Retaining original format:
  jtc:     <<<'[0.00001]' jtc -r
           [ 0.00001 ]
  jq 1.6:  <<<'[0.00001]' jq -c .
           [1e-05]

Stream of atomic JSONs:
  jtc:     <<<'{}[]"bar""foo"00123truefalsenull' jtc -Jr
           [ {}, [], "bar", "foo", 0, 0, 123, true, false, null ]
  jq 1.6:  <<<'{}[]"bar""foo"00123truefalsenull' jq -sc
           parse error: Invalid numeric literal at line 2, column 0
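For comparison, Python's standard json module shows the same effects (an illustration, not a statement about jtc internals): it rejects leading zeros like jtc does, rounds 0.99999999999999999 to the nearest double and reprints 0.00001 as 1e-05 like jq does, and it can keep the original decimal text via parse_float=Decimal. Its JSONDecoder.raw_decode can also split a stream of concatenated JSON values; parse_exact and parse_stream are hypothetical helper names.

```python
import json
from decimal import Decimal

def parse_exact(text):
    """Parse JSON, keeping every non-integer number as an exact Decimal."""
    return json.loads(text, parse_float=Decimal)

def parse_stream(text):
    """Parse a stream of concatenated JSON values, skipping whitespace between them."""
    decoder, values, idx = json.JSONDecoder(), [], 0
    while True:
        while idx < len(text) and text[idx].isspace():
            idx += 1
        if idx == len(text):
            return values
        value, idx = decoder.raw_decode(text, idx)
        values.append(value)

try:
    json.loads('[00]')                           # leading zeros are rejected
except json.JSONDecodeError as err:
    print(err.msg)                               # Expecting ',' delimiter

print(json.loads('[0.99999999999999999]'))       # [1.0]: rounded to the nearest double
print(parse_exact('[0.99999999999999999]'))      # [Decimal('0.99999999999999999')]
print(json.dumps([0.00001]))                     # [1e-05]: float repr changes the notation
print(parse_stream('{}[]"bar""foo"00123truefalsenull'))
```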

performance:

Comparison of single-threaded performance:
Here's a 4+ million node JSON file standard.json:

bash $ time jtc -zz standard.json 
4329975
user 6.085 sec

The table below compares jtc and jq performance for similar operations (using TIMEFORMAT="user %U sec"):

parsing JSON:
  jtc 1.76:
    bash $ time jtc -t2 standard.json | md5
    d3b56762fd3a22d664fdd2f46f029599
    user 9.110 sec
  jq 1.6:
    bash $ time jq -M . standard.json | md5
    d3b56762fd3a22d664fdd2f46f029599
    user 18.853 sec

removing by key from JSON:
  jtc 1.76:
    bash $ time jtc -t2 -pw'<attributes>l:' standard.json | md5
    0624aec46294399bcb9544ae36a33cd5
    user 10.027 sec
  jq 1.6:
    bash $ time jq -M 'del(..|.attributes?)' standard.json | md5
    0624aec46294399bcb9544ae36a33cd5
    user 27.439 sec

updating JSON recursively by label:
  jtc 1.76:
    bash $ time jtc -t2 -w'<attributes>l:[-1]' -i'{"reserved": null}' standard.json | md5
    6c86462ae6b71e10e3ea114e86659ab5
    user 12.715 sec
  jq 1.6:
    bash $ time jq -M 'walk(if type == "object" and has("attributes") then . + { "reserved" : null } else . end)' standard.json | md5
    6c86462ae6b71e10e3ea114e86659ab5
    user 29.450 sec

Comparison of jtc to jtc (single-threaded to multi-threaded parsing performance):

bash $ unset TIMEFORMAT
bash $ 
bash $ # concurrent (multi-threaded) parsing:
bash $ time jtc -J / -zz  standard.json standard.json standard.json standard.json standard.json 
21649876

real	0m10.995s     # <- compare these figures
user	0m34.083s
sys	0m3.288s
bash $ 
bash $ # sequential (single-threaded) parsing:
bash $ time jtc -aJ / -zz  standard.json standard.json standard.json standard.json standard.json 
21649876

real	0m31.717s     # <- compare these figures
user	0m30.125s
sys	0m1.555s
bash $ 
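The figures above can be cross-checked with a little arithmetic on the published numbers. The node count is consistent with five copies of the 4329975-node file plus one extra node, presumably the array that -J wraps the inputs into (an assumption inferred from the numbers). The concurrent run is about 2.9x faster in wall-clock (real) time while spending about 13% more user CPU time. The names below are hypothetical.

```python
SINGLE_FILE_NODES = 4329975   # from: jtc -zz standard.json
FILES = 5

def wrapped_node_count(nodes_per_file, files):
    """Nodes in one array holding `files` copies of a document: the copies plus the array itself."""
    return nodes_per_file * files + 1

def ratio(a, b):
    """a / b rounded to 2 decimals."""
    return round(a / b, 2)

print(wrapped_node_count(SINGLE_FILE_NODES, FILES))   # 21649876, as printed by both runs
print(ratio(31.717, 10.995))   # 2.88: sequential vs concurrent real (wall-clock) time
print(ratio(34.083, 30.125))   # 1.13: concurrent vs sequential user (CPU) time
```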

Machine spec used for testing:

  Model Name:                 MacBook Pro
  Model Identifier:           MacBookPro15,1
  Processor Name:             Intel Core i7
  Processor Speed:            2,6 GHz
  Number of Processors:       1
  Total Number of Cores:      6
  L2 Cache (per Core):        256 KB
  L3 Cache:                   12 MB
  Hyper-Threading Technology: Enabled
  Memory:                     16 GB 2400 MHz DDR4

compare jtc based solutions with jq's:

Some answers to JSON queries using jtc have been published; you may compare them with jq's, as well as study the feasibility of the solutions, test the relevant performance, etc.

Refer to a complete User Guide for further examples and guidelines.