JSON-LD Streaming Parser

A fast and lightweight streaming and 100% spec-compliant JSON-LD 1.1 parser, with RDFJS representations of RDF terms, quads and triples.

The streaming nature allows triples to be emitted as soon as possible, and documents larger than memory to be parsed.

Make sure to enable the streamingProfile flag when parsing a JSON-LD document that follows the streaming profile, so that the streaming capabilities of this parser are fully exploited. This flag is disabled by default.

Installation

$ npm install jsonld-streaming-parser

or

$ yarn add jsonld-streaming-parser

This package also works out-of-the-box in browsers via tools such as webpack and browserify.

Require

import {JsonLdParser} from "jsonld-streaming-parser";

or

const JsonLdParser = require("jsonld-streaming-parser").JsonLdParser;

Usage

JsonLdParser is a Node Transform stream that takes in chunks of JSON-LD data, and outputs RDFJS-compliant quads.

You can pipe a stream into it, or write strings to the parser directly.

Print all parsed triples from a file to the console

const fs = require('fs');
const { JsonLdParser } = require('jsonld-streaming-parser');

const myParser = new JsonLdParser();

fs.createReadStream('myfile.jsonld')
  .pipe(myParser)
  .on('data', console.log)
  .on('error', console.error)
  .on('end', () => console.log('All triples were parsed!'));

Manually write strings to the parser

const myParser = new JsonLdParser();

myParser
  .on('data', console.log)
  .on('error', console.error)
  .on('end', () => console.log('All triples were parsed!'));

myParser.write('{');
myParser.write(`"@context": "https://schema.org/",`);
myParser.write(`"@type": "Recipe",`);
myParser.write(`"name": "Grandma's Holiday Apple Pie",`);
myParser.write(`"aggregateRating": {`);
myParser.write(`"@type": "AggregateRating",`);
myParser.write(`"ratingValue": "4"`);
myParser.write(`}}`);
myParser.end();

Convert a JSON-LD string to an RDF/JS dataset

import { Store } from 'n3';
import { JsonLdParser } from 'jsonld-streaming-parser';
import { promisifyEventEmitter } from 'event-emitter-promisify';

const store = new Store();
const parser = new JsonLdParser();
parser.write('{"@id": "http://example.org/jesse", "@type": "http://example.org/Thing"}');
parser.end();
await promisifyEventEmitter(store.import(parser));

// Logs all the quads in the store
console.log(...store);
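You may not want to depend on n3 and event-emitter-promisify. In that case, a small helper can gather every emitted quad into an array instead. The sketch below is hypothetical: collectQuads is not part of this library. JsonLdParser is a Transform stream, so the helper should behave the same way with a parser instance as it does with the stand-in Readable used here.

```javascript
const { Readable } = require('stream');

// Hypothetical helper: resolves with every object the stream emits,
// or rejects as soon as the stream emits 'error'.
function collectQuads(stream) {
  return new Promise((resolve, reject) => {
    const quads = [];
    stream.on('data', (quad) => quads.push(quad));
    stream.on('error', reject);
    stream.on('end', () => resolve(quads));
  });
}

// Stand-in for a parser: any object-mode Readable behaves the same way here.
collectQuads(Readable.from([{ id: 1 }, { id: 2 }]))
  .then((quads) => console.log(`Collected ${quads.length} quads`));
```

With a real parser, you would write to (or pipe into) the parser and then call await collectQuads(myParser).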

Import streams

This parser implements the RDFJS Sink interface, which makes it possible to alternatively parse streams using the import method.

const myParser = new JsonLdParser();

const myTextStream = fs.createReadStream('myfile.jsonld');

myParser.import(myTextStream)
  .on('data', console.log)
  .on('error', console.error)
  .on('end', () => console.log('All triples were parsed!'));
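The Sink pattern itself is small: import takes a text stream and returns a stream of parsed objects. The sketch below shows one way to implement that pattern on top of a Node Transform stream. It is a hypothetical toy: ToyLineParser is not part of this library, and it parses newline-separated text rather than JSON-LD. It also forwards errors from the input stream, which pipe does not do on its own.

```javascript
const { Transform, Readable } = require('stream');
const { StringDecoder } = require('string_decoder');

// Hypothetical toy parser (not part of this library): newline-separated text in,
// one { line } object per non-empty line out.
class ToyLineParser extends Transform {
  constructor() {
    super({ readableObjectMode: true });
    this.decoder = new StringDecoder('utf8'); // avoids splitting multi-byte characters
    this.rest = '';
  }

  _transform(chunk, encoding, callback) {
    const lines = (this.rest + this.decoder.write(chunk)).split('\n');
    this.rest = lines.pop(); // the last piece may be an incomplete line
    for (const line of lines) if (line) this.push({ line });
    callback();
  }

  _flush(callback) {
    const last = this.rest + this.decoder.end();
    if (last) this.push({ line: last });
    callback();
  }

  // Sink-style import: parse a text stream and return the resulting object stream.
  import(input) {
    const output = new ToyLineParser();
    // pipe() does not forward errors from the input, so do it explicitly.
    input.on('error', (error) => output.destroy(error));
    return input.pipe(output);
  }
}

new ToyLineParser()
  .import(Readable.from(['first\nsec', 'ond\nthird']))
  .on('data', console.log)
  .on('end', () => console.log('All lines were parsed!'));
```

The chunk boundaries in the usage example deliberately split a line ("sec" / "ond"). This shows why a streaming parser must carry partial input over to the next chunk.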

Capture detected contexts

Using a context event listener, you can collect all detected contexts.

const myParser = new JsonLdParser();

const myTextStream = fs.createReadStream('myfile.jsonld');

myParser.import(myTextStream)
  .on('context', console.log)
  .on('data', console.error)
  .on('error', console.error)
  .on('end', () => console.log('All triples were parsed!'));

Parse from HTTP responses

Usually, JSON-LD is published via the application/ld+json media type. However, when a JSON-LD context is attached via a Link header, JSON-LD can also be published via application/json and other +json extension types.

This library exposes the JsonLdParser.fromHttpResponse function to abstract these cases, so that you can call it for any HTTP response, and it will return an appropriate parser which may or may not contain a custom header-defined context:

const myParser = JsonLdParser.fromHttpResponse(
  'http://example.org/my-file.json', // For example: response.url
  'application/json', // For example: headers.get('content-type')
  new Headers({ 'Link': '<my-context.jsonld>; rel=\"http://www.w3.org/ns/json-ld#context\"' }), // Optional: WHATWG Headers 
  {}, // Optional: Any options you want to pass to the parser
);

// Parse anything with myParser like usual
const quads = myParser.import(response.body);

The Headers object must implement the Headers interface from the WHATWG Fetch API.

This function will automatically detect the http://www.w3.org/ns/json-ld#streaming profile and set the streamingProfile flag.
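The decision that fromHttpResponse has to make can be sketched roughly as follows. This is a hypothetical, simplified helper: classifyResponse is not part of the library's API. It follows the rules described in this section:

- It only honours a context Link header for application/json and other +json types, not for application/ld+json.
- It resolves the context URL against the response URL.
- It looks for the streaming profile in the content type's profile parameter.

Unlike a complete implementation, it does not reject responses that carry several context Link headers.

```javascript
const CONTEXT_REL = 'http://www.w3.org/ns/json-ld#context';
const STREAMING_PROFILE = 'http://www.w3.org/ns/json-ld#streaming';

// Hypothetical helper, not the library's fromHttpResponse implementation.
function classifyResponse(responseUrl, contentType, linkHeader) {
  const [mediaType, ...params] = (contentType || '').split(';').map((part) => part.trim());
  const type = mediaType.toLowerCase();
  const isLdJson = type === 'application/ld+json';
  const isJson = type === 'application/json' || type.endsWith('+json');

  // The profile parameter may hold several space-separated profile IRIs.
  let streamingProfile = false;
  for (const param of params) {
    const eq = param.indexOf('=');
    if (eq < 0) continue;
    const name = param.slice(0, eq).trim().toLowerCase();
    const value = param.slice(eq + 1).trim().replace(/^"|"$/g, '');
    if (name === 'profile' && value.split(/\s+/).includes(STREAMING_PROFILE)) {
      streamingProfile = true;
    }
  }

  // A context Link header only applies to JSON types other than application/ld+json.
  // Simplification: if several context links are present, the last one wins.
  let contextUrl = null;
  if (isJson && !isLdJson && linkHeader) {
    for (const [, target, rest] of linkHeader.matchAll(/<([^>]*)>([^,]*)/g)) {
      const rel = /rel\s*=\s*"?([^";]+)"?/i.exec(rest);
      if (rel && rel[1].trim().split(/\s+/).includes(CONTEXT_REL)) {
        contextUrl = new URL(target, responseUrl).href;
      }
    }
  }

  return { isJson, contextUrl, streamingProfile };
}

console.log(classifyResponse(
  'http://example.org/my-file.json',
  'application/json',
  '<my-context.jsonld>; rel="http://www.w3.org/ns/json-ld#context"',
));
```

A plain JSON response without a context link still comes back with isJson set to true and contextUrl set to null. Whether to accept or reject such a response is left to the caller.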

Configuration

Optionally, the following parameters can be set in the JsonLdParser constructor:

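const { JsonLdParser } = require('jsonld-streaming-parser');
const { FetchDocumentLoader } = require('jsonld-context-parser');
const dataFactory = require('@rdfjs/data-model');

new JsonLdParser({
  dataFactory,                    // RDF/JS DataFactory used to create terms and quads
  context: 'https://schema.org/', // root context for the document
  baseIRI: 'http://example.org/', // base IRI for resolving relative IRIs
  streamingProfile: true,         // assume documents follow the streaming profile
  documentLoader: new FetchDocumentLoader(), // loads remote contexts
  ignoreMissingContextLinkHeader: false,
  produceGeneralizedRdf: false,
  processingMode: '1.0',          // JSON-LD processing mode
  errorOnInvalidIris: false,
  allowSubjectList: false,
  validateValueIndexes: false,
  defaultGraph: dataFactory.namedNode('http://example.org/graph'),
  rdfDirection: 'i18n-datatype',  // how @direction is mapped to RDF
  normalizeLanguageTags: true,
  rdfstar: true,                  // enable JSON-LD-star support
});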

How it works

This parser does not follow the recommended procedure for transforming JSON-LD to RDF, because this does not allow stream-based handling of JSON. Instead, this tool introduces an alternative streaming algorithm that achieves spec-compliant JSON-LD parsing.

This parser builds on top of the jsonparse library, a SAX-style streaming JSON parser. On top of its events, several in-memory stacks are maintained to accumulate the information needed to emit triples/quads. Their contents are deleted as soon as they are no longer needed, to limit memory usage.
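The sketch below illustrates why memory stays bounded under this approach. It is hypothetical and is not the parser's code. It consumes a hand-written sequence of SAX-style events (the event shapes are invented for this example, and value events are omitted for brevity). It keeps one stack frame per open object and drops each frame as soon as its object closes.

```javascript
// Hypothetical SAX-style events for the JSON text {"a": {"b": 1}, "c": 2}
const events = [
  { type: 'open' },
  { type: 'key', key: 'a' },
  { type: 'open' },
  { type: 'key', key: 'b' },
  { type: 'close' },
  { type: 'key', key: 'c' },
  { type: 'close' },
];

function processEvents(events) {
  const stack = []; // one frame per object that is currently open
  const closedFrames = [];
  let maxDepth = 0;
  for (const event of events) {
    if (event.type === 'open') {
      stack.push({ keys: [] });
      maxDepth = Math.max(maxDepth, stack.length);
    } else if (event.type === 'key') {
      stack[stack.length - 1].keys.push(event.key);
    } else if (event.type === 'close') {
      // The frame is popped (and can be garbage-collected) once its object ends.
      closedFrames.push(stack.pop().keys);
    }
  }
  return { maxDepth, openFrames: stack.length, closedFrames };
}

console.log(processEvents(events));
```

In this sketch, the number of live frames grows with the nesting depth of the document, not with its length.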

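The algorithm makes a couple of (soft) assumptions about the structure of the JSON-LD document, which hold for most typical JSON-LD documents. If an object has an @context entry, that entry comes first. If it has an @id entry, that entry comes right after @context (or first, when there is no @context).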

If these assumptions are met, (almost) every object entry corresponds to a triple/quad that can be emitted immediately. For example, the following document allows a triple to be emitted after each object entry, except for the first two:

{
  "@context": "http://schema.org/",
  "@id": "http://example.org/",
  "@type": "Person",               // --> <http://example.org/> a schema:Person.
  "name": "Jane Doe",              // --> <http://example.org/> schema:name "Jane Doe".
  "jobTitle": "Professor",         // --> <http://example.org/> schema:jobTitle "Professor".
  "telephone": "(425) 123-4567",   // --> <http://example.org/> schema:telephone "(425) 123-4567".
  "url": "http://www.janedoe.com"  // --> <http://example.org/> schema:url <http://www.janedoe.com>.
}

If not all of these assumptions are met, the entries of an object are buffered until enough information becomes available, or until the object is closed. For example, if no @id has been read yet, values are buffered until an @id is read or the object is closed.

For example:

{
  "@context": "http://schema.org/",
  "@type": "Person",
  "name": "Jane Doe",
  "jobTitle": "Professor",
  "@id": "http://example.org/",    // --> <http://example.org/> a schema:Person.
                                   // --> <http://example.org/> schema:name "Jane Doe".
                                   // --> <http://example.org/> schema:jobTitle "Professor".
  "telephone": "(425) 123-4567",   // --> <http://example.org/> schema:telephone "(425) 123-4567".
  "url": "http://www.janedoe.com"  // --> <http://example.org/> schema:url <http://www.janedoe.com>.
}

As such, JSON-LD documents that meet these assumptions are parsed very efficiently. Other documents are still parsed correctly, albeit slightly less efficiently.
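The buffering behaviour described above can be modelled with a small, hypothetical sketch. It is not the parser's actual implementation, and it simplifies heavily:

- It handles a single flat object.
- It expands keys by prefixing http://schema.org/ instead of processing the context.
- It keeps every non-@type value as a plain string.
- It uses an invented blank node label, _:b0, when no @id appears.

It records at which entry each triple is emitted, so the two examples above can be compared.

```javascript
const RDF_TYPE = 'http://www.w3.org/1999/02/22-rdf-syntax-ns#type';

// Hypothetical model of how one flat JSON object's entries are handled.
// Simplifications: keys are expanded by prefixing `vocab` (no real context
// processing), non-@type values stay plain strings, and "_:b0" is a made-up
// blank node label used when the object has no @id.
function createObjectHandler(vocab, emit) {
  let subject = null;
  const pending = []; // entries that arrived before the @id
  const emitEntry = (key, value) => {
    if (key === '@type') emit([subject, RDF_TYPE, vocab + value]);
    else emit([subject, vocab + key, value]);
  };
  const flush = () => {
    for (const [key, value] of pending) emitEntry(key, value);
    pending.length = 0; // buffered entries are dropped once emitted
  };
  return {
    entry(key, value) {
      if (key === '@context') return; // a real parser would process the context here
      if (key === '@id') {
        subject = value;
        flush();
      } else if (subject === null) {
        pending.push([key, value]); // subject unknown yet: buffer
      } else {
        emitEntry(key, value); // subject known: emit immediately
      }
    },
    close() {
      if (subject === null) {
        subject = '_:b0';
        flush();
      }
    },
  };
}

// Feeds entries one by one and records at which step each triple is emitted
// (steps 1..n are the entries, step n + 1 is closing the object).
function run(entries) {
  const log = [];
  let step = 0;
  const handler = createObjectHandler('http://schema.org/', (triple) => log.push({ step, triple }));
  for (const [key, value] of entries) {
    step += 1;
    handler.entry(key, value);
  }
  step += 1;
  handler.close();
  return log;
}

// The second example above: @id arrives as the fifth entry.
console.log(run([
  ['@context', 'http://schema.org/'],
  ['@type', 'Person'],
  ['name', 'Jane Doe'],
  ['jobTitle', 'Professor'],
  ['@id', 'http://example.org/'],
  ['telephone', '(425) 123-4567'],
]));
```

In the logged output, the three entries that precede @id are all emitted at the same step as @id. The telephone entry is emitted as soon as it is read.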

Streaming Profile

This parser adheres to the JSON-LD 1.1 specification, the JSON-LD 1.1 Streaming specification, and the JSON-LD star specification.

By default, this parser assumes that JSON-LD documents are not in the streaming document form. This means that the parser may buffer large parts of the document before producing quads, to make sure that the document is interpreted correctly.

Since this buffering negates the streaming benefits of this parser, the streamingProfile flag should be enabled when parsing a streaming JSON-LD document.

If non-streaming JSON-LD documents are encountered when the streamingProfile flag is enabled, an error may be thrown.

Specification compliance

This parser implements the following JSON-LD specifications: JSON-LD 1.1 (transformation to RDF), JSON-LD 1.1 Streaming, and JSON-LD-star.

Performance

The following table shows some simple performance comparisons between JSON-LD Streaming Parser and jsonld.js.

These basic experiments show that, even though streaming parsers are typically significantly slower than regular parsers, JSON-LD Streaming Parser achieves performance similar to jsonld.js for most typical JSON-LD files. However, for expanded JSON-LD documents, JSON-LD Streaming Parser is around 3 to 4 times slower.

Each cell gives the execution time, with memory usage in parentheses.

File                                          JSON-LD Streaming Parser   jsonld.js
toRdf-manifest.jsonld (999 triples)           683.964 ms (38 MB)         708.975 ms (40 MB)
sparql-init.json (69 triples)                 931.698 ms (40 MB)         1088.607 ms (47 MB)
person.json (5 triples)                       309.419 ms (30 MB)         313.138 ms (41 MB)
dbpedia-10000-expanded.json (10,000 triples)  785.557 ms (70 MB)         202.363 ms (62 MB)

Tested files:

Code for measurements
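The slowdown factors quoted above can be checked directly from the performance table. Dividing each JSON-LD Streaming Parser time by the corresponding jsonld.js time gives a ratio, where a value above 1 means the streaming parser took longer. The numbers below are copied from that table.

```javascript
// Timings (ms) copied from the performance table: [streaming parser, jsonld.js]
const timings = {
  'toRdf-manifest.jsonld': [683.964, 708.975],
  'sparql-init.json': [931.698, 1088.607],
  'person.json': [309.419, 313.138],
  'dbpedia-10000-expanded.json': [785.557, 202.363],
};

// Ratio > 1 means the streaming parser took longer than jsonld.js.
const ratios = Object.fromEntries(
  Object.entries(timings).map(([file, [streaming, jsonld]]) => [file, streaming / jsonld]),
);

for (const [file, ratio] of Object.entries(ratios)) {
  console.log(`${file}: ${ratio.toFixed(2)}x`);
}
```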

License

This software is written by Ruben Taelman.

This code is released under the MIT license.