HexTuples
Status: draft
Version: 0.3.0
HexTuples is a simple datamodel for dealing with linked data. This document describes the model and concepts of HexTuples, as well as its (currently only) serialization format: HexTuples-NDJSON. The format is easy to parse, supports streaming parsing and is designed to be highly performant in JS contexts.
Concepts
HexTuple
A single HexTuple is an atomic piece of data, similar to an RDF Triple (also known as Statements or Quads).
A HexTuple contains a small piece of information.
HexTuples consist of six fields: subject, predicate, value, datatype, language and graph.
Let's encode the following sentence in HexTuples:
Tim Berners-Lee, the director of W3C, was born in London on the 8th of June, 1955.
Subject | Predicate | Value | Datatype | Language | Graph |
---|---|---|---|---|---|
Tim | birthPlace | London | | | |
Tim | birthDate | 1955-06-08 | xsd:date | | |
Tim | jobTitle | Director of W3C | rdf:langString | en-US | |
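As a rough illustration, the three rows above could be held in TypeScript as six-element string tuples. The HexTuple type alias and the variable names are assumptions made for this sketch, and the abbreviated identifiers from the table (Tim, birthPlace, xsd:date) are kept as-is rather than expanded to full URIs.

```typescript
// A HexTuple as a fixed-length tuple of six strings.
// Field order: subject, predicate, value, datatype, language, graph.
type HexTuple = [string, string, string, string, string, string];

// Abbreviated identifier from the table; real data would use a full URI.
const tim = "Tim";

const statements: HexTuple[] = [
  [tim, "birthPlace", "London", "", "", ""],
  [tim, "birthDate", "1955-06-08", "xsd:date", "", ""],
  [tim, "jobTitle", "Director of W3C", "rdf:langString", "en-US", ""],
];

// Destructuring gives each field a readable name.
statements.forEach(([subject, predicate, value, datatype, language, graph]) => {
  console.log({ subject, predicate, value, datatype, language, graph });
});
```

Empty strings stand in for the fields that are blank in the table.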
URI
URI stands for Uniform Resource Identifier, specified in RFC 3986. The best known type of URI is the URL. Although it is currently best practice to use mostly HTTPS URLs as URIs, HexTuples works with any type of URI.
Subject
- The subject is the identifier of the thing the statement is about.
- This field is required.
- It MUST be a URI.
Predicate
- The predicate describes the abstract property of the statement.
- This field is required.
- It MUST be a URI.
Value
- The value contains the object of the HexTuple.
- This field is required.
- It can be any datatype, specified in the datatype of the HexTuple.
Datatype
- The datatype describes the type of the value of the HexTuple.
- This field is optional.
- It MUST be a URI or an empty string.
- When the value is a NamedNode, use: globalId
- When the value is a BlankNode, use: localId
Language
- The language describes the natural language of the value of the HexTuple.
- This field is optional.
- It MUST be an RFC 3066 language tag or an empty string.
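The field rules above can be sketched as a small checker. This is only an illustration: the function name checkFields, the loose "starts with a URI scheme" test and the rough language tag pattern are all assumptions of this sketch, not part of the specification, and "required" is read here as "must be present".

```typescript
// Loose check for "looks like a URI": an RFC 3986 scheme followed by ":".
// This is an assumption for the sketch, not a full URI validator.
const SCHEME = /^[A-Za-z][A-Za-z0-9+.-]*:/;
const looksLikeUri = (s: string): boolean => SCHEME.test(s);

// Returns a list of human-readable problems; an empty list means the
// fields satisfy the rules as interpreted by this sketch.
function checkFields(
  subject: string,
  predicate: string,
  value: string | undefined,
  datatype = "",
  language = "",
): string[] {
  const problems: string[] = [];
  if (!looksLikeUri(subject)) problems.push("subject MUST be a URI");
  if (!looksLikeUri(predicate)) problems.push("predicate MUST be a URI");
  if (value === undefined) problems.push("value is required");
  // globalId and localId are the two reserved datatype strings.
  const reserved = datatype === "globalId" || datatype === "localId";
  if (datatype !== "" && !reserved && !looksLikeUri(datatype)) {
    problems.push("datatype MUST be a URI or an empty string");
  }
  // Very rough language tag shape: letters, then hyphen-separated subtags.
  if (language !== "" && !/^[A-Za-z]+(-[A-Za-z0-9]+)*$/.test(language)) {
    problems.push("language MUST be a language tag or an empty string");
  }
  return problems;
}
```

Returning a list of problems rather than throwing makes it easy to report every violation in a statement at once.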
Relation to RDF
The HexTuples datamodel closely resembles the RDF Data Model, which is the de-facto standard for linked data.
RDF statements are often called Triples, because they consist of a subject, predicate and object.
The object field is either a single URI (in Named Nodes), or a combination of three fields (in Literals): value, datatype and language.
This means that a single Triple can actually consist of five fields: the subject, predicate, value, datatype and language.
A Quad statement also has a graph, which brings the total to six fields, hence the name: HexTuples.
Instead of making a distinction between Literal statements and NamedNode statements (which have two different models), HexTuples uses a single model that describes both.
Having a single model for all statements (HexTuples) makes it easier to serialize, query and store data.
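To make the mapping concrete, here is a minimal sketch of how the different RDF object terms could be flattened into the single HexTuple shape. The term types below are local stand-ins defined for this example, not the types of any particular RDF library, and the function names are hypothetical.

```typescript
type HexTuple = [string, string, string, string, string, string];

// Minimal local stand-ins for RDF terms (assumptions for this sketch).
type NamedNode = { termType: "NamedNode"; value: string };
type BlankNode = { termType: "BlankNode"; value: string };
type Literal = { termType: "Literal"; value: string; datatype: string; language: string };
type ObjectTerm = NamedNode | BlankNode | Literal;

// Flatten an RDF object term into the value / datatype / language fields.
function objectFields(o: ObjectTerm): [string, string, string] {
  switch (o.termType) {
    case "NamedNode":
      return [o.value, "globalId", ""];
    case "BlankNode":
      return [o.value, "localId", ""];
    case "Literal":
      return [o.value, o.datatype, o.language];
  }
}

// Build a HexTuple; an empty graph string means no graph was given.
function toHexTuple(subject: string, predicate: string, object: ObjectTerm, graph = ""): HexTuple {
  const [value, datatype, language] = objectFields(object);
  return [subject, predicate, value, datatype, language, graph];
}

const birthPlace = toHexTuple(
  "https://www.w3.org/People/Berners-Lee/",
  "http://schema.org/birthPlace",
  { termType: "NamedNode", value: "http://dbpedia.org/resource/London" },
);
console.log(JSON.stringify(birthPlace));
```

Both kinds of statement end up in the same six-field shape, so downstream code does not need to branch on the kind of object term.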
HexTuples-NDJSON
This document serves as a work in progress / draft specification.
HexTuples-NDJSON is an NDJSON (Newline Delimited JSON) based HexTuples / RDF serialization format. It is designed to support streaming parsing and provide great performance in a JS context (i.e. the browser).
- A valid HexTuples document MUST be serialized using NDJSON, with one JSON array per line.
- HexTuples-NDJSON MIME type: application/hex+x-ndjson; charset=utf-8
- Each array MUST consist of six strings.
- Each array represents one RDF statement / quad / triple.
- The six strings in each array respectively represent subject, predicate, value, datatype, lang and graph.
- The datatype and lang fields are only used when the value represents a Literal value (i.e. not a URI, but a string / date / something else). In RDF, the combination of value, datatype and lang is known as the object.
- When expressing an Object that is a NamedNode, use this string as the datatype: "globalId" (discussion)
- When expressing an Object that is a BlankNode, use this string as the datatype: "localId"
- If the graph is a blank node (i.e. anonymous), use an underscore as the URI scheme: _:myNode (discussion). Parsers SHOULD interpret these as blank graphs, but MAY discard these if they have no support for them.
- When a field has no value, use an empty string: ""
Example
English:
Tim Berners-Lee was born in London, on the 8th of June in 1955.
Turtle / N-Triples:
<https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthDate> "1955-06-08"^^<http://www.w3.org/2001/XMLSchema#date>.
<https://www.w3.org/People/Berners-Lee/> <http://schema.org/birthPlace> <http://dbpedia.org/resource/London>.
Expressed in HexTuples:
["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""]
["https://www.w3.org/People/Berners-Lee/", "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "globalId", "", ""]
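The serialization rules and the example above can be sketched as a small round trip in TypeScript. The function names are hypothetical, and skipping blank lines while parsing is an assumption of this sketch rather than something the draft states.

```typescript
type HexTuple = [string, string, string, string, string, string];

// Type guard: an array of exactly six strings.
function isHexTuple(x: unknown): x is HexTuple {
  return Array.isArray(x) && x.length === 6 && x.every((f) => typeof f === "string");
}

// Serialize: one JSON array per line, each line terminated by "\n".
function serializeHexTuples(tuples: HexTuple[]): string {
  return tuples.map((t) => JSON.stringify(t) + "\n").join("");
}

// Parse a complete document held in memory. Blank lines are skipped
// (an assumption); any other line that is not a six-string array throws.
function parseHexTuples(doc: string): HexTuple[] {
  const tuples: HexTuple[] = [];
  doc.split("\n").forEach((line, i) => {
    const text = line.trim();
    if (text === "") return;
    const parsed: unknown = JSON.parse(text);
    if (!isHexTuple(parsed)) {
      throw new Error("Line " + (i + 1) + " is not an array of six strings");
    }
    tuples.push(parsed);
  });
  return tuples;
}

const person = "https://www.w3.org/People/Berners-Lee/";
const example: HexTuple[] = [
  [person, "http://schema.org/birthDate", "1955-06-08", "http://www.w3.org/2001/XMLSchema#date", "", ""],
  [person, "http://schema.org/birthPlace", "http://dbpedia.org/resource/London", "globalId", "", ""],
];

const doc = serializeHexTuples(example);
const roundTripped = parseHexTuples(doc);
console.log(roundTripped.length); // 2
```

JSON.stringify writes the arrays without spaces after the commas; that is equivalent JSON to the lines shown in the example.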
Implementations
Ontola TypeScript HexTuples Parser
This TypeScript code should give you some idea of how to write a parser for HexTuples.
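// Builds the RDF object term from the value, datatype and language fields.
// The factory functions (literal, namedNode, blankNode, quad, defaultGraph)
// and the SomeTerm type are assumed to come from the surrounding RDF library.
const object = (value: string, datatype: string, language: string): SomeTerm => {
  if (language) {
    // A language tag implies a language-tagged literal.
    return literal(value, language);
  } else if (datatype === 'globalId') {
    return namedNode(value);
  } else if (datatype === 'localId') {
    return blankNode(value);
  }

  // Anything else is a literal typed by the datatype URI.
  return literal(value, namedNode(datatype));
};

// Converts one parsed NDJSON line (an array of six strings) into a quad.
const lineToQuad = (h: string[]) => quad(
  h[0].startsWith('_:') ? blankNode(h[0]) : namedNode(h[0]),
  namedNode(h[1]),
  object(h[2], h[3], h[4]),
  h[5] ? namedNode(h[5]) : defaultGraph(),
);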
Python RDFLib
- https://pypi.org/project/rdflib/
- RDFLib is a pure Python package for working with RDF.
- It supports parsing and serializing RDF as HexTuples.
- Internally (in Python objects), RDF parsed from HexTuples data is represented in a Conjunctive Graph, that is, a multi-graph object.
- HexTuples files must end in the file extension .hext for RDFLib to auto-recognise the format, although files with any ending can be used if the format is given (format=hext).
An RDF format conversion tool using RDFLib that can convert from/to HexTuples is online at https://tools.dev.kurrawong.ai/convert.
Motivation for HexTuples-NDJSON
HexTuples was designed by Thom van Kalkeren (CTO of Ontola) because he noticed that parsing / serialization was unnecessarily costly in our full-RDF stack, even when using the relatively performant n-quads format.
- Since HexTuples is serialized in NDJSON, it benefits from the highly optimised JSON parsers in browsers.
- It uses NDJSON instead of regular JSON because it makes it easier to parse concatenated responses (multiple root objects in one document).
- NDJSON enables streaming parsing as well, which gives it another performance boost.
- Some JS RDF libraries (link-lib, link-redux) have an internal RDF graph model which uses these HexTuples arrays as well, which means that there is minimal mapping cost when parsing HexTuple statements. This format is especially suitable for real front-end applications that use dynamic RDF data.
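The streaming point above can be illustrated with a small incremental parser that accepts text chunks as they arrive and emits a statement as soon as its line is complete. This is a sketch under stated assumptions: createHexTupleStream and its write / end methods are hypothetical names, and how the chunks are obtained (for example from a network response) is left out.

```typescript
type HexTuple = [string, string, string, string, string, string];

// Incremental parser: feed it text chunks in order; it calls onTuple for
// every complete line and keeps the unfinished tail in a buffer.
function createHexTupleStream(onTuple: (t: HexTuple) => void) {
  let buffer = "";

  const emit = (line: string): void => {
    const text = line.trim();
    if (text === "") return; // skip blank lines (assumption of this sketch)
    const parsed: unknown = JSON.parse(text);
    const ok = Array.isArray(parsed) && parsed.length === 6 &&
      parsed.every((f) => typeof f === "string");
    if (!ok) throw new Error("Expected an array of six strings");
    onTuple(parsed as HexTuple);
  };

  return {
    // Append a chunk and emit every line that is now complete.
    write(chunk: string): void {
      buffer += chunk;
      const lines = buffer.split("\n");
      buffer = lines.pop() || ""; // the last piece may be an incomplete line
      lines.forEach(emit);
    },
    // Flush whatever is left when the input ends without a final newline.
    end(): void {
      emit(buffer);
      buffer = "";
    },
  };
}
```

Because each line is an independent JSON value, the parser never needs to hold more than one unfinished line in memory.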