Medeia Validator

Medeia Validator is a streaming validator for Json data that uses schema documents specified in the Json Schema format.

License

This software is licensed under the Apache License, Version 2.0.

Software is copyright © 2018-2019 by the authors.

Maven dependency

For the Jackson support

<dependency>
    <groupId>com.worldturner.medeia</groupId>
    <artifactId>medeia-validator-jackson</artifactId>
    <version>1.1.0</version>
</dependency>

For the Gson support

<dependency>
    <groupId>com.worldturner.medeia</groupId>
    <artifactId>medeia-validator-gson</artifactId>
    <version>1.1.0</version>
</dependency>

Json Schema version support

Medeia supports the following versions of the Json Schema specification: draft-04, draft-06 and draft-07.

Medeia can validate Json data and can convert schema documents from draft-04 to draft-07.

Parser library support

Medeia works with the following Json parser libraries: Jackson and Gson.

Use cases

Medeia validator was written with the following use-cases in mind:

  1. Validate Json data as it is being read into a tree or into an object model using an object mapper
  2. Validate Json data as it is being serialized from a tree or object model
  3. Validate Json data in a message router or validator component on the network, which has no need to load the Json data into a tree or object model

Streaming validation

Medeia does not build a full internal tree of the Json data while it is validating; it only temporarily stores as much as is needed for processing the current validation rules.

This ensures a lower memory footprint and the ability to parse very large documents, even documents that do not fit into memory. This is beneficial for the use cases for which Medeia was written.

Although memory versus speed (or CPU utilization) is often a trade-off, Medeia validator is very fast for its use-cases. Validation approaches that require an in-memory model of the data first spend time building that model, and then spend extra time garbage collecting it afterwards.
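To make the streaming idea concrete, here is a minimal Java sketch. It is not Medeia's implementation, and the class name, method name and the "maximum nesting depth" rule are invented for this illustration. It checks one constraint on Json text while reading it from a Reader, keeping only a counter and two flags in memory instead of a tree.

```java
import java.io.IOException;
import java.io.Reader;

// Hypothetical illustration only; this class is not part of the Medeia API.
final class StreamingDepthCheck {

    /**
     * Reads Json text character by character and returns false as soon as
     * objects/arrays nest deeper than maxDepth, or a bracket closes that was
     * never opened. It does not check that bracket types match.
     * Only a counter and two flags are kept, never the document itself.
     */
    static boolean withinDepth(Reader json, int maxDepth) throws IOException {
        int depth = 0;
        boolean inString = false; // brackets inside string values must be ignored
        boolean escaped = false;  // true right after a backslash inside a string
        int c;
        while ((c = json.read()) != -1) {
            if (inString) {
                if (escaped) {
                    escaped = false;
                } else if (c == '\\') {
                    escaped = true;
                } else if (c == '"') {
                    inString = false;
                }
            } else if (c == '"') {
                inString = true;
            } else if (c == '{' || c == '[') {
                if (++depth > maxDepth) {
                    return false; // fail fast; the rest of the input is never read
                }
            } else if (c == '}' || c == ']') {
                if (--depth < 0) {
                    return false;
                }
            }
        }
        // Valid only if every bracket and string was closed.
        return depth == 0 && !inString;
    }
}
```

Because the method accepts any Reader, the same check could run over a file or network stream of arbitrary size with constant memory use.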

Caveats

Kotlin, Java and JVM languages support

Medeia can be called from Kotlin, Java, and other JVM languages that can call Java APIs.

All accessible types are in the com.worldturner.medeia.api.* packages. Classes in other packages are not guaranteed to remain stable across versions; they can change at any time without notice.

Versioning

This library follows Semantic Versioning, but only for the public API. Public API classes have package names that start with com.worldturner.medeia.api. The APIs of types in other packages can change at any time, even between minor versions.
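A project that depends on Medeia could guard against accidental use of unstable classes with a check like the following. This is only a sketch: the helper is hypothetical and not shipped with the library, and only the package prefix comes from the text above.

```java
// Hypothetical helper for illustration; it is not shipped with Medeia.
final class PublicApiCheck {

    // Prefix taken from the versioning rule above. The trailing dot avoids
    // matching look-alike packages such as com.worldturner.medeia.apix.
    private static final String PUBLIC_API_PREFIX = "com.worldturner.medeia.api.";

    /** Returns true when the fully qualified class name lies in the stable public API. */
    static boolean isPublicApi(String fullyQualifiedClassName) {
        return fullyQualifiedClassName.startsWith(PUBLIC_API_PREFIX);
    }
}
```

For example, a made-up name such as com.worldturner.medeia.api.Example would pass the check, while com.worldturner.medeia.internal.Example (also made up) would not.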

Source Json format support

How to use

Examples are provided in the example projects in this git repository.

They show how to read and write using Jackson and Gson while also loading into or retrieving from Java/Kotlin objects.

The CustomFormatExample also doubles as a way to show how streaming validation can be done without loading the data into memory at all.

This allows medeia-validator to validate many gigabytes of data.

The MedeiaJacksonApi and MedeiaGsonApi classes have various methods to load schemas and to create validating parsers/generators (or readers/writers in Gson parlance).

The interface SchemaSource has several implementations to load schemas from InputStreams, Readers, Paths, or memory.

The version of a schema is automatically detected, but if the schema file doesn't specify it using a $schema field, the version can be provided through the SchemaSource.

Mixing different versions of schemas (draft4, 6 and 7) is allowed and schemas can refer to remote schemas in different versions than their own.
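The version detection with a fallback, as described above, can be sketched as follows. This is an illustration under assumptions, not Medeia's code: the class, enum and method names are invented, and only the meta-schema URIs are the ones published by json-schema.org.

```java
import java.util.Map;

// Hypothetical illustration only; these names are not Medeia's API.
final class SchemaVersionDetection {

    enum Draft { DRAFT04, DRAFT06, DRAFT07 }

    // Meta-schema URIs for each draft, stored without the trailing '#'.
    private static final Map<String, Draft> BY_URI = Map.of(
            "http://json-schema.org/draft-04/schema", Draft.DRAFT04,
            "http://json-schema.org/draft-06/schema", Draft.DRAFT06,
            "http://json-schema.org/draft-07/schema", Draft.DRAFT07);

    /**
     * Returns the draft named by the value of the $schema field, or the
     * caller-supplied fallback when the field is absent (null) or unknown.
     */
    static Draft detect(String schemaUri, Draft fallback) {
        if (schemaUri == null) {
            return fallback;
        }
        // "$schema" values are usually written with an empty fragment ('#').
        String normalized = schemaUri.endsWith("#")
                ? schemaUri.substring(0, schemaUri.length() - 1)
                : schemaUri;
        Draft detected = BY_URI.get(normalized);
        return detected != null ? detected : fallback;
    }
}
```

In this sketch an unrecognized URI silently falls back to the supplied version; a real validator might prefer to report an error instead.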

Options are passed using a ValidationOptions object.

Care has been taken that all methods in the API can be invoked from Java. The JsonSchemaValidationOptions has with* methods to allow option setting from Java.

Cloning and building medeia-validator

Medeia-validator pulls in the JSON-Schema-Test-Suite as a git submodule. If you have already cloned medeia-validator, run this command:

git submodule update --init --recursive

Or perform the initial clone with submodules:

git clone --recurse-submodules git@github.com:worldturner/medeia-validator.git

Building is done with Maven using mvn clean install, which also executes git to retrieve the submodule.

Test Suite Support

Medeia validator passes all 424 'required' tests of the JSON-Schema-Test-Suite. It passes 138 of the 143 optional tests. The 5 failing optional tests concern "format" keyword validation for the following formats, which are not yet (fully) supported:

uri-template, iri, iri-reference, email, idn-email, regex

Format keyword validation is optional (and can be turned off, as mandated by the specification).

The following formats are supported and pass the 'optional' test suite:

json-pointer, relative-json-pointer, date, time, date-time, ipv4, ipv6, hostname, idn-hostname, uri, uri-reference
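As an illustration of what an optional format check involves, the sketch below validates the "date" format using only the Java standard library. It is not how Medeia implements format validation. The class name and the boolean switch are assumptions made for this example, with the switch standing in for the ability to turn format validation off.

```java
import java.time.LocalDate;
import java.time.format.DateTimeParseException;

// Hypothetical illustration only; this is not Medeia's format validator.
final class DateFormatCheck {

    /**
     * Checks a string against the "date" format (yyyy-MM-dd).
     * When format validation is disabled, every value is accepted.
     */
    static boolean isValidDate(String value, boolean formatValidationEnabled) {
        if (!formatValidationEnabled) {
            return true; // "format" is then treated as an annotation only
        }
        if (value.length() != 10) {
            return false; // rules out signed or extended years accepted by java.time
        }
        try {
            LocalDate.parse(value); // strict ISO parsing; rejects dates like Feb 30
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```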

Performance

Performance tests include the time to parse the data from a file and to validate it. They do not include the time to load/build the validation schema itself.

The tests were performed on a mid-2015 MacBook Pro; the results are median values of at least 30 runs. See medeia-validator-performance.
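For readers who want to reproduce this kind of measurement, the following sketch shows one way to take the median of repeated timings. It is not the harness used in medeia-validator-performance, and the class and method names are invented. It also does nothing about JIT warm-up, which a real benchmark would have to handle.

```java
import java.util.Arrays;

// Hypothetical illustration only; not the project's benchmark harness.
final class MedianTiming {

    /** Returns the median of the given samples without modifying the input array. */
    static double median(double[] values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Runs the task the given number of times and returns the median duration in milliseconds. */
    static double medianMillis(Runnable task, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return median(millis);
    }
}
```

The median is less sensitive than the mean to a few slow runs caused by garbage collection or other activity on the machine.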

Draft04

Validating the JSON schema draft 4 meta schema against itself using:

Results in milliseconds per validation (fastest first):

MedeiaJackson   MedeiaGson   Everit   JsonValidator
0.1062          0.1300       0.1982   0.8526

Performance Chart draft-04

Draft07

Validating the JSON schema draft 7 meta schema against itself using:

Results in milliseconds per validation (fastest first):

MedeiaJackson   MedeiaGson   Justify   Everit
0.0723          0.0742       0.0850    0.4836

Performance Chart draft-07

Large file validation

Generated lists of draft-04 and draft-07 schemas, validated against a list of the metaschema, have been tested up to file sizes of 10 GB. Time taken scales linearly: the total time is the number of concatenations of the schema times the time taken per schema instance above (0.10-0.13 milliseconds).
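The linear scaling can be turned into a rough estimate. The sketch below only restates the arithmetic from the paragraph above; the class name is invented and the figure of one million schema instances is a made-up example, not a measured file.

```java
// Hypothetical illustration of the linear-scaling arithmetic.
final class LinearTimeEstimate {

    /** Total time in seconds = number of schema instances x milliseconds per instance / 1000. */
    static double estimateSeconds(long instances, double millisPerInstance) {
        return instances * millisPerInstance / 1000.0;
    }
}
```

With the per-instance times quoted above, a hypothetical file holding one million concatenated schema instances would take roughly 100 seconds at 0.10 ms each and roughly 130 seconds at 0.13 ms each.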