<a href="https://www.buymeacoffee.com/ssilverman" title="Donate to this project using Buy Me A Coffee"><img src="https://img.shields.io/badge/buy%20me%20a%20coffee-donate-orange.svg" alt="Buy Me A Coffee donate button"></a>
Snow, a JSON Schema Validator
Version: 0.16.0
The main goal of this project is to be a reference JSON Schema validator. While it provides a few working applications, it's meant primarily as an API for building your own toolset.
See: JSON Schema
Table of contents
- Features
- Quick start
- Under the covers
- Limitations
- Which specification is used for processing?
- Options for controlling behaviour
- Project structure
- Building and running
- The linter
- The coverage checker
- Future plans
- References
- An ending thought
- Acknowledgements
- License
Features
This project has the following features:
- Full support for all drafts since Draft-06.
- Full "format" validation support, with a few platform-specific exceptions.
- Full annotation and error support.
- There is enough information to provide full output support. The calling app has to sort through and format what it wants, however.
- Written for correctness. This aims to be a reference implementation.
- Can initialize with known URIs. These can be any of:
- Parsed JSON objects.
- URLs. The URLs can be anything, including filesystem locations, web resources, and other objects. "URL" here means that the system knows how to retrieve something vs. "URI", which is just an ID.
- Options for controlling "format" validation, annotation collection, error collection, and default and non-default specification choice.
- There's rudimentary infinite loop detection, but only if error or annotation collection is enabled. It works by detecting that a previous keyword has been applied to the same instance location.
- Specification detection heuristics for when there's no $schema declared.
- Content validation support for the "base64" encoding and "application/json" media type.
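The infinite loop detection mentioned in the feature list can be pictured with a small sketch. This is only one plausible reading of "a previous keyword has been applied to the same instance location", not the project's actual code: the class `LoopDetector` and its `enter` and `exit` methods are hypothetical names, and locations are plain strings here rather than the project's own types.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch; not part of the Snow API.
final class LoopDetector {
  // (keyword location, instance location) pairs currently being evaluated.
  private final Set<List<String>> active = new HashSet<>();

  /** Returns false if this keyword is already being applied to this instance location. */
  boolean enter(String keywordLocation, String instanceLocation) {
    return active.add(List.of(keywordLocation, instanceLocation));
  }

  /** Marks the application of the keyword to the instance location as finished. */
  void exit(String keywordLocation, String instanceLocation) {
    active.remove(List.of(keywordLocation, instanceLocation));
  }

  public static void main(String[] args) {
    LoopDetector detector = new LoopDetector();
    System.out.println(detector.enter("/$ref", ""));   // first application
    System.out.println(detector.enter("/$ref", ""));   // same pair again: a loop
    System.out.println(detector.enter("/$ref", "/a")); // different instance location
  }
}
```

Tracking only the pairs that are still being evaluated means that applying the same keyword twice in sequence to the same location is not mistaken for a loop.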
Additional features
These additional features exist:
- A rudimentary linter that catches simple but common errors.
- A coverage tool.
Quick start
There are more details below, but here are four commands that will get you started right away:
- Run the validator on an instance against a schema:
  mvn compile exec:java@main -Dexec.args="schema.json instance.json"
  The two files in this example are named schema.json for the schema and instance.json for the instance. The example assumes the files are in the current working directory.
- Clone and then run the test suite:
  mvn compile exec:java@test -Dexec.args="/suites/json-schema-test-suite"
  This assumes that the test suite is in /suites/json-schema-test-suite. Yours may be in a different location. The test suite can be cloned from JSON Schema Test Suite.
- Run the linter on a schema:
  mvn compile exec:java@linter -Dexec.args="schema.json"
  The schema file in this example is named schema.json. The example assumes the file is in the current working directory.
- Run the schema coverage checker after a validation:
  mvn compile exec:java@coverage -Dexec.args="schema.json instance.json"
  The two files in this example are named schema.json for the schema and instance.json for the instance. The example assumes the files are in the current working directory.
Under the covers
This project uses Google's Gson library under the hood for JSON parsing. ClassGraph is used to support class finding.
This means these things:
- The external API for this project uses Gson's JSON object model.
Limitations
This project follows just about everything it can from the latest JSON Schema specification draft. There are a few things it does slightly differently due to some implementation details.
- Regular expressions allow or disallow some things that ECMA 262 regular expressions do not. For example, Java allows the \Z boundary matcher but ECMA 262 does not.
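To make the \Z example concrete, here is a short sketch of the Java side only, using nothing but java.util.regex. The class name `BoundaryMatcherDemo` is made up for illustration. In Java, \Z matches at the end of the input, or just before a final line terminator if there is one.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; only java.util.regex is used.
final class BoundaryMatcherDemo {
  // "\Z": end of the input, but before a final line terminator if one is present.
  static final Pattern END_BEFORE_TERMINATOR = Pattern.compile("abc\\Z");

  static boolean found(String input) {
    return END_BEFORE_TERMINATOR.matcher(input).find();
  }

  public static void main(String[] args) {
    System.out.println(found("abc"));   // true
    System.out.println(found("abc\n")); // true: \Z matches before the final terminator
    System.out.println(found("abcd"));  // false
  }
}
```

A schema whose "pattern" relies on this matcher would therefore behave differently here than in an implementation that follows ECMA 262 strictly.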
Which specification is used for processing?
There are a few ways the validator determines which specification to use when processing and validating a schema. The steps are as follows:
- $schema value. If the schema explicitly specifies this value, and if it is known by the validator, then this is the specification that the validator will use.
- The SPECIFICATION option or any default.
- Guessed by heuristics.
- The DEFAULT_SPECIFICATION option or any default.
- Not known.
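The order of these steps can be sketched as a simple fallback chain. This is an illustration under assumptions, not the validator's real code: `SpecificationChooser` is a hypothetical class, specifications are represented by plain strings, and a null argument stands for "this step produced nothing".

```java
import java.util.Optional;
import java.util.Set;

// Hypothetical sketch of the fallback order; not part of the Snow API.
final class SpecificationChooser {
  private final Set<String> known;

  SpecificationChooser(Set<String> known) {
    this.known = Set.copyOf(known);
  }

  /**
   * Each argument is the outcome of one step, or null if that step yields nothing:
   * the declared $schema, the SPECIFICATION option (or its default), the heuristic
   * guess, and the DEFAULT_SPECIFICATION option (or its default).
   */
  Optional<String> choose(String declared, String specificationOption,
                          String guessed, String defaultSpecificationOption) {
    if (declared != null && known.contains(declared)) {
      return Optional.of(declared);             // 1. Known $schema value
    }
    if (specificationOption != null) {
      return Optional.of(specificationOption);  // 2. SPECIFICATION option
    }
    if (guessed != null) {
      return Optional.of(guessed);              // 3. Heuristics
    }
    // 4. DEFAULT_SPECIFICATION option, else 5. not known (empty)
    return Optional.ofNullable(defaultSpecificationOption);
  }

  public static void main(String[] args) {
    SpecificationChooser chooser =
        new SpecificationChooser(Set.of("draft-06", "draft-07", "2019-09"));
    System.out.println(chooser.choose("draft-07", "2019-09", null, "2019-09"));
    System.out.println(chooser.choose("unknown", null, "draft-06", "2019-09"));
    System.out.println(chooser.choose(null, null, null, null));
  }
}
```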
Options for controlling behaviour
This section describes options that control the validator behaviour.
All options are defined in the com.qindesign.json.schema.Option class, and their use is in com.qindesign.json.schema.Options.
Some options are specification-specific, meaning they have different defaults depending on which specification is applied. Everything else works as expected: users set or remove options. It is only the internal defaults that have any specification-specific meanings.
There are two ways to retrieve an option. Both are similar, except one of the ways checks the specification-specific defaults before the non-specification-specific defaults. The steps are as follows, where subsequent steps are followed only if the current step is not successful.
Specification-specific consultation steps, using a specific specification:
- Options set by the user.
- Specification-specific defaults.
- Non-specification-specific defaults.
- Not found.
Non-specification-specific consultation steps:
- Options set by the user.
- Non-specification-specific defaults.
- Not found.
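The two consultation orders can be modelled with three maps. The following is a minimal sketch under assumptions: `OptionLookup` and its method names are hypothetical and are not the signatures of the real Options class, option and specification names are plain strings, and the default values used in `main` are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical model of the lookup order; not the real Options class.
final class OptionLookup {
  private final Map<String, Object> userOptions = new HashMap<>();
  private final Map<String, Map<String, Object>> specificationDefaults = new HashMap<>();
  private final Map<String, Object> defaults = new HashMap<>();

  void set(String option, Object value) {
    userOptions.put(option, value);
  }

  void setSpecificationDefault(String specification, String option, Object value) {
    specificationDefaults.computeIfAbsent(specification, k -> new HashMap<>()).put(option, value);
  }

  void setDefault(String option, Object value) {
    defaults.put(option, value);
  }

  /** Non-specification-specific lookup: user options, then plain defaults. */
  Optional<Object> get(String option) {
    Object value = userOptions.get(option);
    if (value != null) {
      return Optional.of(value);
    }
    return Optional.ofNullable(defaults.get(option));
  }

  /** Specification-specific lookup: user options, specification defaults, plain defaults. */
  Optional<Object> getForSpecification(String option, String specification) {
    Object value = userOptions.get(option);
    if (value != null) {
      return Optional.of(value);
    }
    value = specificationDefaults.getOrDefault(specification, Map.of()).get(option);
    if (value != null) {
      return Optional.of(value);
    }
    return Optional.ofNullable(defaults.get(option));
  }

  public static void main(String[] args) {
    OptionLookup options = new OptionLookup();
    options.setDefault("FORMAT", false);                           // invented default
    options.setSpecificationDefault("draft-07", "FORMAT", true);   // invented default
    System.out.println(options.getForSpecification("FORMAT", "draft-07"));
    System.out.println(options.get("FORMAT"));
  }
}
```

In both lookups a user-set value always wins; the only difference is whether the specification-specific defaults are consulted before the general ones.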
Option: AUTO_RESOLVE
Type: java.lang.Boolean
This controls whether the validator should attempt auto-resolution when searching for schemas or when otherwise resolving IDs. This entails adding the original base URI and any root $id as known URLs during validation.
Option: COLLECT_ANNOTATIONS_FOR_FAILED
Type: java.lang.Boolean
This controls, if annotations are collected, whether they should also be retained for failed schemas. This option only has an effect when annotations are being collected.
Option: CONTENT
Type: java.lang.Boolean
This controls whether to treat the "content" values as assertions in Draft-07. This only includes "contentEncoding" and "contentMediaType".
Option: DEFAULT_SPECIFICATION
Type: com.qindesign.json.schema.Specification
This option specifies the default specification to follow if one cannot be determined from a schema, either by an explicit indication, or by heuristics. This is the final fallback specification.
Option: FORMAT
Type: java.lang.Boolean
This is a specification-specific option, meaning its default differs depending on which specification is being used. It controls whether to treat "format" values as assertions.
Option: SPECIFICATION
Type: com.qindesign.json.schema.Specification
This indicates which specification to use if one is not explicitly stated in a schema.
Project structure
This project is designed to provide APIs and tools for performing JSON Schema validation. Its main purpose is to do most of the work, but have the user wire in everything themselves. A few rudimentary and runnable test programs are provided, however.
The main package is com.qindesign.json.schema.
Module information
This project defines a module and exports these packages:
- com.qindesign.json.schema: This is the main validation package.
- com.qindesign.json.schema.net: Provides some URI and hostname processing tools.
It also transitively requires this package:
- com.google.gson
Complete programs
The first program is Main. This takes two arguments, a schema file and an instance file, and then performs validation of the instance against the schema.
The second program is Test. This takes one argument, a directory containing the JSON Schema test suite, and then runs all the tests in the suite. You can obtain a copy of the test suite by cloning the test suite repository.
The third program is Linter, a rudimentary linter for JSON Schema files. It takes one argument, the schema file to check.
The fourth program is Coverage, a simple coverage tool for JSON Schemas and instances. It's similar to Main, but prints different output.
API
The main entry point to the API is the Validator constructor and validate method. In addition to the non-optional schema, instance, and base URI, you can pass options, known IDs and URLs, and a place to put collected annotations and errors.
In this version, the caller must organize the errors into the desired output format. An example of how to convert them into the Basic output format is in the Main.basicOutput method.
Providing tools to format the errors into more output formats may happen in the future.
Annotations and errors
Annotations and errors are collected by optionally providing maps to Validator.validate. They're maps from instance locations to an associated Annotation object, with some intervening values.
- The annotations map follows this structure: instance location → name → schema location → Annotation. The Annotation value is dependent on the source of the annotation.
- The errors map has this structure: instance location → schema location → Annotation. The Annotation value is a ValidationResult object, and its name will be "error" when the result is false and "annotation" when the result is true.
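The nesting of the two maps can be illustrated with plain collections. This is a simplified model, not the real types: `ResultMaps` is a hypothetical class, locations are JSON Pointer strings instead of the project's path type, and annotation values and results are stored directly rather than wrapped in Annotation or ValidationResult objects.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical model of the map shapes; the real maps hold Annotation objects.
final class ResultMaps {
  // instance location -> name -> schema location -> value
  final Map<String, Map<String, Map<String, Object>>> annotations = new TreeMap<>();
  // instance location -> schema location -> result
  final Map<String, Map<String, Boolean>> errors = new TreeMap<>();

  void addAnnotation(String instanceLocation, String name, String schemaLocation, Object value) {
    annotations.computeIfAbsent(instanceLocation, k -> new TreeMap<>())
        .computeIfAbsent(name, k -> new TreeMap<>())
        .put(schemaLocation, value);
  }

  void addError(String instanceLocation, String schemaLocation, boolean result) {
    errors.computeIfAbsent(instanceLocation, k -> new TreeMap<>())
        .put(schemaLocation, result);
  }

  /** The name carried by an entry in the errors map, per the description above. */
  static String nameFor(boolean result) {
    return result ? "annotation" : "error";
  }

  public static void main(String[] args) {
    ResultMaps results = new ResultMaps();
    results.addAnnotation("/name", "title", "/properties/name/title", "Name");
    results.addError("/name", "/properties/name/type", false);
    System.out.println(results.annotations);
    System.out.println(results.errors);
  }
}
```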
For annotations, Annotation.isValid() indicates whether the annotation is considered valid or auxiliary. When annotations for failed schemas are collected, an invalid annotation is one that would otherwise exist if the associated schema had not failed.
For errors, Error.isPruned() means that the result is not relevant to the schema result. For example, "oneOf" will pass validation if one subschema passes and all the other subschemas fail. Each failing subschema will indicate an error, but that error will be marked as pruned.
This is useful to track coverage vs. a minimal set of useful errors.
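The "oneOf" case can be sketched as follows. The `Entry` class below is a hypothetical stand-in for an error record with a schema location, a result, and a pruned flag; it is not the project's Error class. The sketch shows how the same entries can serve both purposes: every entry counts towards coverage, while only failed, non-pruned entries form the useful set of errors.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch; Entry is not the project's Error class.
final class PrunedErrors {
  static final class Entry {
    final String schemaLocation;
    final boolean result;
    final boolean pruned;

    Entry(String schemaLocation, boolean result, boolean pruned) {
      this.schemaLocation = schemaLocation;
      this.result = result;
      this.pruned = pruned;
    }
  }

  /** All schema locations that were visited, regardless of pruning. */
  static List<String> seen(List<Entry> entries) {
    return entries.stream().map(e -> e.schemaLocation).collect(Collectors.toList());
  }

  /** Only failures that matter to the overall result. */
  static List<String> useful(List<Entry> entries) {
    return entries.stream()
        .filter(e -> !e.result && !e.pruned)
        .map(e -> e.schemaLocation)
        .collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // A passing "oneOf": one subschema passes, the two failures are pruned.
    List<Entry> entries = List.of(
        new Entry("/oneOf/0", false, true),
        new Entry("/oneOf/1", true, false),
        new Entry("/oneOf/2", false, true));
    System.out.println("seen: " + seen(entries));
    System.out.println("useful: " + useful(entries));
  }
}
```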
The Results class provides some tools for sorting and collecting annotations and errors. It does the work of extracting a list of useful results.
The locations are given as JSON Pointers.
The annotation types for specific keywords are as follows:
- "additionalItems": java.lang.Boolean, always true if present, indicating that the subschema was applied to all remaining items in the instance array.
- "additionalProperties": java.util.Set<String>, the set of property names whose contents were validated by this subschema.
- "contentEncoding": java.lang.String
- "contentMediaType": java.lang.String
- "contentSchema": com.google.gson.JsonElement
- "default": com.google.gson.JsonElement
- "deprecated": java.lang.Boolean
- "description": java.lang.String
- "examples": com.google.gson.JsonArray
- "format": java.lang.String
- "items": java.lang.Integer, the largest index in the instance to which a subschema was applied, or java.lang.Boolean (always true) if a subschema was applied to every index.
- "patternProperties": java.util.Set<String>, the set of property names matched by this keyword.
- "properties": java.util.Set<String>, the set of property names matched by this keyword.
- "readOnly": java.lang.Boolean
- "title": java.lang.String
- "unevaluatedItems": java.lang.Boolean, always true if present, indicating that the subschema was applied to all remaining items in the instance array.
- "unevaluatedProperties": java.util.Set<String>, the set of property names whose contents were validated by this subschema.
- "writeOnly": java.lang.Boolean
Internal APIs
There are a few internal APIs that may be useful for your own projects, outside of schema validation. Note that these are subject to change.
- com.qindesign.json.schema.util.Base64InputStream: Converts a Base64-encoded string into a byte stream.
- com.qindesign.json.schema.util.LRUCache: An LRU cache implementation.
- com.qindesign.json.schema.net.Hostname: Parses regular and IDN hostnames.
- com.qindesign.json.schema.net.URI: An RFC 3986-compliant URI parser. As of this writing, Java's URI API is only RFC 2396-compliant and is not sufficient for processing JSON Schemas.
- com.qindesign.json.schema.net.URIParser.parseIPv6: Parses IPv6 addresses, per RFC 3986.
- com.qindesign.json.schema.net.URIParser.parseIPv4: Parses IPv4 addresses, per RFC 3986.
Please consult the Javadocs for those classes and methods for more information.
Building and running
This project uses Maven as its build tool because it makes managing the dependencies easy. It uses standard Maven commands and phases. For example, to compile the project, use:
mvn compile
To clean and then re-compile:
mvn clean compile
Maven makes it easy to build, execute, and package everything with the right dependencies. However, it's also possible to use your IDE or different tools to manage the project. This section only discusses Maven usage.
Program execution with Maven
Maven takes care of project dependencies for you so you don't have to manage the classpath or downloads.
Currently, there are four predefined execution targets:
- main: Executes Main. Validates an instance against a schema.
- test: Executes Test. Runs the test suite.
- linter: Executes Linter. Checks a schema.
- coverage: Executes Coverage. Does a schema coverage check after validation.
This section shows some simple execution examples. There's more information about the included programs below.
Note that Maven doesn't automatically build the project when running an execution target. It either has to be pre-built using compile or added to the command line. For example, to compile and then run the linter:
mvn compile exec:java@linter -Dexec.args="schema.json"
To run the main validator without attempting a compile first, say because it's already built:
mvn exec:java@main -Dexec.args="schema.json instance.json"
To compile and run the test suite and tell the test runner that the suite is in /suites/json-schema-test-suite:
mvn compile exec:java@test -Dexec.args="/suites/json-schema-test-suite"
To execute a specific main class, say one that isn't defined as a specific execution, add an exec.mainClass property. For example, if the fully-qualified main class is my.Main and it takes some "program arguments":
mvn exec:java -Dexec.mainClass="my.Main" -Dexec.args="program arguments"
Using Snow in your own projects
Snow is available from the Maven Central Repository. To include it in your own programs, add the following dependency:
<dependency>
  <groupId>com.qindesign</groupId>
  <artifactId>snowy-json</artifactId>
  <version>0.15.0</version>
</dependency>
The linter
The linter's job is to provide suggestions about potential errors in a schema. It shows only potential problems whose presence does not necessarily mean the schema won't work.
The linter is rudimentary and does not check or validate everything about the schema. It does currently check for the following things:
- Unknown format values.
- Empty items arrays.
- additionalItems without a sibling array-form items.
- $schema elements inside a subschema that do not have a sibling $id.
- Unknown keywords. Similar keywords are noted by doing case-insensitive matching to known keywords.
- Property names that start with "$".
- Unnormalized $id values.
- Locally-pointing $ref values that don't exist.
- Any "minimum" keyword that is greater than its corresponding "maximum" keyword. For example, minLength and maxLength.
- exclusiveMinimum is not strictly less than exclusiveMaximum.
- Expected type checking for appropriate keywords. For example, minimum expects that the type is "number" or "integer" and format expects a "string" type.
- Implied type checking for default and const; a type is expected to exist and to match the implied type for these values.
- Non-unique enums.
- Empty enum, allOf, anyOf, or oneOf.
- Draft 2019-09 or later schemas having keywords that were removed in Draft 2019-09.
- Pre-Draft 2019-09 schemas having keywords that were added in Draft 2019-09.
- Pre-Draft-07 schemas having keywords that were added in Draft-07.
- Draft 2019-09 or later, or unspecified, schemas:
  - minContains without a sibling contains.
  - maxContains without a sibling contains.
  - unevaluatedItems without a sibling array-form items.
  - $id values that have an empty fragment.
- Draft-07 or later, or unspecified, schemas:
  - then without if.
  - else without if.
- Draft-07 or earlier schemas:
  - $ref members with siblings.
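As an illustration of one check from the list, the "minimum greater than maximum" rule could look roughly like this. It is a standalone sketch and not the linter's implementation: `MinMaxCheck` is a hypothetical class, the schema object is modelled as a plain Map instead of a Gson object, and the set of keyword pairs is an assumption about which keywords count as corresponding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a single lint check; not the project's Linter.
final class MinMaxCheck {
  // Assumed min/max keyword pairs; the real linter may use a different set.
  private static final String[][] PAIRS = {
      {"minimum", "maximum"},
      {"minLength", "maxLength"},
      {"minItems", "maxItems"},
      {"minProperties", "maxProperties"},
      {"minContains", "maxContains"},
  };

  /** Returns one message per pair whose "min" value exceeds its "max" value. */
  static List<String> check(Map<String, ?> schema) {
    List<String> issues = new ArrayList<>();
    for (String[] pair : PAIRS) {
      Object min = schema.get(pair[0]);
      Object max = schema.get(pair[1]);
      // Only compare when both keywords are present and numeric.
      if (min instanceof Number && max instanceof Number) {
        // Comparing as doubles is an approximation for very large or precise values.
        if (((Number) min).doubleValue() > ((Number) max).doubleValue()) {
          issues.add(pair[0] + " is greater than " + pair[1]);
        }
      }
    }
    return issues;
  }

  public static void main(String[] args) {
    System.out.println(check(Map.of("minLength", 5, "maxLength", 3)));
    System.out.println(check(Map.of("minimum", 0, "maximum", 10)));
  }
}
```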
Doing your own linting
It's possible to add your own rules to the linter. There are four important concepts to know about when adding rules:
- A rule may optionally be assigned to execute for a specific element type. For example, a rule added via Linter.addStringRule will execute if the current element is a primitive string.
- A rule learns about the current state of the JSON tree from a context object parameter, an instance of Linter.Context.
- Any detected issues are sent to the context.
- The rules operate in addition to the existing linter rules.
The following example snippet tests for the existence of any "anyOf" schema keywords:
JsonElement schema;
// ...load the schema...
Linter linter = new Linter();
// Add a rule that flags every "anyOf" keyword
linter.addRule(context -> {
  if (context.isKeyword() && context.is("anyOf")) {
    context.addIssue("anyOf detected");
  }
});
// Run the built-in rules plus the added rule against the loaded schema
Map<JSONPath, List<String>> issues = linter.check(schema);
// ...print the issues...
Linting by traversing the tree
The JSON class has a traverseSchema method that does a preorder tree traversal for JSON schemas. It's what the linter uses internally. It's also possible to use this to write your own linting rules.
The following example snippet also tests for the existence of any "anyOf" schema keywords:
JsonElement schema;
// ...load the schema...
JSON.traverseSchema(schema, (e, parent, path, state) -> {
  if (!state.isNotKeyword() && path.endsWith("anyOf")) {
    System.out.println(path + ": anyOf keyword present");
  }
});
The coverage checker
The coverage checker works similarly to the main validator, except that after validation, it prints out some coverage results.
It outputs two JSON objects:
- Seen and unseen schema locations organized by instance location.
- Seen schema locations only.
Future plans
There are plans to explore supporting more features, including:
- Custom vocabulary support.
- More output formatting. All the information is currently there, but the caller must process and organize it.
- Better caching. The current implementation doesn't cache things such as URLs and regex Patterns across different instances of ValidatorContext, i.e. across calls to Validator.validate.
- Compilation into an internal representation that provides both speed and optimizations for non-dynamic validation paths.
- A better representation than maps for annotations and errors.
- A better way of filtering (i.e. organizing) errors and annotations for human consumption. For example, not needing to manually prune parent errors. A more fleshed-out way to identify terminal and non-terminal errors, and also which are important.
Possible future plans
These are plans that may or may not be explored:
- Linter rule IDs for selective linting.
References
- JSON Schema Specification
- Gson
- ClassGraph
- ECMA 262
- JSON Schema Test Suite
- JSON Schema Draft Sources
- JSON Pointer
- URI Syntax
- IDN Hostnames
An ending thought
I'd love to say this: "The validator isn't wrong, the spec is ambiguous."™
Realistic? No, but fun to say anyway.
Acknowledgements
Thanks to <a href="https://www.jetbrains.com/?from=Snow">JetBrains</a> for providing an Open Source licence for IntelliJ, my favourite IDE since forever.
License
Snow, a JSON Schema validator
Copyright (c) 2020-2021 Shawn Silverman
This program is free software: you can redistribute it and/or modify
it under the terms of the GNU Affero General Public License as published
by the Free Software Foundation, either version 3 of the License, or
(at your option) any later version.
This program is distributed in the hope that it will be useful,
but WITHOUT ANY WARRANTY; without even the implied warranty of
MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
GNU Affero General Public License for more details.
You should have received a copy of the GNU Affero General Public License
along with this program. If not, see <https://www.gnu.org/licenses/>.