JSON Schema Profile

The goal of JSON Schema Profile is to extend the vocabulary of JSON Schema so that a schema can describe properties of the data itself, rather than only its structure.

Definitions

Bloom filter

A string representing a serialized Bloom filter. Currently this is the Base64-encoded serialization of the specific Bloom filter class used by JSONoid, but we plan to move to a more reusable format.

Bloom filters make it possible to check whether a specific value was observed for a particular property without storing all the values. A lookup may report a false positive, but never a false negative.
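To make the membership check concrete, here is a minimal Bloom filter sketch in Python. It is only an illustration of the idea: the class name `BloomFilter`, the parameters `num_bits` and `num_hashes`, and the SHA-256-based hashing are all assumptions made for this example, and the bit layout is not the serialized format produced by JSONoid.

```python
import hashlib


class BloomFilter:
    """Toy Bloom filter for illustration; NOT JSONoid's serialized format."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        data = repr(value).encode("utf-8")
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(bytes([seed]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False: the value was definitely never added.
        # True: the value was probably added (false positives are possible).
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )


observed = BloomFilter()
for v in ["red", "green", "blue"]:
    observed.add(v)

print(observed.might_contain("green"))   # always True for values that were added
print(observed.might_contain("purple"))  # usually False, but may be a false positive
```

The filter stores a fixed number of bits regardless of how many values were observed, which is why it suits a schema annotation better than a full list of values.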

Histogram

| property | description |
| --- | --- |
| bins | An array of two-element arrays where the first element is the mean of the bin and the second is the number of elements in the bin |
| hasExtremeValues | A Boolean indicating whether the histogram contains values which cannot be represented in the given bounds. This usually only occurs for extremely large absolute values and is rarely observed in practice |
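As a sketch of how a consumer might read the `bins` layout described above, the following Python snippet computes the total number of observations and an approximate mean from the `[mean, count]` pairs. The helper name `summarize_histogram` and the sample numbers are hypothetical and chosen only for illustration.

```python
def summarize_histogram(bins):
    """Return (total_count, approximate_mean) for [[bin_mean, count], ...]."""
    total = sum(count for _, count in bins)
    if total == 0:
        return 0, None
    # Weight each bin's mean by the number of elements in that bin.
    weighted = sum(mean * count for mean, count in bins)
    return total, weighted / total


# Hypothetical histogram value; the numbers are made up for this example.
histogram = {
    "bins": [[1.0, 4], [5.0, 10], [12.0, 6]],
    "hasExtremeValues": False,
}

total, approx_mean = summarize_histogram(histogram["bins"])
print(total, approx_mean)
```

When `hasExtremeValues` is true, a summary like this should be treated with extra caution, since some observed values could not be represented in the bins.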

Statistics

| property | description |
| --- | --- |
| variance | The variance of all values of this property |
| stdev | The standard deviation of all values of this property |
| skewness | The skewness of all values of this property |
| kurtosis | The kurtosis of all values of this property |
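The four statistics can be illustrated with a short Python sketch based on central moments. This page does not say whether the population or sample formulas are used, or whether kurtosis is reported as excess kurtosis, so the choices below (population moments, non-excess kurtosis) are assumptions, and the function name `profile_statistics` is hypothetical.

```python
import math


def profile_statistics(values):
    """Population variance, stdev, skewness, and non-excess kurtosis."""
    n = len(values)
    mean = sum(values) / n
    # Central moments of order 2, 3, and 4.
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    m4 = sum((x - mean) ** 4 for x in values) / n
    stdev = math.sqrt(m2)
    return {
        "variance": m2,
        "stdev": stdev,
        # Guard against division by zero when all values are equal.
        "skewness": m3 / stdev ** 3 if stdev else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 else 0.0,
    }


stats = profile_statistics([2, 4, 4, 4, 5, 5, 7, 9])
print(stats)
```

For the sample input the mean is 5, so the variance is 4 and the standard deviation is 2; the positive skewness reflects the longer tail toward larger values.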

Arrays

| property | description |
| --- | --- |
| lengthHistogram | A histogram of array lengths |

Booleans

| property | description |
| --- | --- |
| pctTrue | Percentage of the Boolean values which are true |

Integers

| property | description |
| --- | --- |
| bloomFilter | A Bloom filter of integer values |
| distinctValues | An estimate of the number of distinct values (cardinality) of this property |
| histogram | A histogram of integer values |
| statistics | A set of statistics of integer values |

Numbers

| property | description |
| --- | --- |
| bloomFilter | A Bloom filter of number values |
| distinctValues | An estimate of the number of distinct values (cardinality) of this property |
| histogram | A histogram of number values |
| statistics | A set of statistics of number values |

Objects

| property | description |
| --- | --- |
| fieldPresence | An object in which each value is the percentage of observed objects that contain the corresponding key |
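A minimal Python sketch of how a `fieldPresence` value could be derived from a collection of objects follows. The function name `field_presence` and the sample documents are hypothetical, and expressing presence as a fraction between 0 and 1 (rather than 0 to 100) is an assumption, since the scale is not stated here.

```python
from collections import Counter


def field_presence(objects):
    """Map each key to the fraction of objects that contain it."""
    # Count, for every key, how many objects include it.
    counts = Counter(key for obj in objects for key in obj)
    return {key: counts[key] / len(objects) for key in counts}


# Made-up sample documents for illustration.
docs = [
    {"id": 1, "name": "a"},
    {"id": 2},
    {"id": 3, "name": "c", "tags": []},
    {"id": 4},
]

print(field_presence(docs))
```

Here `id` appears in every object, `name` in half of them, and `tags` in one of four, which is the kind of information a purely structural schema (required or optional) cannot express.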

Strings

| property | description |
| --- | --- |
| bloomFilter | A Bloom filter of string values |
| distinctValues | An estimate of the number of distinct values (cardinality) of this property |
| lengthHistogram | A histogram of string lengths |