JSON Schema Profile
JSON Schema Profile augments the vocabulary of JSON Schema so that a schema can describe properties of the data itself, rather than only its structure.
Definitions
Bloom filter
A string representing a serialized Bloom filter. Currently this is the Base64-encoded serialization of the specific Bloom filter class used by JSONoid, but we plan to move to a more reusable format.
Bloom filters make it possible to check whether a specific value was observed for a particular property without storing all of the values. A membership test may report a false positive, but never a false negative.
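
To make the idea concrete, here is a minimal sketch of a Bloom filter with a Base64 export. It is not the class JSONoid uses, and its byte layout is not JSONoid's serialization format; the class name `TinyBloomFilter`, the sizes, and the SHA-256 based hashing are all assumptions chosen for illustration.

```python
import base64
import hashlib


class TinyBloomFilter:
    """Illustrative Bloom filter; NOT JSONoid's class or serialized format."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, value):
        # Derive one bit position per hash by salting SHA-256 with the hash index.
        data = repr(value).encode("utf-8")
        for i in range(self.num_hashes):
            digest = hashlib.sha256(bytes([i]) + data).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, value):
        for pos in self._positions(value):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, value):
        # False means "definitely not observed"; True means "probably observed".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8))
            for pos in self._positions(value)
        )

    def to_base64(self):
        # Hypothetical export: the raw bit array, Base64 encoded.
        return base64.b64encode(bytes(self.bits)).decode("ascii")


bf = TinyBloomFilter()
for name in ["alice", "bob", "carol"]:
    bf.add(name)
encoded = bf.to_base64()
```

Every value that was added is always reported as present. A value that was never added is usually reported as absent, with a small false-positive probability that depends on the filter size and the number of hashes.
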
Histogram
property | description |
---|---|
bins | An array of two-element arrays where the first element is the mean of the bin and the second is the number of elements in the bin |
hasExtremeValues | A Boolean indicating whether the histogram contains values which cannot be represented in the given bounds. This usually only occurs for extremely large absolute values and is rarely observed in practice |
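
The shape of the histogram object can be sketched as follows. This assumes simple equal-width binning and reads "mean of the bin" as the mean of the values that fell into the bin; JSONoid's actual binning algorithm is not described here and may differ. The function name `build_histogram` and the `num_bins` parameter are hypothetical.

```python
def build_histogram(values, num_bins=4):
    """Equal-width binning sketch producing [mean, count] pairs per bin."""
    lo, hi = min(values), max(values)
    # Fall back to a width of 1.0 when every value is identical.
    width = (hi - lo) / num_bins or 1.0
    sums = [0.0] * num_bins
    counts = [0] * num_bins
    for v in values:
        # Clamp so the maximum value lands in the last bin.
        idx = min(int((v - lo) / width), num_bins - 1)
        sums[idx] += v
        counts[idx] += 1
    # Empty bins are omitted from the output.
    bins = [[sums[i] / counts[i], counts[i]] for i in range(num_bins) if counts[i]]
    # Assumption: every input fits within the bounds, so nothing is "extreme".
    return {"bins": bins, "hasExtremeValues": False}


hist = build_histogram([1, 2, 3, 10, 11, 20])
# hist["bins"] -> [[2.0, 3], [10.0, 1], [11.0, 1], [20.0, 1]]
```

The counts across all bins add up to the number of observed values, which gives a quick sanity check on any histogram in this format.
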
Statistics
property | description |
---|---|
variance | The variance of all values of this property |
stdev | The standard deviation of all values of this property |
skewness | The skewness of all values of this property |
kurtosis | The kurtosis of all values of this property |
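
As a rough illustration, the four statistics can be computed from central moments. The sketch below assumes population (not sample) moments and non-excess kurtosis (a normal distribution would score 3); the text above does not say which conventions are used, so treat both as assumptions. The function name `describe` is hypothetical.

```python
import math


def describe(values):
    """Population moments; kurtosis is NOT excess kurtosis (assumption)."""
    n = len(values)
    mean = sum(values) / n
    # Second, third and fourth central moments.
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "variance": m2,
        "stdev": math.sqrt(m2),
        # Skewness and kurtosis are undefined for constant data; report 0.0 here.
        "skewness": m3 / m2 ** 1.5 if m2 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 else 0.0,
    }


stats = describe([2, 4, 4, 4, 5, 5, 7, 9])
# variance 4.0, stdev 2.0, skewness 0.65625, kurtosis 2.78125
```
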
Arrays
property | description |
---|---|
lengthHistogram | A histogram of array lengths |
Booleans
property | description |
---|---|
pctTrue | Percentage of the Boolean values which are true |
Integers
property | description |
---|---|
bloomFilter | A Bloom filter of integer values |
distinctValues | An estimate of the number of distinct values (cardinality) of this property |
histogram | A histogram of integer values |
statistics | A set of statistics of integer values |
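
Because `distinctValues` is an estimate, it can be produced without storing every value. The text does not name the estimator, so the sketch below swaps in linear counting purely as an example of the idea: hash each value into a fixed bitmap and infer the cardinality from the fraction of slots left empty. The function name `estimate_distinct` and the bitmap size are assumptions.

```python
import hashlib
import math


def estimate_distinct(values, num_bits=4096):
    """Linear-counting cardinality estimate (illustrative, not JSONoid's method)."""
    bitmap = bytearray(num_bits)  # one byte per slot, for simplicity
    for v in values:
        digest = hashlib.sha256(repr(v).encode("utf-8")).digest()
        bitmap[int.from_bytes(digest[:8], "big") % num_bits] = 1
    zeros = num_bits - sum(bitmap)
    if zeros == 0:
        # Bitmap saturated: the true count is at least num_bits.
        return float(num_bits)
    # Linear counting: n is approximately -m * ln(fraction of empty slots).
    return -num_bits * math.log(zeros / num_bits)


# 1000 observations, but only 100 distinct integers.
estimate = estimate_distinct([i % 100 for i in range(1000)])
```

The estimate should land close to 100 while using a fixed amount of memory, regardless of how many values are observed. Accuracy degrades as the bitmap fills up, so the size has to be chosen with the expected cardinality in mind.
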
Numbers
property | description |
---|---|
bloomFilter | A Bloom filter of number values |
distinctValues | An estimate of the number of distinct values (cardinality) of this property |
histogram | A histogram of number values |
statistics | A set of statistics of number values |
Objects
property | description |
---|---|
fieldPresence | An object mapping each key to the percentage of observed objects in which that key appears |
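
A small sketch of how `fieldPresence` could be derived from a collection of objects. The values here are expressed as fractions between 0 and 1, which is an assumption: the description says "percentage" without stating whether the scale is 0 to 1 or 0 to 100. The function name `field_presence` is hypothetical.

```python
from collections import Counter


def field_presence(objects):
    """Map each key to the fraction of objects that contain it."""
    # Each object contributes at most one count per key.
    counts = Counter(key for obj in objects for key in obj)
    total = len(objects)
    return {key: counts[key] / total for key in counts}


docs = [{"id": 1, "name": "a"}, {"id": 2}, {"id": 3, "name": "c"}, {"id": 4}]
presence = field_presence(docs)
# presence -> {"id": 1.0, "name": 0.5}
```

A key with a presence of 1.0 appeared in every observed object, which is a hint that it may be a candidate for `required` in the schema.
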
Strings
property | description |
---|---|
bloomFilter | A Bloom filter of string values |
distinctValues | An estimate of the number of distinct values (cardinality) of this property |
lengthHistogram | A histogram of string lengths |