rubbercube

Library for OLAP atop ElasticSearch. Provides efficient aggregation and analysis over huge fact tables in real time. Generic high-level abstractions make it possible to replace ElasticSearch with another database, or to use both side by side, encapsulated behind a user-friendly API.

The library has been used in production for quite a long time now and is rather stable.

SBT dependency:

resolvers += "rubbercube" at "https://bokland.github.io/rubbercube"

libraryDependencies += "com.bokland" %% "rubbercube" % "0.2-SNAPSHOT"

API

All queries are passed as URL-encoded JSON strings in the q GET parameter. The sections below describe the format of each query type through example JSON documents containing placeholders such as {{cubeId}}.

Special cases and edge cases are described in plain text in JavaScript-style comments (e.g. “// this field makes sense for some types only”).

All other text in the descriptions should be passed as is.
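As a rough illustration of how a query travels to the server, the sketch below serializes a query to JSON and URL-encodes it into the q parameter. The endpoint URL, the cube name and the helper name build_query_url are assumptions made for this example only; the text above does not name an endpoint.

```python
import json
from urllib.parse import parse_qs, urlencode, urlsplit


def build_query_url(base_url, query):
    """Serialize `query` to JSON and pass it URL-encoded in the `q` GET parameter."""
    payload = json.dumps(query, separators=(",", ":"))
    return base_url + "?" + urlencode({"q": payload})


# Hypothetical endpoint and cube name, for illustration only.
url = build_query_url(
    "http://localhost:8080/query",
    {
        "type": "sliceAndDice",
        "cube": "sales",
        "aggregations": [],
        "measures": [],
        "filters": [],
    },
)

# Decoding the parameter again yields the original JSON document.
decoded = json.loads(parse_qs(urlsplit(url).query)["q"][0])
```

Any HTTP client can then issue a GET request against the resulting URL.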

Example (it's not a real query, I made it up for illustration only!):

{
    "type": "sliceAndDice" | "leftJoin",
    "cubeId": {{cubeId}},
    "queries": [Query, ...], // only for "type": "leftJoin"
    "alias": {{alias}}?,
    "interval": {{interval:number}}
}

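means: "type" is either "sliceAndDice" or "leftJoin"; "cubeId" takes the value substituted for the {{cubeId}} placeholder; "queries" is an array of Query objects and applies only when "type" is "leftJoin"; "alias" is optional, as marked by the trailing ?; and "interval" is a number.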

Class descriptions

SliceAndDice Query

{
    "type": "sliceAndDice",
    "cube": {{cubeId}},
    "aggregations": [Aggregation, ...],
    "measures": [Measure, ...],
    "filters": [Filter, ...],
    "parent_id" {{parentId}}?,
    "from": {{from:int}}?,
    "size": {{size:int}}?,
    "include_fields": [{{fieldName}}, ...]?,
    "exclude_fields": [{{fieldName}}, ...]?,
    "sort": [
        [{{fieldName}}, "asc" | "desc"], ...
    ]?
}
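A minimal sketch of filling in this template from Python, assuming the fields marked with ? can simply be left out. The helper name slice_and_dice, the cube name and the field names are hypothetical.

```python
def slice_and_dice(cube, aggregations, measures, filters, **optional):
    """Build a sliceAndDice query; optional fields are included only when given."""
    allowed = {"parent_id", "from", "size", "include_fields", "exclude_fields", "sort"}
    unknown = set(optional) - allowed
    if unknown:
        raise ValueError(f"unknown optional fields: {sorted(unknown)}")
    query = {
        "type": "sliceAndDice",
        "cube": cube,
        "aggregations": aggregations,
        "measures": measures,
        "filters": filters,
    }
    query.update(optional)
    return query


# Hypothetical cube and field names, for illustration only.
query = slice_and_dice(
    "sales",
    aggregations=[{"dimension": {"field": "country"}, "aggregation": {"type": "category"}}],
    measures=[{"type": "dimension", "alias": "orders", "operation": "count"}],
    filters=[],
    # "from" is a Python keyword, so the optional fields are passed as a dict.
    **{"from": 0, "size": 20, "sort": [["country", "asc"]]},
)
```

The optional fields go through a dictionary because from cannot be written as a regular keyword argument in Python.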

LeftJoin Query

{
    "type": "leftJoin",
    "queries": [Query, ...],
    "by": [Dimension, ...],
    "measures": [Measure, ...]
}
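The following sketch assembles a leftJoin query from two sliceAndDice sub-queries. It assumes that a measure of type "reference" points at a measure alias defined in one of the joined queries, which the description here does not spell out. The cube names, the field name "day" and both helper names are hypothetical.

```python
import json


def count_by_day(cube, alias):
    """Hypothetical helper: a sliceAndDice sub-query counting documents per day."""
    return {
        "type": "sliceAndDice",
        "cube": cube,
        "aggregations": [
            {"dimension": {"field": "day"}, "aggregation": {"type": "date", "date_type": "Day"}}
        ],
        "measures": [{"type": "dimension", "alias": alias, "operation": "count"}],
        "filters": [],
    }


def left_join(queries, by, measures):
    """Build a leftJoin query from already-built sub-queries."""
    return {
        "type": "leftJoin",
        "queries": list(queries),
        "by": list(by),
        "measures": list(measures),
    }


query = left_join(
    queries=[count_by_day("visits", "visits"), count_by_day("orders", "orders")],
    by=[{"field": "day"}],
    # Assumption: each reference names an alias from one of the sub-queries.
    measures=[
        {"type": "reference", "alias": "visits"},
        {"type": "reference", "alias": "orders"},
    ],
)
print(json.dumps(query, indent=2))
```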

Aggregation

{
    "dimension": Dimension,
    "aggregation": AggregationType
}

Dimension

{
    "field": {{fieldName}},
    "cubeId": {{cubeId}}?,
    "alias": {{alias}}?
}

AggregationType

{
    "type": "number" | "date" | "category" | "missing",
    "date_type": "Day" | "Week" | "Month" | "Quarter" | "Year", // for "type": "date"
    "interval": {{interval:number}} // for "type": "number"
}
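The comments in the AggregationType template tie "date_type" to date aggregations and "interval" to number aggregations. A small builder can enforce that pairing on the client side. This is only a sketch: treating both fields as required for their type is an assumption, and the helper and field names are hypothetical.

```python
AGGREGATION_TYPES = {"number", "date", "category", "missing"}
DATE_TYPES = {"Day", "Week", "Month", "Quarter", "Year"}


def aggregation(field, agg_type, date_type=None, interval=None, alias=None):
    """Build an Aggregation object made of a Dimension and an AggregationType."""
    if agg_type not in AGGREGATION_TYPES:
        raise ValueError(f"unknown aggregation type: {agg_type!r}")
    agg = {"type": agg_type}
    if agg_type == "date":
        if date_type not in DATE_TYPES:
            raise ValueError(f"date aggregation needs a date_type from {sorted(DATE_TYPES)}")
        agg["date_type"] = date_type
    elif date_type is not None:
        raise ValueError('"date_type" only applies to "type": "date"')
    if agg_type == "number":
        if interval is None:
            raise ValueError("number aggregation needs an interval")
        agg["interval"] = interval
    elif interval is not None:
        raise ValueError('"interval" only applies to "type": "number"')
    dimension = {"field": field}
    if alias is not None:
        dimension["alias"] = alias  # optional in the Dimension template
    return {"dimension": dimension, "aggregation": agg}


# Hypothetical field names, for illustration only.
by_month = aggregation("created_at", "date", date_type="Month")
by_price = aggregation("price", "number", interval=10, alias="price_bucket")
```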

Measure

There are three types of measures:

{
    "type": "reference",
    "alias": {{alias}}
}
{
    "type": "dimension",
    "alias": {{alias}}?,
    "operation": "countdistinct" | "count" | "sum" | "avg" | "max" | "min" | "categories"
}
{
    "type": "derived",
    "alias": {{alias}}?,
    "operation": "div",
    "dim1": Measure,
    "dim2": Measure
}
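Because a derived measure nests two other measures, the three templates compose naturally. The sketch below builds an average as a sum divided by a count, using only the fields listed in the templates above. The helper names and aliases are hypothetical.

```python
OPERATIONS = {"countdistinct", "count", "sum", "avg", "max", "min", "categories"}


def reference_measure(alias):
    """Measure of type "reference"; the alias is mandatory in the template."""
    return {"type": "reference", "alias": alias}


def dimension_measure(operation, alias=None):
    """Measure of type "dimension" with one of the listed operations."""
    if operation not in OPERATIONS:
        raise ValueError(f"unknown operation: {operation!r}")
    measure = {"type": "dimension", "operation": operation}
    if alias is not None:
        measure["alias"] = alias
    return measure


def derived_div(dim1, dim2, alias=None):
    """Measure of type "derived": "dim1" divided by "dim2"."""
    measure = {"type": "derived", "operation": "div", "dim1": dim1, "dim2": dim2}
    if alias is not None:
        measure["alias"] = alias
    return measure


# Hypothetical aliases, for illustration only.
avg_order = derived_div(
    dimension_measure("sum", alias="revenue"),
    dimension_measure("count", alias="orders"),
    alias="avg_order",
)
```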

Filter

Filters are a special case: their structure depends heavily on the filter type, though they can be split into several groups by kind.

Single dimension filters

{
    "operation": "eql" | "neql" | "gt" | "gte" | "lt" | "lte", // "dimension" equals (==), not equals (!=), greater than (>), greater or equal than (>=), less than (<), less or equal than (<=) "value"
    "dimension": Dimension,
    "value": {{value}}
}
{
    "operation": "in", // "dimension" is equal to one of "value"s in the array
    "dimension": Dimension,
    "value": [{{value}}, ...]
}

Multi-dimensional filters

{
    "operation": "and" | "or", // all (logical AND) / any (logic OR) "filters" are true
    "filters" [Filter, ...]
}
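Since "and" and "or" filters take other filters as children, single dimension filters can be nested to any depth. A minimal sketch of composing them follows; the helper names and field names are hypothetical.

```python
def compare(operation, field, value):
    """Single dimension filter comparing a field with one value."""
    if operation not in {"eql", "neql", "gt", "gte", "lt", "lte"}:
        raise ValueError(f"unknown comparison: {operation!r}")
    return {"operation": operation, "dimension": {"field": field}, "value": value}


def one_of(field, values):
    """Single dimension filter matching any value in the list."""
    return {"operation": "in", "dimension": {"field": field}, "value": list(values)}


def all_of(*filters):
    """Multi-dimensional filter: every child filter must be true."""
    return {"operation": "and", "filters": list(filters)}


def any_of(*filters):
    """Multi-dimensional filter: at least one child filter must be true."""
    return {"operation": "or", "filters": list(filters)}


# Hypothetical field names, for illustration only:
# country in (DE, FR) AND (amount >= 100 OR status != "cancelled")
flt = all_of(
    one_of("country", ["DE", "FR"]),
    any_of(compare("gte", "amount", 100), compare("neql", "status", "cancelled")),
)
```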

Zero dimension filters

{
    "operation": "missing" | "exists", // "dimension" is missing / exists in a document
    "dimension": Dimension
}
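To make the semantics of the filter groups concrete, here is a small in-memory evaluator that checks a filter against a plain dictionary. It is not how the library evaluates filters, only a reading of the comments in the templates above. Treating a comparison on an absent field as false is an assumption, and the function name and sample document are hypothetical.

```python
import operator

COMPARISONS = {
    "eql": operator.eq,
    "neql": operator.ne,
    "gt": operator.gt,
    "gte": operator.ge,
    "lt": operator.lt,
    "lte": operator.le,
}


def matches(flt, doc):
    """Return True if the dictionary `doc` satisfies the filter `flt`."""
    op = flt["operation"]
    if op == "and":
        return all(matches(child, doc) for child in flt["filters"])
    if op == "or":
        return any(matches(child, doc) for child in flt["filters"])
    field = flt["dimension"]["field"]
    if op == "missing":
        return field not in doc
    if op == "exists":
        return field in doc
    if field not in doc:
        return False  # assumption: comparisons on an absent field never match
    if op == "in":
        return doc[field] in flt["value"]
    if op in COMPARISONS:
        return COMPARISONS[op](doc[field], flt["value"])
    raise ValueError(f"unknown filter operation: {op!r}")


# Hypothetical document and field names, for illustration only.
doc = {"country": "DE", "amount": 120}
big_german_order = {
    "operation": "and",
    "filters": [
        {"operation": "eql", "dimension": {"field": "country"}, "value": "DE"},
        {"operation": "gt", "dimension": {"field": "amount"}, "value": 100},
    ],
}
no_coupon = {"operation": "missing", "dimension": {"field": "coupon"}}
```

Calling matches(big_german_order, doc) and matches(no_coupon, doc) both return True for the sample document.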

Contributors

Oleg Oleshko / https://github.com/OlegYch

Siarzhuk Miadzvedzeu / https://github.com/siarzh

Konstantin Stepanov / https://github.com/kstep