µAnalytics

A micro multi-website analytics database service designed to be fast and robust, built with Go and SQLite.

Principle

Analytics databases tend to grow quickly, often exponentially. Querying the data for one specific website from a single database thus becomes very slow over time. Yet the analytics data of two different websites are highly decoupled.

The idea behind µAnalytics is to shard your analytics data on a key, which is usually a website name. Each shard thus contains only one website's data, allowing faster response times and easy horizontal scaling.

To handle requests even faster, µAnalytics automatically manages a pool of connections to multiple shards at a time.

By default, the service keeps 10 connections alive. But you can easily increase/decrease the max number of alive shards with the --connections flag when launching the app.
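The pooling idea can be pictured with a small sketch. It assumes a least-recently-used eviction policy, which is not stated here, and the names `shardPool` and `acquire` are hypothetical; the sketch only tracks which shards are alive and does not open any database.

```go
package main

import (
	"container/list"
	"fmt"
)

// shardPool is a hypothetical bounded pool: it tracks which shards are
// alive and evicts the least recently used one when the limit is reached.
type shardPool struct {
	max   int
	order *list.List               // front = most recently used website
	alive map[string]*list.Element // website -> its element in order
}

func newShardPool(max int) *shardPool {
	if max < 1 {
		max = 1 // keep at least one shard alive
	}
	return &shardPool{max: max, order: list.New(), alive: make(map[string]*list.Element)}
}

// acquire marks the shard for website as alive and returns the name of the
// shard evicted to make room, or "" if nothing was evicted.
func (p *shardPool) acquire(website string) (evicted string) {
	if el, ok := p.alive[website]; ok {
		p.order.MoveToFront(el)
		return ""
	}
	if p.order.Len() >= p.max {
		oldest := p.order.Back()
		evicted = oldest.Value.(string)
		p.order.Remove(oldest)
		delete(p.alive, evicted)
	}
	p.alive[website] = p.order.PushFront(website)
	return evicted
}

func main() {
	pool := newShardPool(2)
	pool.acquire("website-1")
	pool.acquire("website-2")
	pool.acquire("website-1")              // website-1 is now the most recent
	fmt.Println(pool.acquire("website-3")) // pool is full, so website-2 is evicted
}
```

A real pool would also close the evicted connection and apply the idle timeout; both are left out to keep the sketch short.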

Install

$ go install github.com/GitbookIO/micro-analytics

Service launching

To launch the application, simply run:

$ ./micro-analytics

The command takes the following optional parameters:

| Parameter | Environment Variable | Usage | Type | Default Value |
| --- | --- | --- | --- | --- |
| --user, -u | MA_USER | Username for basic auth | String | "" |
| --password, -w | MA_PASSWORD | Password for basic auth | String | "" |
| --port, -p | MA_PORT | Port to listen on | String | "7070" |
| --root, -r | MA_ROOT | Database directory | String | "./dbs" |
| --connections, -c | MA_POOL_SIZE | Max number of alive shards connections | Number | 1000 |
| --idle-timeout, -i | MA_POOL_TIMEOUT | Idle timeout for DB connections in seconds | Number | 60 |
| --cache-directory, -d | MA_CACHE_DIR | Cache directory | String | ".diskache" |

If --user is provided, the service will automatically use basic access authentication on all requests.

The actual cache directory is a subdirectory named after the app's major version, so the default resolves to ./.diskache/0.
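When the service is launched with --user, a client has to send basic auth credentials with every request. A minimal Go sketch of that, in which the helper name `newAuthedRequest`, the host, and the credentials are all placeholders:

```go
package main

import (
	"fmt"
	"net/http"
)

// newAuthedRequest builds a request carrying HTTP basic auth credentials.
// The helper name and its parameters are illustrative only.
func newAuthedRequest(method, url, user, password string) (*http.Request, error) {
	req, err := http.NewRequest(method, url, nil)
	if err != nil {
		return nil, err
	}
	req.SetBasicAuth(user, password) // sets the Authorization header
	return req, nil
}

func main() {
	// Assumes a service listening on the default port with --user admin.
	req, err := newAuthedRequest(http.MethodGet, "http://localhost:7070/website-1/count", "admin", "secret")
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Header.Get("Authorization"))
}
```

The request can then be sent with `http.DefaultClient.Do(req)`.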

Analytics schema

All shards of the µAnalytics database share the same TABLE schema:

CREATE TABLE visits (
    time            INTEGER,
    event           TEXT,
    path            TEXT,
    ip              TEXT,
    platform        TEXT,
    refererDomain   TEXT,
    countryCode     TEXT
)
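For readers working in Go, the table can be mirrored by a plain struct. This is only a sketch: the `visit` type, the `insertVisit` statement, and the choice of Unix seconds for the INTEGER time column are assumptions, not taken from the service's code.

```go
package main

import "fmt"

// visit mirrors one row of the visits table.
// Storing time as Unix seconds is an assumption about the INTEGER column.
type visit struct {
	Time          int64
	Event         string
	Path          string
	IP            string
	Platform      string
	RefererDomain string
	CountryCode   string
}

// insertVisit is a hypothetical parameterized statement matching the schema.
const insertVisit = `INSERT INTO visits
    (time, event, path, ip, platform, refererDomain, countryCode)
    VALUES (?, ?, ?, ?, ?, ?, ?)`

// args returns the row values in the column order used by insertVisit.
func (v visit) args() []interface{} {
	return []interface{}{v.Time, v.Event, v.Path, v.IP, v.Platform, v.RefererDomain, v.CountryCode}
}

func main() {
	v := visit{Time: 1448020800, Event: "download", Path: "/somewhere", IP: "127.0.0.1",
		Platform: "Windows", RefererDomain: "gitbook.com", CountryCode: "fr"}
	fmt.Println(insertVisit)
	fmt.Println(v.args()...)
}
```

With any `database/sql` SQLite driver, the pair would be used as `db.Exec(insertVisit, v.args()...)`.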

Service requests

GET requests

Common Parameters

Every query for a specific website can be restricted to a time range. Each of the GET requests below therefore accepts the following two optional query string parameters:

| Name | Type | Description | Default | Example |
| --- | --- | --- | --- | --- |
| start | Date | Start date to query a range | none | "2015-11-20T12:00:00.000Z" |
| end | Date | End date to query a range | none | "2015-11-21T12:00:00.000Z" |
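A client can attach the range to any GET endpoint with the standard `net/url` package. The sketch below is illustrative: `rangeURL` is a hypothetical helper and the base URL assumes a local service on the default port.

```go
package main

import (
	"fmt"
	"net/url"
)

// rangeURL builds the URL for one endpoint of one website and adds the
// optional start and end parameters only when they are non-empty.
func rangeURL(base, website, endpoint, start, end string) string {
	q := url.Values{}
	if start != "" {
		q.Set("start", start)
	}
	if end != "" {
		q.Set("end", end)
	}
	u := base + "/" + url.PathEscape(website)
	if endpoint != "" {
		u += "/" + endpoint
	}
	if len(q) > 0 {
		u += "?" + q.Encode() // Encode percent-escapes the ':' in the dates
	}
	return u
}

func main() {
	fmt.Println(rangeURL("http://localhost:7070", "website-1", "countries",
		"2015-11-20T12:00:00.000Z", "2015-11-21T12:00:00.000Z"))
}
```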

The dates can be passed either as an ISO 8601 (RFC 3339) string, as in the examples above, or as a Unix timestamp in seconds.

Common Aggregation Parameters

| Name | Type | Description | Default | Example |
| --- | --- | --- | --- | --- |
| unique | Boolean | Include the total number of unique visitors in response | none | true |

Common Aggregation Response Values

Except for GET /:website, every response to a GET request will contain the two following values:

| Name | Type | Description |
| --- | --- | --- |
| total | Integer | Total number of visits |
| unique | Integer | Total number of unique visitors based on ip, set to 0 unless unique=true is passed as a query string parameter |

GET /:website

Returns the full analytics for a website.

Response
{
    "list": [
        {
            "time": "2015-11-25T16:00:00+01:00",
            "event": "download",
            "path": "/somewhere",
            "ip": "127.0.0.1",
            "platform": "Windows",
            "refererDomain": "gitbook.com",
            "countryCode": "fr"
        },
    ...
    ]
}

GET /:website/count

Returns the count of analytics for a website. The unique query string parameter is not necessary for this request.

Response
{
    "total": 1000,
    "unique": 900
}

GET /:website/countries

Returns the number of visits per countryCode.

Response

label contains the country's full name.

{
    "list": [
        {
            "id": "fr",
            "label": "France",
            "total": 1000,
            "unique": 900
        },
        ...
    ]
}

GET /:website/platforms

Returns the number of visits per platform.

Response
{
    "list": [
        {
            "id": "Linux",
            "label": "Linux",
            "total": 1000,
            "unique": 900
        },
        ...
    ]
}

GET /:website/domains

Returns the number of visits per refererDomain.

Response
{
    "list": [
        {
            "id": "gitbook.com",
            "label": "gitbook.com",
            "total": 1000,
            "unique": 900
        },
        ...
    ]
}

GET /:website/events

Returns the number of visits per event.

Response
{
    "list": [
        {
            "id": "download",
            "label": "download",
            "total": 1000,
            "unique": 900
        },
        ...
    ]
}
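The countries, platforms, domains and events responses share one shape, so a client can decode them with a single pair of types. A minimal sketch, in which the type and function names are illustrative:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// aggregate is one entry of the "list" array returned by the
// countries, platforms, domains and events endpoints.
type aggregate struct {
	ID     string `json:"id"`
	Label  string `json:"label"`
	Total  int    `json:"total"`
	Unique int    `json:"unique"`
}

// aggregateList is the envelope around the entries.
type aggregateList struct {
	List []aggregate `json:"list"`
}

// decodeAggregates parses a response body into its entries.
func decodeAggregates(body []byte) ([]aggregate, error) {
	var out aggregateList
	if err := json.Unmarshal(body, &out); err != nil {
		return nil, err
	}
	return out.List, nil
}

func main() {
	body := []byte(`{"list":[{"id":"fr","label":"France","total":1000,"unique":900}]}`)
	entries, err := decodeAggregates(body)
	if err != nil {
		panic(err)
	}
	for _, e := range entries {
		fmt.Printf("%s (%s): %d visits, %d unique\n", e.Label, e.ID, e.Total, e.Unique)
	}
}
```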

GET /:website/time

Returns the number of visits as a time series. The interval in seconds can be specified as an optional query string parameter. Its default value is 86400, equivalent to one day.

Parameters

| Name | Type | Description | Default | Example |
| --- | --- | --- | --- | --- |
| interval | Integer | Interval of the time series, in seconds | 86400 (1 day) | 3600 |

Response

Example with interval set to 3600:

{
    "list": [
        {
            "start": "2015-11-24T12:00:00.000Z",
            "end": "2015-11-24T13:00:00.000Z",
            "total": 450,
            "unique": 390
        },
        {
            "start": "2015-11-24T13:00:00.000Z",
            "end": "2015-11-24T14:00:00.000Z",
            "total": 550,
            "unique": 510
        },
        ...
    ]
}
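To make the start and end values concrete, here is one way a visit's timestamp could be mapped to its interval. The sketch assumes intervals are aligned on multiples of the interval since the Unix epoch; the service may align them differently, for instance on the start parameter, and `bucket` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"time"
)

// bucket returns the bounds of the interval containing unixSeconds,
// assuming epoch-aligned intervals and a non-negative timestamp.
func bucket(unixSeconds, interval int64) (start, end int64) {
	start = unixSeconds - unixSeconds%interval
	return start, start + interval
}

func main() {
	// A visit at 2015-11-24T12:34:56Z with interval=3600.
	visit := time.Date(2015, time.November, 24, 12, 34, 56, 0, time.UTC)
	start, end := bucket(visit.Unix(), 3600)
	fmt.Println(time.Unix(start, 0).UTC().Format(time.RFC3339))
	fmt.Println(time.Unix(end, 0).UTC().Format(time.RFC3339))
}
```

Under that assumption the visit falls in the 12:00 to 13:00 interval, which matches the first entry of the example response.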

POST requests

POST /:website

Insert new data for the specified website.

POST Body
{
    "time": "2015-11-24T13:00:00.000Z", // optional
    "event": "download",
    "ip": "127.0.0.1",
    "path": "/README.md",
    "headers": {
        // ...
        // HTTP headers received from your visitor
    }
}

The time parameter is optional and is set to the date of your POST request by default.

Passing the HTTP headers in the POST body allows the service to extract the refererDomain and platform values. The countryCode will be deduced from the passed ip parameter using Maxmind's GeoLite2 database.
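A Go client could build this request as sketched below. Several details are assumptions: the `visitPayload` type and `newVisitRequest` helper are hypothetical, headers are modelled as a flat string map because their exact shape is not specified here, and the Content-Type header is set as a common convention rather than a documented requirement.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/url"
)

// visitPayload mirrors the POST body; time and headers are optional.
type visitPayload struct {
	Time    string            `json:"time,omitempty"`
	Event   string            `json:"event"`
	IP      string            `json:"ip"`
	Path    string            `json:"path"`
	Headers map[string]string `json:"headers,omitempty"`
}

// newVisitRequest builds the POST /:website request for one visit.
func newVisitRequest(base, website string, p visitPayload) (*http.Request, error) {
	body, err := json.Marshal(p)
	if err != nil {
		return nil, err
	}
	req, err := http.NewRequest(http.MethodPost, base+"/"+url.PathEscape(website), bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	req.Header.Set("Content-Type", "application/json")
	return req, nil
}

func main() {
	req, err := newVisitRequest("http://localhost:7070", "website-1", visitPayload{
		Event:   "download",
		IP:      "127.0.0.1",
		Path:    "/README.md",
		Headers: map[string]string{"User-Agent": "curl/7.43.0"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(req.Method, req.URL)
}
```

Leaving Time empty omits the field from the body, so the service falls back to the date of the request.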

POST /:website/bulk

Insert a list of analytics for a specific website. The analytics can be sent directly in DB format, with time being a String value.

time can be passed as either an ISO 8601 (RFC 3339) string or a Unix timestamp, as in the two entries of the example body below.

If the time parameter is not provided, it defaults to the server time at which the POST request is processed.

As for the POST /:website method, the analytics can also have an optional headers parameter. If the refererDomain and/or platform values are not passed in the JSON body, the headers parameter will be used to set these values automatically.

POST Body
{
    "list": [
        {
            "time": "1450098642",
            "ip": "127.0.0.1",
            "event": "download",
            "path": "/somewhere",
            "platform": "Apple Mac",
            "refererDomain": "www.gitbook.com",
            "countryCode": "fr"
        },
        {
            "time": "2015-11-20T12:00:00.000Z",
            "ip": "127.0.0.1",
            "event": "login",
            "path": "/someplace",
            "headers": {
                // ...
                // HTTP headers received from your visitor
            }
        }
    ]
}

The countryCode will be reprocessed by the service using GeoLite2 based on the ip.
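The two entries above carry time once as a numeric string and once as an ISO 8601 string. A minimal sketch of accepting both forms follows; `parseTime` is a hypothetical helper and may not match how the service itself parses the value.

```go
package main

import (
	"fmt"
	"strconv"
	"time"
)

// parseTime accepts either a Unix timestamp in seconds ("1450098642")
// or an RFC 3339 string ("2015-11-20T12:00:00.000Z").
// An empty string falls back to the current time.
func parseTime(s string) (time.Time, error) {
	if s == "" {
		return time.Now().UTC(), nil
	}
	if secs, err := strconv.ParseInt(s, 10, 64); err == nil {
		return time.Unix(secs, 0).UTC(), nil
	}
	// time.Parse tolerates the fractional seconds even though the
	// RFC3339 layout does not list them.
	return time.Parse(time.RFC3339, s)
}

func main() {
	for _, s := range []string{"1450098642", "2015-11-20T12:00:00.000Z"} {
		t, err := parseTime(s)
		if err != nil {
			panic(err)
		}
		fmt.Println(t.Format(time.RFC3339))
	}
}
```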

POST /bulk

Insert a list of analytics for different websites. The analytics have the same format as POST /:website/bulk, with a mandatory website parameter.

POST Body
{
    "list": [
        {
            "website": "website-1",
            "time": "1450098642",
            "ip": "127.0.0.1",
            "event": "download",
            "path": "/somewhere",
            "platform": "Apple Mac",
            "refererDomain": "www.gitbook.com",
            "countryCode": "fr"
        },
        {
            "website": "website-2",
            "time": "2015-11-20T12:00:00.000Z",
            "ip": "127.0.0.1",
            "event": "login",
            "path": "/someplace",
            "headers": {
                // ...
            }
        }
    ]
}
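Because every entry names its own website, a multi-website list can be split into one group per shard before it is written. The sketch below illustrates that routing idea only; `bulkEntry` and `groupByWebsite` are hypothetical and not taken from the service's code.

```go
package main

import "fmt"

// bulkEntry holds a subset of the fields of one POST /bulk entry.
type bulkEntry struct {
	Website string `json:"website"`
	Time    string `json:"time,omitempty"`
	IP      string `json:"ip"`
	Event   string `json:"event"`
	Path    string `json:"path"`
}

// groupByWebsite splits entries per website, preserving their order,
// and rejects any entry that lacks the mandatory website field.
func groupByWebsite(entries []bulkEntry) (map[string][]bulkEntry, error) {
	groups := make(map[string][]bulkEntry)
	for i, e := range entries {
		if e.Website == "" {
			return nil, fmt.Errorf("entry %d: missing website", i)
		}
		groups[e.Website] = append(groups[e.Website], e)
	}
	return groups, nil
}

func main() {
	groups, err := groupByWebsite([]bulkEntry{
		{Website: "website-1", Time: "1450098642", IP: "127.0.0.1", Event: "download", Path: "/somewhere"},
		{Website: "website-2", Time: "2015-11-20T12:00:00.000Z", IP: "127.0.0.1", Event: "login", Path: "/someplace"},
	})
	if err != nil {
		panic(err)
	}
	fmt.Println(len(groups["website-1"]), len(groups["website-2"]))
}
```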

DELETE requests

DELETE /:website

Fully delete a shard from the file system.