Home

Awesome

Build Status Coverity codecov

carmen-cache

carmen-cache is a low-level storage layer used in the carmen geocoder.

Install

To install carmen-cache run:

yarn install

By default, binaries are provided for 64 bit OS X >= 10.8 and 64 bit Linux (>= Ubuntu Trusty). On those platforms no external dependencies are needed.

Other platforms will fall back to a source compile: see [Source Build](#Source Build) for details

Source Build

To build from source run:

make
yarn test

This will automatically:

To do a full rebuild run: make clean

Publishing

See CONTRIBUTING for how to release a new carmen-cache version.

What carmen-cache does

When doing a forward (text -> coordinates) geocode, carmen goes through a few steps. The carmen README has a full rundown, but abbreviated for purposes of this module, carmen does approximately as follows:

The steps in bold are implemented in this module in C++ for disk/memory compactness and speed; the rest are implemented in carmen itself in Javascript, or in other carmen dependencies.

As a concrete example, if a user were searching for "paris france," carmen might determine that the country index contained the string "france" and the city index contained multiple occurrences of "paris" (one for the real one in France, one for the one in Texas, etc.). carmen-cache would be responsible for retrieving the tile coordinates of all the tiles covering the country France, and the tile coordinates covering each Paris, and seeing which aligned with which; it would discover that the French Paris's tiles overlapped with some of the tiles in the France feature, making that combination a plausible stacking, while the Texas Paris's tiles would not align with France. It would return these results to carmen for further verification.

Detailed architecture

carmen-cache exposes two implementations of the same interface, one read-write version called MemoryCache and one disk-based read-only version called RocksDBCache build on Facebook's RocksDB. The read-write version is used during carmen's index-building process, at the end of which it's serialized into the read-only version for storage. At query time, the read-only version is used instead, as it's both faster and more memory-efficient.

Carmen-cache knows about the following kinds of data:

The MemoryCache supports setting of a key (and optional list of language numbers) to a list of grid numbers. It also supports a pack operation, which writes out the read-write form into a henceforth-read-only version encoded on disk as a RocksDB database.

Both versions support a list operation to retrieve all keys, a get operation to retrieve a grid list for a given key and language set, and a getMatching operation that can do either or both of:

RocksDBCache format

The RocksDB representation of the cache condenses the data for on-disk storage as a RocksDB database. It is a key-value store:

The RocksDB representation contains an additional optimization to assist in autocomplete queries: it precomputes combined sorted lists of grids automatically for fixed-length prefixes of length 3 and length 6, so as to reduce the number of seeks and reads necessary to calculate autocomplete results for very short autocomplete queries. These precomputed versions are stored with a key that begins with =1 or =2 (for shorter and longer prefixes, respectively), followed by the prefix string, followed by the | delimiter and language bitmask as per usual. Prefixes include language annotations and are thus per-language-set just like other keys. This process is transparent to carmen: these keys are calculated and populated automatically at pack time, read automatically instead of reading the full grid lists at getMatching time if the requested key is sufficiently short, and hidden from, e.g., carmen's list operation.

Coalesce (incomplete)

carmen-cache's coalesce operation is what computes the possible stacking of combinations of substrings and returns the results to carmen. It can take advantage of the C++ threadpool to consider multiple possible stackings in parallel, and contains two implementations: coalesceSingle and coalesceMulti. The former handles cases where a given query could be satisfied in its entirety by a single index, whereas the latter considers multi-index interactions. coalesce expects a set of phrasematch objects (see carmen's source for what they contain), and returns a set of coalesce results via callback to carmen.

A brief diagrammatic overview of how coalesceMulti works follows:

coalescemulti