Frame of Reference (FOR) C++ library

What is this?

A C++ library to pack and unpack vectors of integers that span a small range of values, using a technique called Frame of Reference (Goldstein et al. 1998). It should run fast even though it is written in simple C++.
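To make the idea concrete, here is a minimal sketch of frame-of-reference packing: store the minimum of a block once, then store each value as a fixed-width offset from that minimum. The names `ForBlock`, `bits_needed`, `for_pack` and `for_unpack` are hypothetical and exist only for this illustration; the library's actual layout and functions may well differ.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical container for this sketch; not a type from the library.
struct ForBlock {
    uint32_t reference;           // minimum value of the block (the "frame")
    uint32_t bits;                // bits used per stored offset
    uint32_t count;               // number of values in the block
    std::vector<uint64_t> words;  // offsets packed back to back
};

// Number of bits needed to represent v (0 needs 0 bits).
inline uint32_t bits_needed(uint32_t v) {
    uint32_t b = 0;
    while (v != 0) { ++b; v >>= 1; }
    return b;
}

// Pack: subtract the minimum, then write each offset with `bits` bits.
inline ForBlock for_pack(const std::vector<uint32_t>& in) {
    assert(in.size() <= UINT32_MAX);
    ForBlock blk{0, 0, static_cast<uint32_t>(in.size()), {}};
    if (in.empty()) return blk;
    auto mm = std::minmax_element(in.begin(), in.end());
    blk.reference = *mm.first;
    blk.bits = bits_needed(*mm.second - *mm.first);
    if (blk.bits == 0) return blk;  // all values equal: nothing to store
    blk.words.assign((in.size() * blk.bits + 63) / 64, 0);
    std::size_t bitpos = 0;
    for (uint32_t v : in) {
        uint64_t off = v - blk.reference;
        std::size_t w = bitpos / 64, s = bitpos % 64;
        blk.words[w] |= off << s;
        // The offset may straddle two 64-bit words.
        if (s + blk.bits > 64) blk.words[w + 1] |= off >> (64 - s);
        bitpos += blk.bits;
    }
    return blk;
}

// Unpack: read each offset back and add the reference.
inline std::vector<uint32_t> for_unpack(const ForBlock& blk) {
    std::vector<uint32_t> out(blk.count, blk.reference);
    if (blk.bits == 0) return out;
    const uint64_t mask = (1ULL << blk.bits) - 1;  // bits is at most 32
    std::size_t bitpos = 0;
    for (uint32_t& o : out) {
        std::size_t w = bitpos / 64, s = bitpos % 64;
        uint64_t off = blk.words[w] >> s;
        if (s + blk.bits > 64) off |= blk.words[w + 1] << (64 - s);
        o += static_cast<uint32_t>(off & mask);
        bitpos += blk.bits;
    }
    return out;
}
```

With the values 1000, 1003, 1001 and 1007, the reference is 1000 and the largest offset is 7, so each value takes 3 bits instead of 32.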

Code from this library is part of Apache Arrow and Apache Impala.

Code usage:

Given an array of 32-bit integers, you can compress it as follows:

#include "compression.h"

...

uint32_t * inputdata = ... // input array of length values
uint32_t * compresseddata = ... // output buffer, large enough to hold the compressed data
uint32_t *out = compress(inputdata, length, compresseddata);
// compressed data lies between compresseddata and out
uint32_t nvalue = 0;
uint32_t * recoverydata = ... // available buffer with at least length elements
uncompress(compresseddata, recoverydata, nvalue);
// nvalue will be equal to length

A similar API is available through turbocompress and turbouncompress. The difference is that compresseddata is a uint8_t pointer.

#include "turbocompression.h"

...

uint32_t * inputdata = ... // input array of length values
uint8_t * compresseddata = ... // output buffer, large enough to hold the compressed data
uint8_t *out = turbocompress(inputdata, length, compresseddata);
// compressed data lies between compresseddata and out
uint32_t nvalue = 0;
uint32_t * recoverydata = ... // available buffer with at least length elements
turbouncompress(compresseddata, recoverydata, nvalue);
// nvalue will be equal to length

We can also compress 64-bit arrays:

#include "turbocompression.h"

...

uint64_t * inputdata = ... // input array of length values
uint8_t * compresseddata = ... // output buffer, large enough to hold the compressed data
uint8_t *out = turbocompress64(inputdata, length, compresseddata);
// compressed data lies between compresseddata and out
uint32_t nvalue = 0;
uint64_t * recoverydata = ... // available buffer with at least length elements
turbouncompress64(compresseddata, recoverydata, nvalue);
// nvalue will be equal to length

Usage (with Makefile)

To run a simple benchmark, type

 make
 ./test sampledata.txt

where sampledata.txt is a text data file with one integer per line.
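If you want to load a file in this one-integer-per-line format into your own program, a sketch along the following lines may help. The helpers `read_integers` and `read_integer_file` are hypothetical names introduced for this example; how the bundled benchmark parses its input is not shown here and may differ.

```cpp
#include <cstdint>
#include <fstream>
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: read one unsigned integer per line from a stream.
// Lines that do not begin with an integer (for example blank lines) are skipped.
inline std::vector<uint32_t> read_integers(std::istream& in) {
    std::vector<uint32_t> values;
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream iss(line);
        uint32_t v;
        if (iss >> v) values.push_back(v);
    }
    return values;
}

// Hypothetical convenience wrapper: open the file at `path` and read it.
// Returns an empty vector if the file cannot be opened.
inline std::vector<uint32_t> read_integer_file(const std::string& path) {
    std::ifstream f(path);
    return read_integers(f);
}
```

The resulting vector's data() and size() could then serve as the inputdata and length arguments shown in the usage examples.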

For a parallelized version, type

 make testmp
 ./testmp sampledata.txt

This requires OpenMP support.

Building (with CMake under macOS and Linux)

You need to have cmake installed and available as a command.

 mkdir release
 cd release
 cmake ..
 make
 make test

Building (Visual Studio under Windows)

We are assuming that you have a common Windows PC with at least Visual Studio 2015, and an x64 processor.

To build with at least Visual Studio 2015 from the command line:

To build with at least Visual Studio 2017 directly in the IDE:

Requirements:

This was tested with GNU G++ and clang++. After suitable adjustments, it should build under most C++ compilers.

Other relevant libraries

References