Home

Awesome

TurboPFor: Fastest Integer Compression</br>

Build ubuntu

Promo video

Integer Compression Benchmark (single thread):

- Synthetic data:
C Sizeratio%Bits/IntegerC MB/sD MB/sName 2019.11
62,939,88615.75.04236910950TurboPFor256
63,392,75915.85.0713597803TurboPFor128
63,392,80115.85.071328924TurboPForDA
65,060,50416.35.20602748FP_SIMDOptPFor
65,359,91616.35.23322436PC_OptPFD
73,477,08818.45.884082484PC_Simple16
73,481,09618.45.886248748FP_SimdFastPFor 64Ki *
76,345,13619.16.1110722878VSimple
91,947,53323.07.3628411737QMX 64k *
93,285,86423.37.46156810232FP_GroupSimple 64Ki *
95,915,09624.07.678483832Simple-8b
99,910,93025.07.991729812408TurboByte+TurboPack
99,910,93025.07.991735712363TurboPackV sse
99,910,93025.07.991169410138TurboPack scalar
99,910,93025.07.9984208876TurboFor
100,332,92925.18.031707711170TurboPack256V avx2
101,015,65025.38.081119110333TurboVByte
102,074,66325.58.1766899524MaskedVByte
102,074,66325.58.1722604208PC_Vbyte
102,083,03625.58.1752004268FP_VByte
112,500,00028.19.00152812140VarintG8IU
125,000,00031.210.001303912366TurboByte
125,000,00031.210.001119711984StreamVbyte 2019
400,000,000100.0032.0089608948Copy
N/AN/AEliasFano

(*) codecs inefficient for small block sizes are tested with 64Ki integers/block.


- Data files:

Speed/Ratio

SizeRatio %Bits/IntegerC Time MB/sD Time MB/sFunction 2019.11
3,321,663,89313.94.4413206088TurboPFor
3,339,730,55714.04.47322144PC.OptPFD
3,350,717,95914.04.4815367128TurboPFor256
3,501,671,31414.64.68562840VSimple
3,768,146,46715.85.0432283652EliasFanoV
3,822,161,88516.05.115722444PC_Simple16
4,411,714,93618.45.90930410444TurboByte+TurboPack
4,521,326,51818.96.058363296Simple-8b
4,649,671,42719.46.2230843848TurboVbyte
4,955,740,04520.76.63706410268TurboPackV
4,955,740,04520.76.6357248020TurboPack
5,205,324,76021.86.9669529488SC_SIMDPack128
5,393,769,50322.57.211446611902TurboPackV256
6,221,886,39026.08.3266686952TurboFor
6,221,886,39026.08.3266442260TurboForDA
6,699,519,00028.08.9618881980FP_Vbyte
6,700,989,56328.08.9627403384MaskedVByte
7,622,896,87831.910.208364792VarintG8IU
8,060,125,03533.711.5084569476Streamvbyte 2019
8,594,342,21635.911.5052286376libfor
23,918,861,764100.032.0058245924Copy

Block size: 64Ki = 256k bytes. Ki=1024 Integers

SizeRatio %Bits/IntegerC Time MB/sD Time MB/sFunction
3,164,940,56213.24.2313446004TurboPFor 64Ki
3,273,213,46413.74.3814967008TurboPFor256 64Ki
3,965,982,95416.65.3015202452lz4+DT 64Ki
4,234,154,42717.75.664365672qmx 64Ki
6,074,995,11725.48.1319762916blosc_lz4 64Ki
8,773,150,64436.711.7425485204blosc_lz 64Ki

"lz4+DT 64Ki" = Delta+Transpose from TurboPFor + lz4<br> "blosc_lz4" internal lz4 compressor+vectorized shuffle

- Time Series:
FunctionC MB/ssizeratio%D MB/sText
bvzenc321063245,9090.00812823ZigZag
bvzzenc32891456,7130.01013499ZigZag Delta of delta
vsenc3212294140,4000.02412877Variable Simple
p4nzenc256v321932596,0180.1013326TurboPFor256 ZigZag
p4ndenc256v321961596,0180.1013339TurboPFor256 Delta
bitndpack256v3212564909,1890.1613505TurboPackV256 Delta
p4nzenc3218101,159,6330.208502TurboPFor ZigZag
p4nzenc128v3217951,159,6330.2013338TurboPFor ZigZag
bitnzpack256v3296511,254,7570.2213503TurboPackV256 ZigZag
bitnzpack128v32101551,472,8040.2613380TurboPackV ZigZag
vbddenc32619818,057,2963.1310982TurboVByte Delta of delta
memcpy13397577,141,992100.00
- Transpose/Shuffle (no compression)
    ./icapp -e117,118,119 ZIPF
SizeC Time MB/sD Time MB/sFunction
100,000,00094009132TPbyte 4 TurboPFor Byte Transpose/shuffle AVX2
100,000,00087848860TPbyte 4 TurboPFor Byte Transpose/shuffle SSE
100,000,00076887656Blosc_Shuffle AVX2
100,000,00052047460TPnibble 4 TurboPFor Nibble Transpose/shuffle SSE
100,000,00066206284Blosc shuffle SSE
100,000,00031563372Bitshuffle AVX2
100,000,00021002176Bitshuffle SSE
- (Lossy) Floating point compression:
    ./icapp -Fd file          " 64 bits floating point raw file 
    ./icapp -Ff file          " 32 bits floating point raw file 
    ./icapp -Fcf file         " text file with miltiple entries (ex.  8.657,56.8,4.5 ...)
    ./icapp -Ftf file         " text file (1 entry per line)
    ./icapp -Ftf file -v5     " + display the first entries read
    ./icapp -Ftf file.csv -K3 " but 3th column in a csv file (ex. number,Text,456.5 -> 456.5
    ./icapp -Ftf file -g.001  " lossy compression with allowed pointwise relative error 0.001
- Compressed Inverted Index Intersections with GOV2<br />

GOV2: 426GB, 25 Millions documents, average doc. size=18k.

max.docid/qTime sq/sms/q% docid found
1.0007.882283.10.43881
10.00010.541708.50.58584
ALL13.961289.00.776100
q/s: queries/second, ms/q:milliseconds/query
max.docid/qTime sq/sms/q% docids found
1.0002.666772.60.14881
10.0003.395307.50.18884
ALL3.575036.50.199100
Notes:

Compile:

    Download or clone TurboPFor
	git clone https://github.com/powturbo/TurboPFor-Integer-Compression.git
	cd TurboPFor-Integer-Compression
	make
    
    To benchmark TurboPFor + general purpose compression codecs (zstd,lz4, Turbo-Range-Coder, bwt, bitshuffle):
    git clone --recursive https://github.com/powturbo/TurboPFor-Integer-Compression.git
	cd TurboPFor-Integer-Compression
    make ICCODEC=1

    To benchmark external libraries: 
	Download the external libraries from github into the current directory
	Activate/deactivate the ext. libs in the makefile 
    make CODEC1=1 CODEC2=1 ICCODEC=1
Windows visual c++
	nmake /f makefile.vs
Windows visual studio c++
    project files under vs/vs2022

Testing:

- Synthetic data (use ZIPF parameter):

-zipfian distribution alpha = 1.2 (Ex. -a1.0=uniform -a1.5=skewed distribution)<br /> -number of integers = 100.000.000<br /> -integer range from 0 to 255<br />

- Data files:
- Intersections:

1 - Download Gov2 (or ClueWeb09) + query files (Ex. "1mq.txt") from DocId data set<br /> 8GB RAM required (16GB recommended for benchmarking "clueweb09" files).

2 - Create index file

    ./idxcr gov2.sorted .

create inverted index file "gov2.sorted.i" in the current directory

3 - Test intersections

    ./idxqry gov2.sorted.i 1mq.txt

run queries in file "1mq.txt" over the index of gov2 file

- Parallel Query Processing:

1 - Create partitions

    ./idxseg gov2.sorted . -26m -s8

create 8 (CPU hardware threads) partitions for a total of ~26 millions document ids

2 - Create index file for each partition

  ./idxcr gov2.sorted.s*

create inverted index file for all partitions "gov2.sorted.s00 - gov2.sorted.s07" in the current directory

3 - Intersections:

delete "idxqry.o" file and then type "make para" to compile "idxqry" w. multithreading

  ./idxqry gov2.sorted.s*.i 1mq.txt

run queries in file "1mq.txt" over the index of all gov2 partitions "gov2.sorted.s00.i - gov2.sorted.s07.i".

Function usage:

See benchmark "icapp" program for "integer compression" usage examples. In general encoding/decoding functions are of the form:

char *endptr = encode( unsigned *in, unsigned n, char *out, [unsigned start], [int b])<br /> endptr : set by encode to the next character in "out" after the encoded buffer<br /> in : input integer array<br /> n : number of elements<br /> out : pointer to output buffer<br /> b : number of bits. Only for bit packing functions<br /> start : previous value. Only for integrated delta encoding functions

char *endptr = decode( char *in, unsigned n, unsigned *out, [unsigned start], [int b])<br /> endptr : set by decode to the next character in "in" after the decoded buffer<br /> in : pointer to input buffer<br /> n : number of elements<br /> out : output integer array<br /> b : number of bits. Only for bit unpacking functions<br /> start : previous value. Only for integrated delta decoding functions

Simple high level functions:

size_t compressed_size = encode( unsigned *in, size_t n, char *out)<br /> compressed_size : number of bytes written into compressed output buffer out<br />

size_t compressed_size = decode( char *in, size_t n, unsigned *out)<br /> compressed_size : number of bytes read from compressed input buffer in<br />

Function syntax:

public header file to use with documentation:<br /> include/ic.h

Note: Some low level functions (like p4enc32) are limited to 128/256 (SSE/AVX2) integers per call.

Environment:

OS/Compiler (64 bits):
Multithreading:

Knowns issues

LICENSE

References:

Last update: 10 JUN 2023