Awesome
Fast Byte-Aligned Integer Coding
This crate is a Rust port of Daniel Lemire's streamvbyte
library.
It contains multiple implementations of this format aimed at different integer sizes and value distributions.
Each Coder
implementation produces different formats that are incompatible with one another. Names
provide the length of each of the 4 possible tags for each value so Coder1234
encodes each entry
as 1, 2, 3, or 4 bytes. A scalar implementation is always available at a large speed penalty but
the implementation will automatically use an accelerated implementation for the target if available.
Performance
A scalar implementation is available for all Coder
s but this is typically pretty slow. All
implementations also include acceleration when building for little-endian aarch64
with NEON
instruction support and x86_64
using the SSSE3
or SSE4.1
instruction sets.
Coder1234
will typically be fastest, but other tag length distributions (including Coder1248
for
64-bit values) are available.)
Benchmark numbers were collected on M1 Macbook Air, to generate your own run
cargo bench --bench=streamvbyte
Each benchmark is run with 1024 elements of up to a certain length using a zipf-like distribution.
encode
and decode
benchmarks measure value coding throughput, deltas
variants measure the
same assuming monotonically increasing inputs and compressing deltas between values. data_len
measures throughput for determining data length based on tag values, skip_deltas
does the same
for delta coded streams and includes the sum of all values skipped.
Coder1234
Benchmark | Throughput |
---|---|
encode/1 | 4.8Gelem/s |
encode_deltas/1 | 4.1Gelem/s |
decode/1 | 12.7Gelem/s |
decode_deltas/1 | 5.6Gelem/s |
skip_deltas/1 | 19.9Gelem/s |
data_len/1 | 54.2Gelem/s |
encode/2 | 4.8Gelem/s |
encode_deltas/2 | 3.8Gelem/s |
decode/2 | 8.2Gelem/s |
decode_deltas/2 | 4.2Gelem/s |
skip_deltas/2 | 8.5Gelem/s |
encode/4 | 4.8Gelem/s |
encode_deltas/4 | 4.1Gelem/s |
decode/4 | 8.2Gelem/s |
decode_deltas/4 | 4.2Gelem/s |
skip_deltas/4 | 8.4Gelem/s |
Coder0124
Benchmark | Throughput |
---|---|
encode/1 | 4.2Gelem/s |
encode_deltas/1 | 3.6Gelem/s |
decode/1 | 7.2Gelem/s |
decode_deltas/1 | 4.2Gelem/s |
skip_deltas/1 | 7.1Gelem/s |
data_len/1 | 53.8Gelem/s |
encode/2 | 4.2Gelem/s |
encode_deltas/2 | 3.6Gelem/s |
decode/2 | 7.2Gelem/s |
decode_deltas/2 | 4.2Gelem/s |
skip_deltas/2 | 7.3Gelem/s |
encode/4 | 4.2Gelem/s |
encode_deltas/4 | 3.6Gelem/s |
decode/4 | 7.2Gelem/s |
decode_deltas/4 | 4.2Gelem/s |
skip_deltas/4 | 7.5Gelem/s |
Coder1248
Benchmark | Throughput |
---|---|
encode/1 | 3.1Gelem/s |
encode_deltas/1 | 2.5Gelem/s |
decode/1 | 4.6Gelem/s |
decode_deltas/1 | 3.6Gelem/s |
skip_deltas/1 | 6.0Gelem/s |
data_len/1 | 53.8Gelem/s |
encode/4 | 3.1Gelem/s |
encode_deltas/4 | 2.5Gelem/s |
decode/4 | 4.8Gelem/s |
decode_deltas/4 | 3.6Gelem/s |
skip_deltas/4 | 5.7Gelem/s |
encode/8 | 3.0Gelem/s |
encode_deltas/8 | 2.5Gelem/s |
decode/8 | 4.8Gelem/s |
decode_deltas/8 | 3.0Gelem/s |
skip_deltas/8 | 5.4Gelem/s |