Density

Superfast compression library

Density is a free, open-source compression library written in pure Rust (formerly C99).

It focuses on high-speed compression at the best possible ratio. Density's algorithms currently sit on the Pareto frontier of compression speed vs. ratio (cf. here for an independent benchmark).

Density features a simple API to enable quick integration in any project.

Why is it so fast?

One of Density's biggest assets is its work unit: unlike most compression algorithms, it is not a single byte but a group of 4 bytes.

Where other libraries consume one byte at a time to apply their algorithmic processing, Density consumes 4 bytes at a time.

That's why Density's algorithms were designed from scratch: they cater to 4-byte work units while still providing interesting compression ratios.
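
As a rough illustration (a hypothetical sketch, not Density's actual implementation; the function name is made up), consuming input in 4-byte work units looks like this:

    // Hypothetical sketch: walk the input in 4-byte work units, reading
    // each quad as a single u32, the way Density's algorithms do
    // conceptually. Not actual library code.
    fn process_in_quads(input: &[u8]) {
        let mut quads = input.chunks_exact(4);
        for chunk in &mut quads {
            // One work unit: 4 bytes interpreted as a little-endian u32.
            let quad = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            // ... per-quad algorithmic processing would happen here ...
            let _ = quad;
        }
        // Any trailing 1-3 bytes would be handled separately.
        let _remainder = quads.remainder();
    }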

Speed pedigree traits

A "blowup protection" is provided, dramatically increasing the processing speed of incompressible input data. The aim is to never exceed original data size, even for incompressible inputs.

Benchmarks

Quick benchmark

Density features an integrated single-core in-memory benchmark.

Just run cargo bench to assess the library's performance on your platform, alongside lz4's and snappy's, using the dickens file from the renowned Silesia corpus.

It is also possible to run the benchmark with your own files, using the following command: FILE=... cargo bench (replace ... with a file path). Popular compression test files include the Silesia corpus files and enwik8.

Link to a sample run log

Third-party benchmarks

Other benchmarks featuring Density (non-exhaustive list):

Build

Density can be built on a number of Rust-compatible platforms. First, use rustup to install Rust.

a) get the source code:

    git clone https://github.com/g1mv/density.git
    cd density

b) build and test:

    cargo build
    cargo test

c) run benchmarks with or without your own files:

    cargo bench
    FILE=... cargo bench

About the algorithms

Density's algorithms are general-purpose and very fast. They empirically exhibit strong performance (Pareto-frontier speed/ratio) on voluminous datasets and text-based datasets, and are less performant on very small datasets. Their 4-byte work-unit approach makes them "slower learners" than Lempel-Ziv-based algorithms, albeit theoretically much faster.

    Algorithm    Speed rank    Ratio rank    Dictionary unit(s)    Prediction unit(s)    Sig. size (bytes)
    chameleon    1st           3rd           1                     0                     8
    cheetah      2nd           2nd           2                     1                     8
    lion         3rd           1st           2                     5                     6

Chameleon is a dictionary-lookup-based compression algorithm. It is designed for absolute speed (on the order of GB/s) in both compression and decompression.

Cheetah, developed with input from Piotr Tarsa, is derived from chameleon and uses swapped dual dictionary lookups with a single prediction unit.

Lion is derived from chameleon/cheetah. It uses different data structures, dual dictionary lookups, and 5 prediction units, giving it a compression-ratio advantage on moderately to highly compressible data.
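
To build intuition for the dictionary-lookup approach, here is a toy sketch only; the hash function, dictionary size, and output layout below are invented and do not match chameleon's real format. The idea: hash each 4-byte quad, emit a short dictionary index on a hit, and emit the literal quad while updating the dictionary on a miss.

    // Toy chameleon-style encoder sketch; NOT the real on-wire format.
    const DICT_BITS: u32 = 16; // 65,536-entry dictionary (invented size)

    fn toy_hash(quad: u32) -> u16 {
        // Illustrative multiplicative hash; the real algorithm differs.
        (quad.wrapping_mul(2_654_435_761) >> (32 - DICT_BITS)) as u16
    }

    fn toy_encode(input: &[u8]) -> (Vec<bool>, Vec<u8>) {
        let mut dict = vec![0u32; 1 << DICT_BITS];
        let mut flags = Vec::new();   // one hit/miss flag per quad
        let mut payload = Vec::new(); // 2-byte indices or 4-byte literals

        for chunk in input.chunks_exact(4) {
            let quad = u32::from_le_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
            let h = toy_hash(quad);
            if dict[h as usize] == quad {
                // Hit: the quad was seen before; 2 bytes replace 4.
                flags.push(true);
                payload.extend_from_slice(&h.to_le_bytes());
            } else {
                // Miss: emit the literal quad and learn it for next time.
                flags.push(false);
                payload.extend_from_slice(&quad.to_le_bytes());
                dict[h as usize] = quad;
            }
        }
        (flags, payload)
    }

On repetitive data, recurring quads shrink from 4 bytes to a 2-byte index plus a flag bit, which is where the compression comes from; in the real algorithms the flags are grouped into fixed-size signatures (the "Sig. size" column in the table above).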

Quick start

Using Density in your Rust code is a straightforward process.

Include the required dependency in the Cargo.toml file:

[dependencies]
density = "0.16.0"

Use the API:

use std::fs::read;
use density::algorithms::chameleon::chameleon::Chameleon;

fn main() {
    // Read the entire input file into memory.
    let file_mem = read("file_path").unwrap();

    // Compress into a buffer with ample headroom (2x the input size);
    // encode returns the number of bytes actually written.
    let mut encoded_mem = vec![0_u8; file_mem.len() << 1];
    let encoded_size = Chameleon::encode(&file_mem, &mut encoded_mem).unwrap();

    // Decompress only the bytes that were written; decode likewise
    // returns the decompressed size.
    let mut decoded_mem = vec![0_u8; file_mem.len() << 1];
    let decoded_size = Chameleon::decode(&encoded_mem[0..encoded_size], &mut decoded_mem).unwrap();

    println!("{} bytes -> {} compressed -> {} decompressed", file_mem.len(), encoded_size, decoded_size);
}

And that's it! We've done a full compression/decompression round trip in just a few lines!
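
Note that both output buffers above are deliberately over-allocated (twice the input length) to leave headroom; the sizes returned by encode and decode tell you how many bytes are actually meaningful, so in real code you would typically slice before use, e.g. &decoded_mem[..decoded_size].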