Kanzi

Kanzi is a modern, modular, expandable and efficient lossless data compressor implemented in Java.

Unlike most common lossless data compressors, Kanzi uses a variety of compression algorithms and, as a result, supports a wider range of compression ratios. Most compressors do not take advantage of the many cores and threads available on modern CPUs (what a waste!). Kanzi is concurrent by design and uses threads to compress several blocks in parallel. It is not compatible with standard compression formats.
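The block-parallel idea can be sketched roughly as follows. This is only an illustration under stated assumptions, not Kanzi's implementation or API: the JDK's `java.util.zip.Deflater` stands in for the codec, and the names `ParallelBlockSketch`, `compressBlocks`, `deflate` and `inflate` are hypothetical. The input is cut into fixed-size blocks, each block is compressed independently by a worker thread, and the results are collected in input order.

```java
import java.io.ByteArrayOutputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.zip.DataFormatException;
import java.util.zip.Deflater;
import java.util.zip.Inflater;

// Hypothetical illustration: Deflater is a stand-in codec, not Kanzi's format.
final class ParallelBlockSketch {

    // Compress one block with no dependency on any other block.
    static byte[] deflate(byte[] block) {
        Deflater deflater = new Deflater();
        deflater.setInput(block);
        deflater.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!deflater.finished()) {
            int n = deflater.deflate(buf);
            out.write(buf, 0, n);
        }
        deflater.end();
        return out.toByteArray();
    }

    // Decompress one block whose original length is known.
    static byte[] inflate(byte[] compressed, int originalLength) {
        Inflater inflater = new Inflater();
        inflater.setInput(compressed);
        byte[] out = new byte[originalLength];
        int off = 0;
        try {
            while (off < originalLength) {
                int n = inflater.inflate(out, off, originalLength - off);
                if (n == 0 && (inflater.finished() || inflater.needsInput() || inflater.needsDictionary())) {
                    throw new IllegalStateException("truncated or invalid block");
                }
                off += n;
            }
            return out;
        } catch (DataFormatException e) {
            throw new IllegalStateException(e);
        } finally {
            inflater.end();
        }
    }

    // Split the input into blocks and compress them on a fixed-size thread pool.
    static List<byte[]> compressBlocks(byte[] data, int blockSize, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<byte[]>> futures = new ArrayList<>();
            for (int start = 0; start < data.length; start += blockSize) {
                byte[] block = Arrays.copyOfRange(data, start, Math.min(data.length, start + blockSize));
                Callable<byte[]> task = () -> deflate(block);
                futures.add(pool.submit(task));
            }
            List<byte[]> result = new ArrayList<>();
            for (Future<byte[]> f : futures) {
                result.add(f.get()); // waits for that block, so output keeps input order
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        byte[] data = new byte[1 << 20];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) (i % 251);
        }
        List<byte[]> blocks = compressBlocks(data, 256 * 1024, 4);
        System.out.println("blocks: " + blocks.size());
    }
}
```

Because each block is compressed on its own, the number of blocks caps how many threads can do useful work at once, which is why the block size matters for parallelism.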

Kanzi is a lossless data compressor, not an archiver. It uses checksums (optional but recommended) to validate data integrity but does not have a mechanism for data recovery. It also lacks data deduplication across files. However, Kanzi generates a seekable bitstream: one or several consecutive blocks can be decompressed without decompressing the whole bitstream.
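To make the two properties above concrete (per-block checksums that detect but cannot repair corruption, and access to one block without decoding the rest), here is a small sketch. Everything in it is an assumption for illustration: the frame layout (4-byte length, 4-byte checksum, payload), the use of `java.util.zip.CRC32`, and the names `SeekableFramesSketch`, `writeFrames` and `readBlock` are hypothetical and do not describe Kanzi's actual bitstream. Payloads are stored uncompressed to keep the example short.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.zip.CRC32;

// Hypothetical frame layout, not Kanzi's bitstream: [int length][int crc32][payload]...
final class SeekableFramesSketch {

    static final int HEADER_BYTES = 8; // 4-byte payload length + 4-byte CRC32

    static int checksum(byte[] payload) {
        CRC32 crc = new CRC32();
        crc.update(payload, 0, payload.length);
        return (int) crc.getValue();
    }

    // Write every block as an independent frame.
    static byte[] writeFrames(byte[][] blocks) {
        int total = 0;
        for (byte[] b : blocks) {
            total += HEADER_BYTES + b.length;
        }
        ByteBuffer out = ByteBuffer.allocate(total);
        for (byte[] b : blocks) {
            out.putInt(b.length).putInt(checksum(b)).put(b);
        }
        return out.array();
    }

    // Read block number 'index' by hopping over earlier frames without decoding them.
    static byte[] readBlock(byte[] stream, int index) {
        ByteBuffer in = ByteBuffer.wrap(stream);
        for (int i = 0; i < index; i++) {
            int len = in.getInt();
            in.position(in.position() + 4 + len); // skip checksum and payload
        }
        int len = in.getInt();
        int expected = in.getInt();
        byte[] payload = new byte[len];
        in.get(payload);
        if (checksum(payload) != expected) {
            // Corruption is detected, but nothing here can repair it.
            throw new IllegalStateException("checksum mismatch in block " + index);
        }
        return payload;
    }

    public static void main(String[] args) {
        byte[][] blocks = {
            "first".getBytes(StandardCharsets.UTF_8),
            "second".getBytes(StandardCharsets.UTF_8),
            "third".getBytes(StandardCharsets.UTF_8)
        };
        byte[] stream = writeFrames(blocks);
        System.out.println(new String(readBlock(stream, 2), StandardCharsets.UTF_8));
    }
}
```

A damaged frame only invalidates the block it belongs to; the other blocks can still be read, but the damaged data cannot be reconstructed, which matches the "no data recovery" caveat.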

For more details, check https://github.com/flanglet/kanzi/wiki.

See how to reuse the code here: https://github.com/flanglet/kanzi/wiki/Using-and-extending-the-code

There is a C++ implementation available here: https://github.com/flanglet/kanzi-cpp

There is a Go implementation available here: https://github.com/flanglet/kanzi-go

Why Kanzi

There are many excellent, open-source lossless data compressors available already.

If gzip is starting to show its age, zstd and brotli are open-source, standardized and used daily by millions of people. Zstd is incredibly fast and probably the best choice in many cases. There are a few scenarios where Kanzi can be a better choice:

Benchmarks

Test machine:

AWS c5a.8xlarge: AMD EPYC 7R32 (32 vCPUs), 64 GB RAM

openjdk 21.0.3 2024-04-16

Ubuntu 24.04 LTS

Kanzi version 2.3.0 Java

On this machine, Kanzi uses up to 16 threads (half of the available vCPUs by default).

bzip3 uses 16 threads. zstd uses 16 threads for compression and 1 for decompression; the other compressors are single-threaded.

The default block size at level 9 is 32 MB, which severely limits the number of threads in use, especially with enwik8. All tests are nonetheless performed with default values.
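The effect of the block size on thread usage can be worked out with simple arithmetic. The sketch below assumes that at most one thread works on a block at a time and that "32 MB" means 32 × 1024 × 1024 bytes; both are assumptions, and the class and method names are hypothetical. The input sizes are those of the two test files used in the benchmarks.

```java
// Hypothetical helper: estimates how many threads can be busy at once.
final class ThreadBudgetSketch {

    // Number of blocks needed to cover the input (ceiling division).
    static int blockCount(long inputBytes, long blockBytes) {
        return (int) ((inputBytes + blockBytes - 1) / blockBytes);
    }

    // Threads that can do useful work: limited by both the pool size and the block count.
    static int busyThreads(long inputBytes, long blockBytes, int maxThreads) {
        return Math.min(maxThreads, blockCount(inputBytes, blockBytes));
    }

    public static void main(String[] args) {
        long blockBytes = 32L * 1024 * 1024; // assumption: 32 MB = 33,554,432 bytes
        int maxThreads = 32 / 2;             // half of the 32 vCPUs

        // enwik8: 100,000,000 bytes fit in 3 blocks
        System.out.println(busyThreads(100_000_000L, blockBytes, maxThreads)); // prints 3

        // silesia.tar: 211,957,760 bytes need 7 blocks
        System.out.println(busyThreads(211_957_760L, blockBytes, maxThreads)); // prints 7
    }
}
```

Under these assumptions, level 9 keeps only a small fraction of the 16 available threads busy on either file, which is consistent with the caveat about the default block size.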

silesia.tar

Download at http://sun.aei.polsl.pl/~sdeor/corpus/silesia.zip

| Compressor | Encoding (sec) | Decoding (sec) | Size (bytes) |
|---|---|---|---|
| Original | | | 211,957,760 |
| Kanzi -l 1 | 1.137 | 1.153 | 80,277,212 |
| Lz4 1.9.5 -4 | 0.321 | 0.330 | 79,912,419 |
| Zstd 1.5.6 -2 -T16 | 0.151 | 0.271 | 69,556,157 |
| Kanzi -l 2 | 1.082 | 1.313 | 68,195,845 |
| Brotli 1.1.0 -2 | 1.749 | 0.761 | 68,041,629 |
| Gzip 1.12 -9 | 20.09 | 1.403 | 67,652,449 |
| Kanzi -l 3 | 1.884 | 1.624 | 65,613,695 |
| Zstd 1.5.6 -5 -T16 | 0.356 | 0.289 | 63,131,656 |
| Kanzi -l 4 | 2.548 | 2.420 | 61,249,959 |
| Zstd 1.5.5 -9 -T16 | 0.690 | 0.278 | 59,429,335 |
| Brotli 1.1.0 -6 | 8.388 | 0.677 | 58,571,909 |
| Zstd 1.5.6 -13 -T16 | 3.244 | 0.272 | 58,041,112 |
| Brotli 1.1.0 -9 | 70.07 | 0.677 | 56,376,419 |
| Bzip2 1.0.8 -9 | 16.94 | 6.734 | 54,572,500 |
| Kanzi -l 5 | 3.270 | 2.143 | 54,039,773 |
| Zstd 1.5.6 -19 -T16 | 20.87 | 0.303 | 52,889,925 |
| Kanzi -l 6 | 4.506 | 2.256 | 49,567,817 |
| Lzma 5.4.5 -9 | 95.97 | 3.172 | 48,745,354 |
| Kanzi -l 7 | 4.246 | 3.251 | 47,520,629 |
| bzip3 1.3.2.r4-gb2d61e8 -j 16 | 2.682 | 3.221 | 47,237,088 |
| Kanzi -l 8 | 9.549 | 9.983 | 43,167,429 |
| Kanzi -l 9 | 26.95 | 28.31 | 41,497,835 |
| zpaq 7.15 -m5 -t16 | 213.8 | 213.8 | 40,050,429 |

enwik8

Download at https://mattmahoney.net/dc/enwik8.zip

| Compressor | Encoding (sec) | Decoding (sec) | Size (bytes) |
|---|---|---|---|
| Original | | | 100,000,000 |
| Kanzi -l 1 | 1.140 | 0.596 | 43,746,017 |
| Kanzi -l 2 | 1.040 | 0.720 | 37,816,913 |
| Kanzi -l 3 | 1.148 | 0.892 | 33,865,383 |
| Kanzi -l 4 | 1.321 | 1.566 | 29,597,577 |
| Kanzi -l 5 | 1.751 | 1.649 | 26,528,023 |
| Kanzi -l 6 | 2.954 | 1.319 | 24,076,674 |
| Kanzi -l 7 | 3.234 | 2.322 | 22,817,373 |
| Kanzi -l 8 | 6.836 | 6.741 | 21,181,983 |
| Kanzi -l 9 | 17.99 | 18.41 | 20,035,138 |
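When comparing rows, it can help to convert the raw figures into a compression ratio and an encoding throughput. The helper below is a hypothetical reading aid, not part of Kanzi; it assumes the sizes are in bytes and reports throughput in decimal megabytes (1,000,000 bytes) per second. The inputs in `main` are the enwik8 figures for levels 1 and 9.

```java
import java.util.Locale;

// Hypothetical helper for reading the benchmark tables.
final class BenchmarkMathSketch {

    // Compressed size as a percentage of the original size (lower is better).
    static double ratioPercent(long compressedBytes, long originalBytes) {
        return 100.0 * compressedBytes / originalBytes;
    }

    // Throughput relative to the original size, in MB/s (1 MB = 1,000,000 bytes).
    static double megabytesPerSecond(long originalBytes, double seconds) {
        return originalBytes / 1_000_000.0 / seconds;
    }

    public static void main(String[] args) {
        long original = 100_000_000L; // enwik8

        // Kanzi -l 1: 43,746,017 bytes, encoded in 1.140 s
        System.out.println(String.format(Locale.ROOT, "level 1: %.2f%% at %.2f MB/s",
                ratioPercent(43_746_017L, original), megabytesPerSecond(original, 1.140)));

        // Kanzi -l 9: 20,035,138 bytes, encoded in 17.99 s
        System.out.println(String.format(Locale.ROOT, "level 9: %.2f%% at %.2f MB/s",
                ratioPercent(20_035_138L, original), megabytesPerSecond(original, 17.99)));
    }
}
```

On these numbers, going from level 1 to level 9 roughly halves the output size, at the price of an encoding time about sixteen times longer.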

Build

First option (ant):

ant

Second option (maven):

mvn -Dmaven.test.skip=true

Credits

Matt Mahoney, Yann Collet, Jan Ondrus, Yuta Mori, Ilya Muravyov, Neal Burns, Fabian Giesen, Jarek Duda, Ilya Grebnov

Disclaimer

Use at your own risk. Always keep a copy of your original files.