LZHAM - Lossless Data Compression Codec

<p>Copyright (c) 2009-2015 Richard Geldreich, Jr. - richgel99@gmail.com - MIT License</p>

<p>Note: This is the development version of the LZHAM repo, currently at v1.1. The stable repo (v1.0) is here: https://github.com/richgel999/lzham_codec</p>

The "devel" version includes an even stronger (but slower) parser, enabled using the -x option. This parser supports multiple arrivals at each lookahead entry.

<p>LZHAM is now in the Public Domain, but I haven't had a chance to update this repo yet.</p>

<p>LZHAM is a lossless data compression codec written in C/C++ (specifically C++03), with a compression ratio similar to LZMA but with 1.5x-8x faster decompression speed. It officially supports Linux x86/x64, Windows x86/x64, OSX, and iOS, with Android support on the way.</p>

<p>Some slightly out of date API documentation is here (I'll be migrating this to github): https://code.google.com/p/lzham/wiki/API_Docs</p>

<h3>Introduction</h3>

<p>LZHAM is a lossless (LZ based) data compression codec optimized for particularly fast decompression at very high compression ratios, with a zlib compatible API. It's been developed over a period of 3 years, and alpha versions have already shipped in many products. (The alpha is here: https://code.google.com/p/lzham/) LZHAM's decompressor is slower than zlib's, but generally much faster than LZMA's, with a compression ratio that is typically within a few percent of LZMA's and sometimes better.</p>

<p>LZHAM's compressor is intended for offline use, but it is tested alongside the decompressor on mobile devices and is usable on the faster settings.</p>

<p>LZHAM's decompressor currently has a higher cost to initialize than LZMA's, so the threshold where LZHAM decompression is typically faster than LZMA's is between roughly 1,000 and 13,000 bytes of *compressed* data, depending on the platform. It is not a good small block compressor: it likes large (10KB-15KB minimum) blocks.</p>

<p>LZHAM has simple support for patch files (delta compression), but this is a side benefit of its design, not its primary use case. Internally it supports LZ matches up to ~64KB long and very large dictionaries (up to 0.5 GB).</p>

<p>LZHAM may be valuable to you if you compress data offline and distribute it to many customers, care about read/download times, and need fast decompression with low CPU and power use.</p>

<p>I've been profiling LZHAM vs. LZMA and publishing the results on my blog: http://richg42.blogspot.com</p>

<p>Some independent benchmarks of the previous alpha versions: http://heartofcomp.altervista.org/MOC/MOCADE.htm, http://mattmahoney.net/dc/text.html</p>

<p>LZHAM has been integrated into the 7zip archiver (command line and GUI) as a custom codec plugin: http://richg42.blogspot.com/2015/11/lzham-custom-codec-plugin-for-7-zip.html</p>

<h3>10GB Benchmark Results</h3>

Results with 7zip-LZHAM 9.38 32-bit (64MB dictionary) on Matt Mahoney's 10GB benchmark:

LZHAM (-mx=8): 3,577,047,629 Archive Test Time: 70.652 secs
LZHAM (-mx=9): 3,573,782,721 Archive Test Time: 71.292 secs
LZMA  (-mx=9): 3,560,052,414 Archive Test Time: 223.050 secs
7z .ZIP      : 4,681,291,655 Archive Test Time: 73.304 secs (unzip v6 x64 test time: 61.074 secs)
<h3>Most Common Question: So how does it compare to other libs like LZ4?</h3>

There is no single compression algorithm that perfectly suits all use cases and practical constraints. LZ4 and LZHAM are tools which lie at completely opposite ends of the spectrum: LZ4 targets extremely fast compression and decompression at modest compression ratios, while LZHAM targets near-LZMA compression ratios with fast decompression, at the cost of much slower compression.

<h3>How Much Memory Does It Need?</h3>

For decompression it's easy to compute: memory use is dominated by the dictionary size, plus a relatively small fixed amount for the decompressor's work tables.

I'll be honest here: the compressor is currently an angry beast when it comes to memory. The amount needed depends mostly on the compression level and dictionary size. It's approximately (with max_probes=128 at level -m4): comp_mem = min(512 * 1024, dict_size / 8) * max_probes * 6 + dict_size * 9 + 22020096

Compression memory usage examples measured with Windows lzhamtest_x64 show that the equation is pretty off for small dictionary sizes.
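For a rough feel of the numbers, here's a small sketch (not code from the codec itself) that just evaluates the equation above across a range of dictionary sizes. It assumes max_probes=128, i.e. level -m4, and per the note above the results are estimates only:

```cpp
#include <algorithm>
#include <cstdint>
#include <cstdio>

// Rough estimate of LZHAM compressor memory use, straight from the
// equation above. Assumes max_probes = 128 (level -m4); quite
// inaccurate for small dictionary sizes.
static uint64_t estimate_comp_mem(uint64_t dict_size, uint64_t max_probes = 128)
{
   return std::min<uint64_t>(512 * 1024, dict_size / 8) * max_probes * 6
        + dict_size * 9
        + 22020096;
}

int main()
{
   // Sweep the supported dict_size_log2 range, [15,29].
   for (uint32_t log2_size = 15; log2_size <= 29; log2_size += 2)
   {
      uint64_t dict_size = 1ULL << log2_size;
      printf("dict_size 2^%u = %llu bytes -> ~%llu MB compressor memory\n",
             log2_size, (unsigned long long)dict_size,
             (unsigned long long)(estimate_comp_mem(dict_size) >> 20));
   }
   return 0;
}
```

At the largest 512MB dictionary the dict_size * 9 term alone is ~4.5GB, which is why I wouldn't recommend that setting on a machine with only 4GB of physical memory.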

<h3>Compressed Bitstream Compatibility</h3>

<p>v1.0's bitstream format is now locked in place, so any future v1.x releases will be backwards/forward compatible with compressed files written with v1.0. The only thing that could change this is a critical bugfix.</p>

<p>Note that LZHAM v1.x bitstreams are NOT backwards compatible with any of the previous alpha versions on Google Code.</p>

<h3>Platforms/Compiler Support</h3>

<p>LZHAM currently officially supports x86/x64 Linux, iOS, OSX, FreeBSD, and Windows x86/x64. At one time the codec compiled and ran fine on Xbox 360 (PPC, big endian). Android support is coming next. It should be easy to retarget by modifying the macros in lzham_core.h.</p>

<p>LZHAM has optional support for multithreaded compression. It supports gcc built-ins or MSVC intrinsics for atomic ops. For threading, it supports OSX specific Pthreads, generic Pthreads, or the Windows API.</p>

<p>For compilers, I've tested with gcc, clang, and MSVC 2008, 2010, and 2013. In previous alphas I also compiled with TDM-GCC x64.</p>

<h3>API</h3>

LZHAM supports streaming or memory to memory compression/decompression. See include/lzham.h. LZHAM can be linked statically or dynamically, just study the headers and the lzhamtest project. On Linux/OSX, it's only been tested with static linking so far.

LZHAM also supports a usable subset of the zlib API with extensions: either include include/zlib.h, or #define LZHAM_DEFINE_ZLIB_API and use include/lzham.h.
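Here's a minimal round-trip sketch using that zlib-style subset. It assumes LZHAM_DEFINE_ZLIB_API maps the familiar zlib names (compressBound, compress2, uncompress, Bytef, Z_OK) onto LZHAM's implementations and that all three calls are in the supported subset; check include/lzham.h for the exact names it exposes:

```cpp
// Hypothetical memory-to-memory round trip through LZHAM's
// zlib-compatible API subset.
#define LZHAM_DEFINE_ZLIB_API
#include "lzham.h" // from include/; adjust the include path as needed

#include <cstdio>
#include <cstring>
#include <vector>

int main()
{
   const char *src = "Some example data to compress with the zlib-style API.";
   uLong src_len = (uLong)strlen(src) + 1;

   // Worst-case compressed size, then compress at the highest level.
   std::vector<Bytef> comp(compressBound(src_len));
   uLongf comp_len = (uLongf)comp.size();
   if (compress2(&comp[0], &comp_len, (const Bytef *)src, src_len, Z_BEST_COMPRESSION) != Z_OK)
      return 1;

   // Decompress and verify we got the original bytes back.
   std::vector<Bytef> decomp(src_len);
   uLongf decomp_len = (uLongf)decomp.size();
   if (uncompress(&decomp[0], &decomp_len, &comp[0], comp_len) != Z_OK)
      return 1;

   printf("%lu -> %lu -> %lu bytes, round trip %s\n",
          (unsigned long)src_len, (unsigned long)comp_len, (unsigned long)decomp_len,
          (decomp_len == src_len && !memcmp(src, &decomp[0], src_len)) ? "OK" : "FAILED");
   return 0;
}
```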

<h3>Usage Tips</h3>

<h3>Codec Test App</h3>

lzhamtest_x86/x64 is a simple command line test program that uses the LZHAM codec to compress/decompress single files. lzhamtest is not intended as a file archiver or end user tool; it's just a simple testbed.

-- Usage examples:

-- Options

Valid dictionary sizes are [15,26] for x86, and [15,29] for x64. (See LZHAM_MIN_DICT_SIZE_LOG2, etc. defines in include/lzham.h.) The x86 version defaults to 64MB (26), and the x64 version defaults to 256MB (28). I wouldn't recommend setting the dictionary size to 512MB unless your machine has more than 4GB of physical memory.
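As a sketch, validating a dictionary size setting against those limits might look like this. The constants mirror the [15,26]/[15,29] ranges above, but the real limits come from LZHAM_MIN_DICT_SIZE_LOG2 and the related defines in include/lzham.h, and the helper shown here is hypothetical:

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative limits mirroring the ranges quoted above:
// [15,26] on x86, [15,29] on x64. See LZHAM_MIN_DICT_SIZE_LOG2
// and the related defines in include/lzham.h for the real ones.
static const uint32_t kMinDictSizeLog2 = 15;
static const uint32_t kMaxDictSizeLog2 = (sizeof(void *) == 8) ? 29 : 26;

// Clamp a requested dict_size_log2 into the supported range.
static uint32_t clamp_dict_size_log2(uint32_t log2_size)
{
   if (log2_size < kMinDictSizeLog2) return kMinDictSizeLog2;
   if (log2_size > kMaxDictSizeLog2) return kMaxDictSizeLog2;
   return log2_size;
}

int main()
{
   uint32_t log2_size = clamp_dict_size_log2(28); // x64 default, 256MB
   printf("Using a %u MB dictionary (2^%u bytes)\n",
          (uint32_t)((1ULL << log2_size) >> 20), log2_size);
   return 0;
}
```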

See lzhamtest_x86/x64.exe's help text for more command line parameters.

<h3>Compiling LZHAM</h3>

IMPORTANT: With clang or gcc, compile LZHAM with strict aliasing DISABLED: -fno-strict-aliasing

I DO NOT test or develop the codec with strict aliasing enabled. It might work fine, but I don't know yet. This is usually not a problem with MSVC, which defaults to strict aliasing being off.

<h3>ANSI C/C++</h3>

LZHAM supports compiling as plain vanilla ANSI C/C++. To see how the codec configures itself, check out lzham_core.h and search for "LZHAM_ANSI_CPLUSPLUS". All platform specific stuff (unaligned loads, threading, atomic ops, etc.) should be disabled when this macro is defined. Note that the compressor doesn't use threads or atomic operations when built this way, so it's going to be pretty slow. (The compressor was built from the ground up to be threaded.)
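As a rough illustration of what that gating looks like (this helper is hypothetical, and lzham_core.h's real configuration logic is more involved):

```cpp
// Illustrative only: how platform-specific code can be gated on
// LZHAM_ANSI_CPLUSPLUS, in the spirit of lzham_core.h. This helper is
// hypothetical, not part of the real codec.
#include <cstdint>

static inline uint32_t read_le32(const void *p)
{
#ifdef LZHAM_ANSI_CPLUSPLUS
   // Pure ANSI C/C++ path: no unaligned loads; assemble the value byte by byte.
   const uint8_t *b = (const uint8_t *)p;
   return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
          ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
#else
   // Platform-specific path: a direct unaligned load on little-endian targets
   // (the kind of code that needs -fno-strict-aliasing, as noted above).
   return *(const uint32_t *)p;
#endif
}
```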

<h3>Known Problems</h3>

<p>LZHAM's decompressor is like a drag racer that needs time to get up to speed. LZHAM is not intended or optimized to be used on "small" blocks of data (less than ~10,000 bytes of *compressed* data on desktops, or around 1,000-5,000 bytes on iOS). If your use case involves calling the codec over and over with tiny blocks, then LZMA, LZ4, Deflate, etc. are probably better choices.</p>

<p>The decompressor still takes too long to init vs. LZMA. On iOS the cost is not that bad, but on desktop the cost is high. I have reduced the startup cost vs. the alpha, but there's still work to do.</p>

<p>The compressor is slower than I would like, and doesn't scale as well as it could. I added a reinit() method to make it initialize faster, but it's not a speed demon. My focus has been on ratio and decompression speed.</p>

<p>I use an indentation of tabs = 3 spaces, but I think some actual tab characters got into the code. I need to run the sources through ClangFormat or whatever.</p>

<h3>Special Thanks</h3>

<p>Thanks to everyone at the http://encode.ru forums. I read these forums as a lurker before working on LZHAM, and I studied every LZ related post I could get my hands on, especially anything related to LZ optimal parsing, which still seems like a black art. LZHAM was my way of learning how to implement optimal parsing (and you can see this if you study the progress I made in the early alphas on Google Code).</p>

<p>Also, thanks to Igor Pavlov, the original creator of LZMA and 7zip, for advancing the state of the art in LZ compression.</p>