bundle :package: <a href="https://travis-ci.org/r-lyeh/bundle"><img src="https://api.travis-ci.org/r-lyeh/bundle.svg?branch=master" align="right" /></a>

Bundle is an embeddable compression library that supports 23 compression algorithms and 2 archive formats.

Distributed in two files.

Features

Bundle stream format

[0x00  ...]          Optional zero padding (N bits)
[0x70 0x??]          Header (8 bits). De/compression algorithm (8 bits)
                     enum { RAW, SHOCO, LZ4F, MINIZ, LZIP, LZMA20, ZPAQ, LZ4,      //  0..7
                            BROTLI9, ZSTD, LZMA25, BSC, BROTLI11, SHRINKER, CSC20, //  8..14
                            ZSTDF, BCM, ZLING, MCM, TANGELO, ZMOLLY, CRUSH, LZJB,  // 15..22
                            BZIP2                                                  // 23..
                     };
[vle_unpacked_size]  Unpacked size of the stream (N bytes). Stored as a variable-length
                     encoded value: bytes are shifted and added into a big accumulator
                     until the MSB is found.
[vle_packed_size]    Packed size of the stream (N bytes). Stored as a variable-length
                     encoded value: bytes are shifted and added into a big accumulator
                     until the MSB is found.
[bitstream]          Compressed bitstream (N bytes). As returned by compressor.
                     If possible, header-less bitstreams are preferred.
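
The layout above can be read back with a few lines of code. The following is a minimal sketch, not bundle's own parser. It assumes an LEB128-style variable-length encoding (7 payload bits per byte, least-significant group first, high bit set on every byte except the last), which the description above does not spell out. The names `vle_encode`, `vle_decode`, `stream_header` and `parse_header` are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// ASSUMPTION: LEB128-style layout. 7 payload bits per byte, least-significant
// group first; the high bit is set on every byte except the last one.
inline std::vector<std::uint8_t> vle_encode(std::uint64_t value) {
    std::vector<std::uint8_t> out;
    do {
        std::uint8_t byte = static_cast<std::uint8_t>(value & 0x7F);
        value >>= 7;
        if (value != 0) byte |= 0x80;   // more bytes follow
        out.push_back(byte);
    } while (value != 0);
    assert(!out.empty() && (out.back() & 0x80) == 0);  // last byte terminates
    return out;
}

// Decodes one value starting at data[pos] and advances pos past it.
// Returns false on truncated or oversized input.
inline bool vle_decode(const std::vector<std::uint8_t> &data, std::size_t &pos,
                       std::uint64_t &value) {
    value = 0;
    for (unsigned shift = 0; shift < 64 && pos < data.size(); shift += 7) {
        const std::uint8_t byte = data[pos++];
        value |= static_cast<std::uint64_t>(byte & 0x7F) << shift;
        if ((byte & 0x80) == 0) return true;   // terminator byte reached
    }
    return false;
}

struct stream_header {                 // hypothetical helper type
    std::uint8_t  algorithm = 0;       // enum value from the table above (0 == RAW)
    std::uint64_t unpacked_size = 0;
    std::uint64_t packed_size = 0;
    std::size_t   payload_offset = 0;  // index of the first bitstream byte
};

inline bool parse_header(const std::vector<std::uint8_t> &data, stream_header &h) {
    std::size_t pos = 0;
    while (pos < data.size() && data[pos] == 0x00) ++pos;         // optional zero padding
    if (pos + 2 > data.size() || data[pos] != 0x70) return false; // header byte
    h.algorithm = data[pos + 1];
    pos += 2;
    if (!vle_decode(data, pos, h.unpacked_size)) return false;
    if (!vle_decode(data, pos, h.packed_size)) return false;
    h.payload_offset = pos;
    return true;
}
```

Under the assumed encoding, the bytes `00 00 70 07 AC 02 05` would parse as algorithm 7 (LZ4), an unpacked size of 300 and a packed size of 5, with the bitstream starting at offset 7.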

Bundle .bun archive format

- Files/data are packed into streams by using any compression method (see above)
- Streams are archived into a standard ZIP file:
  - ZIP entry compression is (0) for packed streams and (1-9) for unpacked streams.
  - ZIP entry comment is a serialized JSON of (file) metadata (@todo).
- Note: you can mix streams of different algorithms into the very same ZIP archive.
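
The ZIP entry rule above can be sketched as a tiny helper. This is an illustration only: `zip_entry_level` is a hypothetical name, and the default level of 6 is an assumption, not something the format prescribes.

```cpp
#include <cassert>

// Hypothetical helper: picks the ZIP compression level for an archive entry.
// A stream already packed by one of the bundle algorithms is stored as-is
// (level 0); an unpacked stream is compressed by ZIP itself at level 1-9.
inline int zip_entry_level(bool packed, int deflate_level = 6) {
    assert(deflate_level >= 1 && deflate_level <= 9);
    return packed ? 0 : deflate_level;
}
```

Storing packed streams at level 0 avoids compressing the same data twice, which rarely shrinks it further.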

Showcase

#include <cassert>
#include <iostream>
#include <string>
#include <vector>
#include "bundle.h"

int main() {
    using namespace bundle;
    using namespace std;

    // ~14 MB dataset
    string original( "There's a lady who's sure all that glitters is gold" );
    for (int i = 0; i < 18; ++i) original += original + string( i + 1, 32 + i );

    // pack, unpack & verify all encoders
    vector<unsigned> libs { 
        RAW, SHOCO, LZ4F, MINIZ, LZIP, LZMA20,
        ZPAQ, LZ4, BROTLI9, ZSTD, LZMA25,
        BSC, BROTLI11, SHRINKER, CSC20, BCM,
        ZLING, MCM, TANGELO, ZMOLLY, CRUSH, LZJB
    };
    for( auto &lib : libs ) {
        string packed = pack(lib, original);
        string unpacked = unpack(packed);
        cout << original.size() << " -> " << packed.size() << " bytes (" << name_of(lib) << ")" << endl;
        assert( original == unpacked );
    }

    cout << "All ok." << endl;
}

On choosing compressors (on a regular basis)

| Rank | Compression ratio | Fastest compressors | Fastest decompressors | Average speed | Memory efficiency |
|---|---|---|---|---|---|
| 1st | 91.15% ZPAQ | 958.18MB/s RAW | 2231.20MB/s RAW | 1340.63MB/s RAW | tbd |
| 2nd | 90.71% MCM | 358.41MB/s LZ4F | 993.68MB/s LZ4 | 508.50MB/s LZ4F | tbd |
| 3rd | 90.02% TANGELO | 240.87MB/s SHRINKER | 874.83MB/s LZ4F | 334.57MB/s SHRINKER | tbd |
| 4th | 88.31% BSC | 223.28MB/s LZJB | 547.62MB/s SHRINKER | 267.57MB/s LZJB | tbd |
| 5th | 87.74% LZMA25 | 210.74MB/s ZSTDF | 382.52MB/s MINIZ | 246.66MB/s ZSTDF | tbd |
| 6th | 87.74% LZIP | 159.59MB/s SHOCO | 380.39MB/s ZSTD | 209.32MB/s SHOCO | tbd |
| 7th | 87.63% BROTLI11 | 40.19MB/s ZLING | 333.76MB/s LZJB | 65.40MB/s ZLING | tbd |
| 8th | 87.50% CSC20 | 33.67MB/s CRUSH | 304.06MB/s SHOCO | 60.29MB/s CRUSH | tbd |
| 9th | 87.15% BCM | 13.73MB/s ZSTD | 297.34MB/s ZSTDF | 26.51MB/s ZSTD | tbd |
| 10th | 86.44% ZMOLLY | 09.00MB/s BSC | 287.83MB/s CRUSH | 13.44MB/s BZIP2 | tbd |
| 11th | 86.17% LZMA20 | 08.51MB/s BZIP2 | 287.58MB/s BROTLI9 | 11.51MB/s BROTLI9 | tbd |
| 12th | 86.05% BROTLI9 | 06.77MB/s ZMOLLY | 246.88MB/s BROTLI11 | 10.78MB/s BSC | tbd |
| 13th | 85.27% BZIP2 | 05.87MB/s BROTLI9 | 175.54MB/s ZLING | 08.13MB/s LZ4 | tbd |
| 14th | 85.24% ZSTD | 05.21MB/s BCM | 118.49MB/s LZMA25 | 07.24MB/s MINIZ | tbd |
| 15th | 82.89% ZLING | 04.08MB/s LZ4 | 108.71MB/s LZMA20 | 06.73MB/s ZMOLLY | tbd |
| 16th | 81.68% MINIZ | 03.65MB/s MINIZ | 72.72MB/s CSC20 | 05.27MB/s LZMA20 | tbd |
| 17th | 77.93% ZSTDF | 02.70MB/s LZMA20 | 57.05MB/s LZIP | 04.90MB/s LZMA25 | tbd |
| 18th | 77.57% LZ4 | 02.50MB/s LZMA25 | 31.88MB/s BZIP2 | 04.83MB/s CSC20 | tbd |
| 19th | 77.37% CRUSH | 02.50MB/s CSC20 | 13.44MB/s BSC | 04.65MB/s BCM | tbd |
| 20th | 67.30% SHRINKER | 02.25MB/s MCM | 06.68MB/s ZMOLLY | 04.13MB/s LZIP | tbd |
| 21st | 63.30% LZ4F | 02.14MB/s LZIP | 04.20MB/s BCM | 02.29MB/s MCM | tbd |
| 22nd | 59.37% LZJB | 01.15MB/s TANGELO | 02.34MB/s MCM | 01.17MB/s TANGELO | tbd |
| 23rd | 06.42% SHOCO | 00.24MB/s BROTLI11 | 01.18MB/s TANGELO | 00.48MB/s BROTLI11 | tbd |
| 24th | 00.00% RAW | 00.23MB/s ZPAQ | 00.21MB/s ZPAQ | 00.22MB/s ZPAQ | tbd |
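
Rankings like these depend on the dataset, so it can be worth measuring on your own data. Below is a minimal sketch of picking the candidate with the smallest output. It assumes the ratio column reports space saved (RAW scoring 00.00% suggests so). `saved_percent`, `packer_fn` and `pick_smallest` are hypothetical names, and the packer is passed in as a callback so the sketch does not depend on bundle itself.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// ASSUMPTION: ratio means space saved, in percent. 0 when the output is as
// large as the input, close to 100 when the output is tiny.
inline double saved_percent(std::size_t original, std::size_t packed) {
    if (original == 0) return 0.0;
    return 100.0 * (1.0 - static_cast<double>(packed) / static_cast<double>(original));
}

// The caller supplies the packer. With bundle it could be a lambda such as
//   [](unsigned q, const std::string &s) { return bundle::pack(q, s); }
using packer_fn = std::function<std::string(unsigned, const std::string &)>;

// Returns the candidate id whose packed output is smallest for this input.
// Ties keep the earliest candidate.
inline unsigned pick_smallest(const std::vector<unsigned> &candidates,
                              const std::string &input, const packer_fn &packer) {
    assert(!candidates.empty());
    unsigned best = candidates.front();
    std::size_t best_size = packer(best, input).size();
    for (std::size_t i = 1; i < candidates.size(); ++i) {
        const std::size_t size = packer(candidates[i], input).size();
        if (size < best_size) {
            best = candidates[i];
            best_size = size;
        }
    }
    return best;
}
```

This only optimizes for size. When speed matters, the same loop could time each call and weigh throughput against the saved percentage.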

Charts

@mavam has an awesome R script that plots some fancy graphics in his compbench repository. The following CC images are a few of his own showcasing an invocation for a 10,000 packet PCAP trace:

[Figures: Tradeoff, Throughput, Scatterplot, Compression Ratio]

API - data

namespace bundle
{
  // low level API (raw pointers)
  bool is_packed( *ptr, len );
  bool is_unpacked( *ptr, len );
  unsigned type_of( *ptr, len );
  size_t len( *ptr, len );
  size_t zlen( *ptr, len );
  const void *zptr( *ptr, len );
  bool pack( unsigned Q, *in, len, *out, &zlen );
  bool unpack( unsigned Q, *in, len, *out, &zlen );

  // medium level API, templates (in-place)
  bool is_packed( T );
  bool is_unpacked( T );
  unsigned type_of( T );
  size_t len( T );
  size_t zlen( T );
  const void *zptr( T );
  bool unpack( T &, T );
  bool pack( unsigned Q, T &, T );

  // high level API, templates (copy)
  T pack( unsigned Q, T );
  T unpack( T );
}

For a complete review, check the bundle.hpp header.

API - archives

namespace bundle
{
  struct file : map<string,string> { // ~map of properties
    bool    has(property);           // property check
    string &get(property);           // property access
    string  toc() const;             // inspection (json)
  };
  struct archive : vector<file>    { // ~sequence of files
    void   bun(string);              // .bun serialization
    string bun() const;              // .bun serialization
    void   zip(string);              // .zip serialization
    string zip() const;              // .zip serialization
    string toc() const;              // inspection (json)
  };
}

Build Directives (Licensing)

| #define directive | Default value | Meaning |
|---|---|---|
| BUNDLE_NO_APACHE2 | (undefined) | Define to remove any Apache 2.0 library from build |
| BUNDLE_NO_BSD2 | (undefined) | Define to remove any BSD-2 library from build |
| BUNDLE_NO_BSD3 | (undefined) | Define to remove any BSD-3 library from build |
| BUNDLE_NO_CDDL | (undefined) | Define to remove any CDDL library from build |
| BUNDLE_NO_GPL | (undefined) | Define to remove any GPL library from build |
| BUNDLE_NO_MIT | (undefined) | Define to remove any MIT library from build |
| BUNDLE_NO_UNLICENSE | (undefined) | Define to remove any Public Domain library from build (*) |

(*): will disable .bun and .zip archive support as well.
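
As an illustration, a build that avoids copyleft code might define two of these directives. This is a sketch under one assumption: the macros must be visible while bundle's own source file is compiled (for example through the compiler command line or a shared configuration header), not only in your code. `copyleft_free_build` is a hypothetical name.

```cpp
// ASSUMPTION: these definitions reach the translation unit that builds bundle.
#define BUNDLE_NO_GPL    // drops GPL libraries (mcm and tangelo, per the licensing table)
#define BUNDLE_NO_CDDL   // drops CDDL libraries (lzjb, per the licensing table)

// Hypothetical compile-time flag that your own code can check.
#if defined(BUNDLE_NO_GPL) && defined(BUNDLE_NO_CDDL)
constexpr bool copyleft_free_build = true;
#else
constexpr bool copyleft_free_build = false;
#endif
```

Streams packed with a removed algorithm can presumably no longer be unpacked by such a build, so pick the directives before producing data you need to read later.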

Build Directives (Libraries)

| #define directive | Default value | Meaning |
|---|---|---|
| BUNDLE_NO_BCM | (undefined) | Define to remove BCM library from build |
| BUNDLE_NO_BROTLI | (undefined) | Define to remove Brotli library from build |
| BUNDLE_NO_BSC | (undefined) | Define to remove LibBsc library from build |
| BUNDLE_NO_BZIP2 | (undefined) | Define to remove BZip2 library from build |
| BUNDLE_NO_CRUSH | (undefined) | Define to remove CRUSH library from build |
| BUNDLE_NO_CSC | (undefined) | Define to remove CSC library from build |
| BUNDLE_NO_LZ4 | (undefined) | Define to remove LZ4 libraries from build |
| BUNDLE_NO_LZIP | (undefined) | Define to remove EasyLZMA library from build |
| BUNDLE_NO_LZJB | (undefined) | Define to remove LZJB library from build |
| BUNDLE_NO_LZMA | (undefined) | Define to remove LZMA library from build |
| BUNDLE_NO_MCM | (undefined) | Define to remove MCM library from build |
| BUNDLE_NO_MINIZ | (undefined) | Define to remove MiniZ library from build (*) |
| BUNDLE_NO_SHOCO | (undefined) | Define to remove Shoco library from build |
| BUNDLE_NO_SHRINKER | (undefined) | Define to remove Shrinker library from build |
| BUNDLE_NO_TANGELO | (undefined) | Define to remove TANGELO library from build |
| BUNDLE_NO_ZLING | (undefined) | Define to remove ZLING library from build |
| BUNDLE_NO_ZMOLLY | (undefined) | Define to remove ZMOLLY library from build |
| BUNDLE_NO_ZPAQ | (undefined) | Define to remove ZPAQ library from build |
| BUNDLE_NO_ZSTD | (undefined) | Define to remove ZSTD library from build |

(*): will disable .bun and .zip archive support as well.

Build Directives (Other)

| #define directive | Default value | Meaning |
|---|---|---|
| BUNDLE_USE_OMP_TIMER | (undefined) | Define as 1 to use OpenMP timers |
| BUNDLE_USE_CXX11 | (autodetected) | Define as 0/1 to disable/enable C++11 features |

Licensing table

| Software | Author(s) | License | Version | Major changes? |
|---|---|---|---|---|
| bundle | r-lyeh | ZLIB/LibPNG | latest | |
| bcm | Ilya Muravyov | Public Domain | 1.00 | istream based now |
| brotli | Jyrki Alakuijala, Zoltan Szabadka | Apache 2.0 | 2015/11/03 | |
| bzip2 | Julian Seward | BSD-4 | | |
| crush | Ilya Muravyov | Public Domain | 1.00 | reentrant fix |
| csc | Siyuan Fu | Public Domain | 2015/06/16 | |
| easylzma | Igor Pavlov, Lloyd Hilaiel | Public Domain | 0.0.7 | |
| endian | Mathias Panzenböck | Public Domain | | msvc fix |
| libbsc | Ilya Grebnov | Apache 2.0 | 3.1.0 | |
| libzling | Zhang Li | BSD-3 | 2015/09/16 | |
| libzpaq | Matt Mahoney | Public Domain | 7.05 | |
| lz4 | Yann Collet | BSD-2 | 1.7.1 | |
| lzjb | Jeff Bonwick | CDDL license | 2010 | |
| mcm | Mathieu Chartier | GPL | 0.84 | |
| miniz | Rich Geldreich | Public Domain | v1.15 r.4.1 | alignment fix |
| shoco | Christian Schramm | MIT | 2015/03/16 | |
| shrinker | Siyuan Fu | BSD-3 | rev 3 | |
| tangelo | Matt Mahoney, Jan Ondrus | GPL | 2.41 | reentrant fixes, istream based now |
| zmolly | Zhang Li | BSD-3 | 0.0.1 | reentrant and memstream fixes |
| zstd | Yann Collet | BSD-2 | 0.3.2 | |

Evaluated alternatives

FastLZ, FLZP, LibLZF, LZFX, LZHAM, LZLIB, LZO, LZP, SMAZ, Snappy, ZLIB, bzip2, Yappy, CMix, M1

Creating DLLs

cl bundle.cpp -DBUNDLE_API=BUNDLE_API_EXPORT /LD
cl demo.cc -DBUNDLE_API=BUNDLE_API_IMPORT bundle.lib

Changelog