TurboRC: Turbo Range Coder + rANS Asymmetric Numeral Systems
======================================

LICENSE

Usage examples

    ./turborc -e0   file           " benchmark all basic functions using the default simple predictor
    ./turborc -e20  file           " byte gamma coding + rc
    ./turborc -e1,2 file
    ./turborc -e0 -pss -r47 file   " use dual speed predictor with parameters 4 and 7
    ./turborc -e0 -psf -r1 file    " use FSM predictor with filename "FSM1.txt"
    ./turborc -e0 file -Os         " raw 16-bit input
    ./turborc -e0 file -Ou         " raw 32-bit input
    ./turborc -e0 file -Ft         " text file (one integer per line)
    ./turborc -e0 file -Fc         " text file with multiple integer entries (separated by non-digit characters, e.g. 456,32,54)
    ./turborc -e0 file -Fc -v5     " like the previous example, but display the first 100 values read
    ./turborc -e0 file -Fcf        " text file with multiple floating-point entries (separated by non-digit characters, e.g. 456.56,32.1,54)
    ./turborc -e0 file -Fru -Ob    " convert raw 32-bit input to bytes before processing, possibly truncating large values
    ./turborc -e0 file -Ft -K3 -Ou " convert column 3 of a csv text file to 32-bit integers
    ./turborc -e0 file -pss -r47   " benchmark all basic functions using the dual speed predictor with parameters 4 and 7
    ./turborc -e0 file -psf -r1    " benchmark all basic functions using the FSM predictor with the parameter file FSM1.txt
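
The input options above expect particular file layouts. As a rough sketch, the hypothetical helpers below (`pack_raw`, `format_text` and `split_integers` are illustrative names, not part of TurboRC) build the same integer list as raw 16-bit data (for `-Os`), raw 32-bit data (for `-Ou`) and one-integer-per-line text (for `-Ft`), and show how a `-Fc` style line can be split on non-digit characters. The little-endian unsigned layout is an assumption and has not been checked against turborc.c.

```python
import re
import struct

def pack_raw(values, bits):
    """Pack integers as raw little-endian unsigned words (assumed layout)."""
    fmt = {16: "<H", 32: "<I"}[bits]
    return b"".join(struct.pack(fmt, v) for v in values)

def format_text(values):
    """One integer per line, as described for the -Ft option."""
    return "".join(f"{v}\n" for v in values)

def split_integers(text):
    """Split on any run of non-digit characters, as described for -Fc."""
    return [int(token) for token in re.findall(r"\d+", text)]

values = split_integers("456,32,54")
raw16 = pack_raw(values, 16)   # 2 bytes per value
raw32 = pack_raw(values, 32)   # 4 bytes per value
text = format_text(values)
```

Writing `raw16`, `raw32` or `text` to a file gives inputs that can be tried with the corresponding option.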

Benchmark

see also Entropy Coder Benchmark

File: enwik8bwt, a generated BWT of enwik8 (Wikipedia XML, 100 MB)
    > turborc -e0 enwik8bwt
    
| C Size | ratio% | C MB/s | D MB/s | Name | Description |
|-------:|-------:|-------:|-------:|------|-------------|
| 23334248 | 23.33% | 88.20 | 88.54 | 1:rc | o0 |
| 22394444 | 22.39% | 82.46 | 86.35 | 2:rcc | o1 |
| 23116048 | 23.12% | 74.11 | 79.26 | 3:rcc2 | o2 |
| 22500640 | 22.50% | 64.96 | 67.81 | 4:rcx | o8b =o1 context slide |
| 23213968 | 23.21% | 55.70 | 61.45 | 5:rcx2 | o16b=o2 context slide |
| 21605020 | 21.61% | 25.48 | 26.98 | 9:rcms | o1 mixer/sse |
| 21550184 | 21.55% | 21.82 | 22.74 | 10:rcm2 | o2 mixer/sse |
| 20814372 | 20.81% | 23.06 | 24.77 | 11:rcmr | o2 8b mixer/sse run |
| 20789560 | 20.79% | 22.68 | 24.70 | 12:rcmrr | o2 8b mixer/sse run > 2 |
| 23170048 | 23.17% | 156.61 | 129.94 | 13:rcrle | RLE o0 |
| 22004856 | 22.00% | 128.43 | 114.98 | 14:rcrle1 | RLE o1 |
| 23412436 | 23.41% | 73.08 | 70.74 | 17:rcu3 | varint8 3/5/8 bits |
| 21088368 | 21.09% | 79.78 | 93.89 | 18:rcqlfc | QLFC |
| 22275484 | 22.28% | 91.81 | 96.60 | 19:bec | Bit EC |
| 32703468 | 32.70% | 54.48 | 58.27 | 26:rcg-8 | gamma |
| 32271396 | 32.27% | 124.84 | 110.15 | 27:rcgz-8 | gamma zigzag |
| 34195068 | 34.20% | 66.13 | 65.23 | 28:rcr-8 | rice |
| 36864024 | 36.86% | 78.31 | 70.00 | 29:rcrz-8 | rice zigzag |
| 63541712 | 63.54% | 552.28 | 87.84 | 42:cdfsb | static/decode search |
| 63541712 | 63.54% | 552.38 | 115.42 | 43:cdfsv | static/decode division |
| 63976686 | 63.98% | 479.38 | 104.09 | 44:cdfsm | static/decode division lut |
| 63541720 | 63.54% | 628.18 | 92.24 | 45:cdfsb | static interlv/dec. search |
| 24811052 | 24.81% | 177.39 | 104.30 | 46:cdf | byte adaptive |
| 24811060 | 24.81% | 191.80 | 98.96 | 47:cdfi | byte adaptive interleaved |
| 31004892 | 31.00% | 158.06 | 72.18 | 48:cdf-8 | vnibble |
| 31004896 | 31.00% | 159.56 | 73.53 | 49:cdfi-8 | vnibble interleaved |
| 24848864 | 24.85% | 116.76 | 202.27 | 56:ans auto | |
| 24848864 | 24.85% | 126.57 | 175.43 | 57:ans sse | |
| 23068372 | 23.07% | 128.06 | 83.57 | 64:ans auto | o1 |
| 23521656 | 23.52% | 50.43 | 82.32 | 66:ansb | bitwise ans |
| 100000012 | 100.00% | 16495.29 | 16050.82 | 79:memcpy | |
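
To make the columns concrete, here is a small sketch of how the ratio and speed figures can be reproduced. It assumes that the input size equals the size reported on the memcpy row (100000012 bytes) and that 1 MB = 1,000,000 bytes; the function names are illustrative only.

```python
def ratio_percent(compressed_size, original_size):
    """Compressed size as a percentage of the original size."""
    return 100.0 * compressed_size / original_size

def speed_mb_per_s(num_bytes, seconds):
    """Throughput in MB/s, with 1 MB taken as 1,000,000 bytes (assumption)."""
    return num_bytes / 1_000_000 / seconds

ORIGINAL_SIZE = 100_000_012  # size of the memcpy row, assumed to be the input size

# First table row (1:rc, order 0): 23334248 bytes
print(f"{ratio_percent(23_334_248, ORIGINAL_SIZE):.2f}%")  # prints 23.33%
```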

BWT Benchmark: TurboRC vs the best BWT compressors (2023.04)

- enwik8 - 100.000.000 bytes EN Wikipedia

(bold = pareto) MB=1.000.000

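| C Size | ratio% | C MB/s | D MB/s | Name |
|-------:|-------:|-------:|-------:|------|
| 20698282 | 20.7 | 9.02 | 16.04 | TurboRC 20e9 |
| 20749619 | 20.7 | 10.70 | 9.19 | bzip3 |
| 20786596 | 20.8 | 13.72 | 19.26 | bsc 0e2 |
| 20920306 | 20.9 | 16.92 | 29.48 | bsc 0e1 |
| 21002082 | 21.0 | 17.42 | 36.21 | TurboRC 20e8 |
| 21224212 | 21.2 | 19.03 | 37.14 | bsc 0e0 |
| 21824818 | 21.8 | 17.89 | 38.13 | TurboRC 20e6 |
| 22011302 | 22.0 | 19.39 | 40.86 | TurboRC 20e5 |
| 29008758 | 29.0 | 20.72 | 43.39 | bzip2 |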
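
The "pareto" marking can be illustrated with a short sketch: an entry is on the Pareto front if no other entry is both at least as small and at least as fast. Which speed column the benchmark uses for this marking is not stated here, so the example below assumes decompression speed (D MB/s) and uses a few rows from the enwik8 table; `pareto_front` is a hypothetical helper.

```python
def pareto_front(rows):
    """rows: (name, compressed_size, speed). Smaller size and higher speed are better."""
    front = []
    for name, size, speed in rows:
        dominated = any(
            other_size <= size and other_speed >= speed
            and (other_size < size or other_speed > speed)
            for _, other_size, other_speed in rows
        )
        if not dominated:
            front.append(name)
    return front

# (name, C Size, D MB/s) taken from the enwik8 table
enwik8 = [
    ("TurboRC 20e9", 20698282, 16.04),
    ("bzip3", 20749619, 9.19),
    ("bsc 0e2", 20786596, 19.26),
    ("bzip2", 29008758, 43.39),
]
print(pareto_front(enwik8))
```

With these rows and this criterion, bzip3 is dominated by TurboRC 20e9 (smaller output and faster decompression), while the other three entries remain on the front.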

- Silesia - Compression Corpus (211 MB mixed binary + text)

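| C Size | ratio% | C MB/s | D MB/s | Name |
|-------:|-------:|-------:|-------:|------|
| 48400486 | 22.8 | 9.63 | 16.08 | TurboRC 20e9 |
| 48621296 | 22.9 | 14.51 | 18.05 | bsc 0e2 |
| 48754005 | 23.0 | 12.49 | 11.73 | bzip3 |
| 49142246 | 23.2 | 18.47 | 28.62 | bsc 0e1 |
| 49589166 | 23.4 | 18.64 | 34.69 | TurboRC 20e8 |
| 50110576 | 23.6 | 20.98 | 35.99 | bsc 0e3 |
| 54592210 | 25.8 | 18.22 | 52.14 | bzip2 |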

- English.100mb - text files from Project Gutenberg

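| C Size | ratio% | C MB/s | D MB/s | Name |
|-------:|-------:|-------:|-------:|------|
| 18720206 | 17.9 | 10.89 | 19.07 | TurboRC 20e9 |
| 18739661 | 17.9 | 12.69 | 11.12 | bzip3 |
| 19080950 | 18.2 | 20.59 | 40.81 | TurboRC 20e8 |
| 19255056 | 18.4 | 15.83 | 21.37 | bsc 0e2 |
| 19371264 | 18.5 | 19.68 | 33.01 | bsc 0e1 |
| 19614180 | 18.7 | 22.22 | 41.67 | bsc 0e0 |
| 19673386 | 18.8 | 20.91 | 42.48 | TurboRC 20e6 |
| 19809790 | 18.9 | 22.90 | 45.62 | TurboRC 20e5 |
| 29433182 | 28.1 | 19.65 | 41.49 | bzip2 |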

- html8 - 100MB random html pages from the Alexa Top 1M sites

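| C Size | ratio% | C MB/s | D MB/s | Name |
|-------:|-------:|-------:|-------:|------|
| 13203250 | 13.2 | 15.85 | 26.16 | TurboRC 20e9 |
| 13301850 | 13.3 | 17.61 | 16.32 | bzip3 |
| 13442610 | 13.4 | 30.55 | 56.65 | TurboRC 20e8 |
| 13601922 | 13.6 | 19.12 | 26.78 | bsc 0e2 |
| 13688478 | 13.7 | 22.77 | 39.10 | bsc 0e1 |
| 13918372 | 13.9 | 25.63 | 46.95 | bsc 0e0 |
| 18162609 | 18.2 | 21.16 | 67.90 | bzip2 |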

- enwik9 - 1GB EN Wikipedia

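| C Size | ratio% | C MB/s | D MB/s | Name |
|-------:|-------:|-------:|-------:|------|
| 163656130 | 16.4 | 8.03 | 15.75 | TurboRC 20e9 |
| 163883906 | 16.4 | 12.92 | 22.30 | bsc 0e2 |
| 164960746 | 16.5 | 15.18 | 34.07 | bsc 0e1 |
| 165206106 | 16.5 | 15.06 | 37.90 | TurboRC 20e8 |
| 167071950 | 16.7 | 16.54 | 42.30 | bsc 0e0 |
| 169984250 | 17.0 | 11.27 | 9.89 | bzip3 |
| 253977891 | 25.4 | 19.90 | 46.46 | bzip2 |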

- test1.txt - 1GB ZH (Chinese) Wikipedia from GDCC2021

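| C Size | ratio% | C MB/s | D MB/s | Name |
|-------:|-------:|-------:|-------:|------|
| 234873322 | 23.5 | 11.82 | 18.08 | bsc 0e2 |
| 235628874 | 23.6 | 14.88 | 35.27 | TurboRC 20e8 |
| 236077752 | 23.6 | 13.79 | 28.76 | bsc 0e1 |
| 236200554 | 23.6 | 8.94 | 17.03 | TurboRC 20e9 |
| 239463880 | 23.9 | 15.37 | 37.21 | bsc 0e3 |
| 245841481 | 24.6 | 8.85 | 8.51 | bzip3 |
| 254748922 | 25.5 | 15.69 | 39.18 | TurboRC 20e6 |
| 257419802 | 25.7 | 18.03 | 41.98 | TurboRC 20e5 |
| 359610522 | 36.0 | 21.51 | 42.38 | bzip2 |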

- Text log file - NASA access log 200MB

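| C Size | ratio% | C MB/s | D MB/s | Name |
|-------:|-------:|-------:|-------:|------|
| 9082122 | 4.4 | 38.69 | 106.37 | TurboRC 20e8m32 |
| 9138588 | 4.5 | 14.52 | 13.66 | bzip3 |
| 9438810 | 4.6 | 19.41 | 43.66 | bsc 0e2 |
| 9503102 | 4.6 | 20.74 | 53.42 | bsc 0e1 |
| 9529266 | 4.6 | 20.24 | 65.88 | TurboRC 20e8 |
| 9639310 | 4.7 | 20.81 | 38.94 | TurboRC 20e9m32 |
| 9647322 | 4.7 | 21.45 | 58.95 | bsc 0e3 |
| 9710206 | 4.7 | 8.73 | 18.14 | TurboRC 20e9 |
| 9812018 | 4.8 | 20.35 | 66.12 | TurboRC 20e6 |
| 9817630 | 4.8 | 20.87 | 68.17 | TurboRC 20e5 |
| 11960479 | 5.8 | 15.71 | 83.16 | bzip2 |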

File Compression

Range Coder

    ./turborc -1 inputfile outputfile         "order 0 simple
    ./turborc -2 inputfile outputfile         "order 1 simple
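
For readers unfamiliar with the technique, the following is a minimal textbook sketch of an adaptive order-0 bitwise coder (a carryless binary arithmetic coder with a simple shift-based predictor). It is not TurboRC's implementation: the class names, the 12-bit probability precision and the adaptation shift `RATE = 4` are all assumptions chosen for illustration.

```python
PROB_BITS = 12          # probabilities are 12-bit values (assumption)
RATE = 4                # adaptation shift of the simple predictor (assumption)
MASK = 0xFFFFFFFF

def update(p, bit):
    """Move the probability of a 1 bit toward the bit just seen."""
    return p + ((4096 - p) >> RATE) if bit else p - (p >> RATE)

class Encoder:
    def __init__(self):
        self.lo, self.hi, self.out = 0, MASK, bytearray()

    def encode(self, bit, p1):
        # Split the current interval in proportion to p1 = P(bit == 1).
        mid = self.lo + ((self.hi - self.lo) >> PROB_BITS) * p1
        if bit:
            self.hi = mid
        else:
            self.lo = mid + 1
        # Emit leading bytes that lo and hi already agree on.
        while ((self.lo ^ self.hi) & 0xFF000000) == 0:
            self.out.append(self.hi >> 24)
            self.lo = (self.lo << 8) & MASK
            self.hi = ((self.hi << 8) & MASK) | 0xFF

    def finish(self):
        self.out += self.lo.to_bytes(4, "big")
        return bytes(self.out)

class Decoder:
    def __init__(self, data):
        self.data, self.pos = data, 0
        self.lo, self.hi, self.x = 0, MASK, 0
        for _ in range(4):
            self.x = (self.x << 8) | self._byte()

    def _byte(self):
        b = self.data[self.pos] if self.pos < len(self.data) else 0
        self.pos += 1
        return b

    def decode(self, p1):
        mid = self.lo + ((self.hi - self.lo) >> PROB_BITS) * p1
        bit = 1 if self.x <= mid else 0
        if bit:
            self.hi = mid
        else:
            self.lo = mid + 1
        while ((self.lo ^ self.hi) & 0xFF000000) == 0:
            self.lo = (self.lo << 8) & MASK
            self.hi = ((self.hi << 8) & MASK) | 0xFF
            self.x = ((self.x << 8) & MASK) | self._byte()
        return bit

def compress(data):
    """Order 0: each bit is predicted from the bits already seen in the same byte."""
    probs, enc = [2048] * 256, Encoder()
    for byte in data:
        node = 1
        for i in range(7, -1, -1):
            bit = (byte >> i) & 1
            enc.encode(bit, probs[node])
            probs[node] = update(probs[node], bit)
            node = node * 2 + bit
    return enc.finish()

def decompress(blob, n):
    probs, dec, out = [2048] * 256, Decoder(blob), bytearray()
    for _ in range(n):
        node = 1
        for _ in range(8):
            bit = dec.decode(probs[node])
            probs[node] = update(probs[node], bit)
            node = node * 2 + bit
        out.append(node & 0xFF)
    return bytes(out)
```

An order-1 variant would keep one such 256-entry probability table per previous byte, so the table is selected by context before each byte is coded.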

Range Coder + RLE

    ./turborc -1 inputfile outputfile         "order 0 simple
    ./turborc -2 inputfile outputfile         "order 1 simple
    ./turborc -d inputfile outputfile          "decompress
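
As a generic illustration of run-length encoding (not the format TurboRC actually writes), the sketch below collapses repeated bytes into (byte, run length) pairs and expands them again. Long runs are typical of BWT output, which is why RLE is a natural front end for the range coder. Both function names are hypothetical.

```python
def rle_encode(data):
    """Collapse runs of equal bytes into (byte, run_length) pairs."""
    runs = []
    for b in data:
        if runs and runs[-1][0] == b:
            runs[-1][1] += 1
        else:
            runs.append([b, 1])
    return [(b, n) for b, n in runs]

def rle_decode(runs):
    """Expand (byte, run_length) pairs back into the original bytes."""
    return b"".join(bytes([b]) * n for b, n in runs)

runs = rle_encode(b"aaabcc")
assert rle_decode(runs) == b"aaabcc"
```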

BWT (Burrows-Wheeler) + QLFC (Quantized Local Frequency Coding) + TurboRC

    ./turborc -20e# inputfile outputfile -l# [-Os]  "bwt compression
                    #: 0:store, 2:bit ec, 3/4:RLE, 5/6:RLE o1, 7/8:QLFC, 9:Max
    ./turborc -d inputfile outputfile             "decompress
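
For orientation, here is a naive sketch of the Burrows-Wheeler transform and its inverse. It sorts all rotations directly, which is only practical for small inputs; real compressors use suffix sorting instead. The QLFC stage is not sketched, and the function names and the (last column, primary index) output convention are assumptions for illustration.

```python
def bwt(data):
    """Return (last_column, primary_index) from the sorted rotations of data."""
    n = len(data)
    order = sorted(range(n), key=lambda i: data[i:] + data[:i])
    last = bytes(data[(i - 1) % n] for i in order)
    return last, order.index(0)

def inverse_bwt(last, primary):
    """Rebuild the original bytes from the last column and the primary index."""
    n = len(last)
    # A stable sort of positions by symbol links each row of the first column
    # to the matching position in the last column.
    link = sorted(range(n), key=lambda i: last[i])
    out = bytearray()
    row = primary
    for _ in range(n):
        row = link[row]
        out.append(last[row])
    return bytes(out)

last, primary = bwt(b"banana")
print(last, primary)  # prints b'nnbaaa' 3
```

The transform groups equal symbols together ("nnbaaa"), which is what makes the later RLE, QLFC and entropy coding stages effective.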

Compile:

Download or clone TurboRC

    git clone --recursive https://github.com/powturbo/Turbo-Range-Coder.git
    cd Turbo-Range-Coder

Linux, macOS, Windows (MinGW), Clang,... (see also makefile)

    make

or

    make AVX2=1                                "compile for recent architectures >= Haswell

Windows Visual C++

    nmake /f makefile.vs

Windows Visual Studio C++

    project files in directory vs/vs2022

Function usage:

See examples in "turborc.c"

Environment:

OS/Compiler (32 + 64 bits):

References:

Last update: 06 AUG 2023