TurboHist: Fastest Histogram Construction

Benchmark:

- Uniform/Skewed distribution:

Benchmark Intel CPU: i7-9700K 3.6GHz gcc 11.2

Uniform distribution - enwik9 text file, size=1.000.000.000

| Function | MB/s | Cycle/Byte | Language | Package |
|----------|-----:|-----------:|----------|---------|
| 1:hist_1_8 naiv 8 bits | 2761.01 | 1.3423 | C | TurboHist |
| 2:hist_4_8 4 bins/ 8 bits | 2725.92 | 1.3249 | C | TurboHist |
| 3:hist_8_8 8 bins/ 8 bits | 2850.05 | 1.2627 | C | TurboHist |
| 4:hist_4_32 4 bins/32 bits | 3691.02 | 0.9660 | C | TurboHist |
| 5:hist_8_32 8 bins/32 bits | 3867.26 | 0.9561 | C | TurboHist |
| 6:hist_4_64 4 bins/64 bits | 4040.55 | 0.9103 | C | TurboHist |
| 7:hist_8_64 8 bins/64 bits | 4053.37 | 0.9035 | C | TurboHist |
| 8:histr_4_64 4/64+run | 3915.85 | 0.9668 | C | TurboHist |
| 9:histr_8_64 8/64+run | 3916.51 | 0.9286 | C | TurboHist |
| 10:hist_4_128 4 bins/sse4.1 | 3643.20 | 1.0081 | C | TurboHist |
| 11:hist_8_128 8 bins/sse4.1 | 3607.06 | 0.9845 | C | TurboHist |
| 12:hist_4_256 4 bins/avx2 | 3522.27 | 1.0195 | C | TurboHist |
| 13:hist_8_256 8 bins/avx2 | 3542.25 | 1.0366 | C | TurboHist |
| 15:hist_8_64asm inline asm | 4161.87 | 0.8787 | inline asm | TurboHist |
| 18:count2x64 inline asm | 3963.91 | 0.9172 | inline asm | Countbench |
| 20:histo_ref | 2702.57 | 1.3567 | C | Ryg |
| 21:histo_cpp_1x | 1876.13 | 1.8236 | C | Ryg |
| 22:histo_cpp_2x | 2664.78 | 1.5935 | C | Ryg |
| 23:histo_cpp_4x | 2817.77 | 1.2944 | C | Ryg |
| 24:histo_asm_scalar4 | 3130.08 | 1.1609 | asm | Ryg |
| 25:histo_asm_scalar8 | 3353.08 | 1.0636 | asm | Ryg |
| 26:histo_asm_scalar8_var | 3704.88 | 0.9856 | asm | Ryg |
| 27:histo_asm_scalar8_var2 | 4085.48 | 0.8913 | asm | Ryg |
| 28:histo_asm_scalar8_var3 | 4132.54 | 0.8870 | asm | Ryg |
| 29:histo_asm_scalar8_var4 | 4083.92 | 0.8970 | asm | Ryg |
| 30:histo_asm_scalar8_var5 | 4002.21 | 0.9025 | asm | Ryg |
| 31:histo_asm_sse4 | 3153.01 | 1.1445 | asm | Ryg |
| 32:memcpy | 13724.29 | 0.2698 | C | |

Skewed distribution - enwik9.bwt text file, size=1.000.000.000

| Function | MB/s | Cycle/Byte | Language |
|----------|-----:|-----------:|----------|
| 1:hist_1_8 naiv 8 bits | 1170.89 | 3.0642 | C |
| 2:hist_4_8 4 bins/ 8 bits | 2707.74 | 1.3321 | C |
| 3:hist_8_8 8 bins/ 8 bits | 2804.08 | 1.3208 | C |
| 4:hist_4_32 4 bins/32 bits | 3118.54 | 1.1402 | C |
| 5:hist_8_32 8 bins/32 bits | 3780.16 | 0.9714 | C |
| 6:hist_4_64 4 bins/64 bits | 3646.25 | 0.9980 | C |
| 7:hist_8_64 8 bins/64 bits | 3941.96 | 0.9282 | C |
| 8:histr_4_64 4/64+run | 5061.62 | 0.7270 | C |
| 9:histr_8_64 8/64+run | 5135.29 | 0.7229 | C |
| 10:hist_4_128 4 bins/sse4.1 | 3535.36 | 1.0365 | C |
| 11:hist_8_128 8 bins/sse4.1 | 3654.41 | 0.9791 | C |
| 12:hist_4_256 4 bins/avx2 | 3329.87 | 1.1022 | C |
| 13:hist_8_256 8 bins/avx2 | 3540.36 | 1.0343 | C |
| 15:hist_8_64asm inline asm | 4047.74 | 0.9013 | inline asm |
| 18:count2x64 inline asm | 3969.92 | 0.9262 | inline asm |
| 20:histo_ref | 1182.61 | 3.0718 | C |
| 21:histo_cpp_1x | 1213.42 | 2.9748 | C |
| 22:histo_cpp_2x | 2115.60 | 1.7373 | C |
| 23:histo_cpp_4x | 1801.97 | 2.0024 | C |
| 24:histo_asm_scalar4 | 3092.87 | 1.1561 | asm |
| 25:histo_asm_scalar8 | 3203.95 | 1.1139 | asm |
| 26:histo_asm_scalar8_var | 3460.45 | 1.0422 | asm |
| 27:histo_asm_scalar8_var2 | 3659.61 | 0.9878 | asm |
| 28:histo_asm_scalar8_var3 | 3769.96 | 0.9569 | asm |
| 29:histo_asm_scalar8_var4 | 3996.75 | 0.8905 | asm |
| 30:histo_asm_scalar8_var5 | 4642.10 | 0.7719 | asm |
| 31:histo_asm_sse4 | 3091.36 | 1.1670 | asm |
| 32:memcpy | 15594.28 | 0.2412 | C |
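
In the two tables above, the single-table `hist_1_8` drops from 2761.01 MB/s on uniform data to 1170.89 MB/s on skewed data, while the multi-bin variants stay close to their uniform speed. A likely reason is that repeated byte values make consecutive increments hit the same counter, so each increment must wait for the previous store to that address. The sketch below illustrates the general multi-table idea only; the names `hist_naive` and `hist_4tables` are hypothetical and this is not TurboHist's actual code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Naive: one counter table. Repeated bytes update the same counter
   back to back, which serializes the increments. */
static void hist_naive(const uint8_t *in, size_t n, uint32_t cnt[256]) {
    memset(cnt, 0, 256 * sizeof cnt[0]);
    for (size_t i = 0; i < n; i++)
        cnt[in[i]]++;
}

/* Four tables: neighbouring bytes update different tables, so equal
   neighbouring bytes no longer touch the same memory location.
   The partial tables are summed at the end. */
static void hist_4tables(const uint8_t *in, size_t n, uint32_t cnt[256]) {
    uint32_t c[4][256];
    size_t i = 0;
    memset(c, 0, sizeof c);
    for (; i + 4 <= n; i += 4) {
        c[0][in[i]]++;
        c[1][in[i + 1]]++;
        c[2][in[i + 2]]++;
        c[3][in[i + 3]]++;
    }
    for (; i < n; i++)              /* remaining 0..3 bytes */
        c[0][in[i]]++;
    for (int b = 0; b < 256; b++)   /* merge partial counts */
        cnt[b] = c[0][b] + c[1][b] + c[2][b] + c[3][b];
}
```

Both functions produce the same counts; only the memory access pattern differs. The 32-bit counters in this sketch assume inputs smaller than 4 GB.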

All zeros: size=1.000.000.000

| Function | MB/s | Cycle/Byte | Language |
|----------|-----:|-----------:|----------|
| 1:hist_1_8 naiv 8 bits | 877.27 | 4.0805 | C |
| 2:hist_4_8 4 bins/ 8 bits | 2650.84 | 1.3485 | C |
| 3:hist_8_8 8 bins/ 8 bits | 2743.40 | 1.2994 | C |
| 4:hist_4_32 4 bins/32 bits | 2978.83 | 1.2006 | C |
| 5:hist_8_32 8 bins/32 bits | 3775.45 | 0.9555 | C |
| 6:hist_4_64 4 bins/64 bits | 3411.11 | 1.0530 | C |
| 7:hist_8_64 8 bins/64 bits | 3928.09 | 0.9342 | C |
| 8:histr_4_64 4/64+run | 18998.87 | 0.1868 | C |
| 9:histr_8_64 8/64+run | 19629.28 | 0.1869 | C |
| 10:hist_4_128 4 bins/sse4.1 | 3365.40 | 1.0717 | C |
| 11:hist_8_128 8 bins/sse4.1 | 3632.61 | 0.9950 | C |
| 12:hist_4_256 4 bins/avx2 | 3112.15 | 1.1576 | C |
| 13:hist_8_256 8 bins/avx2 | 3497.08 | 1.0205 | C |
| 15:hist_8_64asm inline asm | 4089.97 | 0.8817 | inline asm |
| 18:count2x64 inline asm | 3881.98 | 0.9158 | inline asm |
| 20:histo_ref | 882.93 | 4.1072 | C |
| 21:histo_cpp_1x | 873.20 | 4.1069 | C |
| 22:histo_cpp_2x | 1720.19 | 2.0961 | C |
| 23:histo_cpp_4x | 1866.99 | 2.0817 | C |
| 24:histo_asm_scalar4 | 2995.84 | 1.1942 | asm |
| 25:histo_asm_scalar8 | 3107.30 | 1.1618 | asm |
| 26:histo_asm_scalar8_var | 3288.67 | 1.1143 | asm |
| 27:histo_asm_scalar8_var2 | 3290.92 | 1.0957 | asm |
| 28:histo_asm_scalar8_var3 | 3707.41 | 0.9763 | asm |
| 29:histo_asm_scalar8_var4 | 3988.01 | 0.9019 | asm |
| 30:histo_asm_scalar8_var5 | 14076.09 | 0.2564 | asm |
| 31:histo_asm_sse4 | 3020.32 | 1.1975 | asm |
| 32:memcpy | 14057.53 | 0.2636 | C |

(bold = Pareto frontier) MB = 1.000.000 bytes
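
On the all-zeros input the `+run` variants (`histr_4_64`, `histr_8_64`) reach roughly 19.000 MB/s, faster than `memcpy` in the same table. One plausible way to get this kind of speedup is to detect words whose bytes are all equal and credit the whole word to one counter in a single step. The following is a minimal sketch of that idea under stated assumptions: the function name `hist_run64` and the exact run test are hypothetical and do not reproduce TurboHist's `histr_*` implementation.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical run-aware histogram (illustration only). */
static void hist_run64(const uint8_t *in, size_t n, uint32_t cnt[256]) {
    size_t i = 0;
    memset(cnt, 0, 256 * sizeof cnt[0]);
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, in + i, sizeof w);   /* unaligned-safe 64-bit load */
        /* Broadcast the first byte into all 8 byte lanes. */
        uint64_t rep = (uint64_t)in[i] * UINT64_C(0x0101010101010101);
        if (w == rep) {
            cnt[in[i]] += 8;            /* all 8 bytes equal: one update */
        } else {
            for (int k = 0; k < 8; k++) /* mixed word: count byte by byte */
                cnt[in[i + k]]++;
        }
    }
    for (; i < n; i++)                  /* remaining 0..7 bytes */
        cnt[in[i]]++;
}
```

The comparison does not depend on byte order, because a word made of eight identical bytes looks the same on little-endian and big-endian machines. On data without long runs the extra comparison is pure overhead, which may explain why the `+run` variants are not the fastest entries on the uniform input.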

Compile:

    make
 or
    make AVX2=1

Usage:

    turbohist [-e#] file [-I#] [-z]
    options:
    -e#     # = function numbers separated by ,
    -I#     # = number of iterations
            set to -I15 for accurate timings  
    -z      set read buffer to zeros
            

Examples:

    ./turbohist file
    ./turbohist -e1,7,9 file

Environment:

OS/Compiler (32 + 64 bits):

Last update: 01 JAN 2022