Home

Awesome

bits-extra

CircleCI Travis

Useful bitwise operations.

This library exposes support for some BMI2 CPU instructions on some x86 based CPUs.

Currently support operations include:

These operations are useful for high performance bit manipulation for applications such as succinct data structures.

In the case where the target CPU architectures do not support these instruction set, a very slow emulated version of these operations is provided instead.

Note ghc-8.4.1 is required to target the BMI2 CPU instructions.

Compilation

It is sufficient to build, test and benchmark the library as follows for emulated behaviour:

stack build
stack test
stack bench

To target the BMI2 instruction set, add the bmi2 flag:

stack build --flag bits-extra:bmi2
stack test  --flag bits-extra:bmi2
stack bench --flag bits-extra:bmi2

Benchmark results

Benchmarks with bmi2 flag defined on ghc-8.4.1 on Intel Core i7 2.9 GHz

Benchmark bench: RUNNING...
Fast pdep enabled: True
Fast pext enabled: True
benchmarking popCount-single/popCount64
time                 6.698 ns   (6.634 ns .. 6.766 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 6.750 ns   (6.692 ns .. 6.841 ns)
std dev              243.5 ps   (195.3 ps .. 347.1 ps)
variance introduced by outliers: 60% (severely inflated)

benchmarking popCount-single/popCount32
time                 5.543 ns   (5.511 ns .. 5.574 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 5.596 ns   (5.546 ns .. 5.650 ns)
std dev              177.0 ps   (144.8 ps .. 215.8 ps)
variance introduced by outliers: 54% (severely inflated)

benchmarking pdep-vector/popCount64
time                 1.083 μs   (1.074 μs .. 1.097 μs)
                     0.998 R²   (0.997 R² .. 1.000 R²)
mean                 1.089 μs   (1.075 μs .. 1.104 μs)
std dev              50.20 ns   (37.87 ns .. 70.22 ns)
variance introduced by outliers: 62% (severely inflated)

benchmarking pdep-vector/popCount32
time                 1.074 μs   (1.065 μs .. 1.083 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 1.079 μs   (1.071 μs .. 1.090 μs)
std dev              31.57 ns   (23.09 ns .. 49.88 ns)
variance introduced by outliers: 40% (moderately inflated)

benchmarking pdep-single/pdep64
time                 4.920 ns   (4.856 ns .. 5.007 ns)
                     0.996 R²   (0.990 R² .. 1.000 R²)
mean                 4.945 ns   (4.874 ns .. 5.125 ns)
std dev              361.5 ps   (150.9 ps .. 703.7 ps)
variance introduced by outliers: 86% (severely inflated)

benchmarking pdep-single/pdep32
time                 5.203 ns   (5.163 ns .. 5.244 ns)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 5.229 ns   (5.193 ns .. 5.268 ns)
std dev              126.2 ps   (109.9 ps .. 148.6 ps)
variance introduced by outliers: 41% (moderately inflated)

benchmarking pdep-vector/pdep64
time                 1.627 μs   (1.617 μs .. 1.641 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 1.679 μs   (1.652 μs .. 1.723 μs)
std dev              107.2 ns   (74.61 ns .. 155.4 ns)
variance introduced by outliers: 75% (severely inflated)

benchmarking pdep-vector/pdep32
time                 1.626 μs   (1.621 μs .. 1.634 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.637 μs   (1.624 μs .. 1.651 μs)
std dev              44.27 ns   (35.00 ns .. 58.61 ns)
variance introduced by outliers: 35% (moderately inflated)

benchmarking pext-single/pext64
time                 5.182 ns   (5.133 ns .. 5.239 ns)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 5.255 ns   (5.181 ns .. 5.371 ns)
std dev              314.1 ps   (211.7 ps .. 441.7 ps)
variance introduced by outliers: 81% (severely inflated)

benchmarking pext-single/pext32
time                 5.269 ns   (5.222 ns .. 5.340 ns)
                     0.997 R²   (0.993 R² .. 1.000 R²)
mean                 5.315 ns   (5.248 ns .. 5.513 ns)
std dev              388.9 ps   (145.5 ps .. 713.4 ps)
variance introduced by outliers: 86% (severely inflated)

benchmarking pext-vector/pext64
time                 1.647 μs   (1.632 μs .. 1.667 μs)
                     0.998 R²   (0.996 R² .. 0.999 R²)
mean                 1.673 μs   (1.649 μs .. 1.719 μs)
std dev              107.5 ns   (67.19 ns .. 171.9 ns)
variance introduced by outliers: 76% (severely inflated)

benchmarking pext-vector/pext32
time                 1.626 μs   (1.616 μs .. 1.637 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.630 μs   (1.617 μs .. 1.649 μs)
std dev              53.18 ns   (33.93 ns .. 80.86 ns)
variance introduced by outliers: 44% (moderately inflated)

benchmarking plus-vector/plus64
time                 1.641 μs   (1.628 μs .. 1.657 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 1.647 μs   (1.635 μs .. 1.666 μs)
std dev              50.67 ns   (38.54 ns .. 76.92 ns)
variance introduced by outliers: 41% (moderately inflated)

benchmarking plus-vector/plus32
time                 1.943 μs   (1.932 μs .. 1.953 μs)
                     1.000 R²   (0.999 R² .. 1.000 R²)
mean                 1.948 μs   (1.934 μs .. 1.961 μs)
std dev              45.86 ns   (37.10 ns .. 59.21 ns)
variance introduced by outliers: 29% (moderately inflated)

Benchmark bench: FINISH

Benchmarks with bmi2 flag NOT defined on ghc-8.4.1 on Intel Core i7 2.9 GHz

Benchmark bench: RUNNING...
Fast pdep enabled: False
Fast pext enabled: False
benchmarking popCount-single/popCount64
time                 6.572 ns   (6.433 ns .. 6.765 ns)
                     0.996 R²   (0.992 R² .. 0.999 R²)
mean                 6.637 ns   (6.551 ns .. 6.758 ns)
std dev              346.9 ps   (252.8 ps .. 542.9 ps)
variance introduced by outliers: 76% (severely inflated)

benchmarking popCount-single/popCount32
time                 5.196 ns   (5.152 ns .. 5.253 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 5.232 ns   (5.186 ns .. 5.293 ns)
std dev              186.6 ps   (152.8 ps .. 237.0 ps)
variance introduced by outliers: 60% (severely inflated)

benchmarking pdep-vector/popCount64
time                 5.409 μs   (5.344 μs .. 5.476 μs)
                     0.998 R²   (0.997 R² .. 0.999 R²)
mean                 5.458 μs   (5.391 μs .. 5.549 μs)
std dev              258.1 ns   (193.1 ns .. 374.6 ns)
variance introduced by outliers: 60% (severely inflated)

benchmarking pdep-vector/popCount32
time                 3.849 μs   (3.814 μs .. 3.885 μs)
                     0.999 R²   (0.999 R² .. 0.999 R²)
mean                 3.860 μs   (3.818 μs .. 3.919 μs)
std dev              166.6 ns   (131.0 ns .. 236.7 ns)
variance introduced by outliers: 56% (severely inflated)

benchmarking pdep-single/pdep64
time                 10.46 ns   (10.35 ns .. 10.58 ns)
                     0.999 R²   (0.999 R² .. 0.999 R²)
mean                 10.53 ns   (10.42 ns .. 10.65 ns)
std dev              385.2 ps   (321.8 ps .. 480.9 ps)
variance introduced by outliers: 60% (severely inflated)

benchmarking pdep-single/pdep32
time                 10.57 ns   (10.49 ns .. 10.68 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 10.69 ns   (10.57 ns .. 10.80 ns)
std dev              404.1 ps   (341.0 ps .. 501.8 ps)
variance introduced by outliers: 62% (severely inflated)

benchmarking pdep-vector/pdep64
time                 3.113 μs   (3.091 μs .. 3.140 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 3.130 μs   (3.106 μs .. 3.158 μs)
std dev              82.60 ns   (65.72 ns .. 103.1 ns)
variance introduced by outliers: 32% (moderately inflated)

benchmarking pdep-vector/pdep32
time                 3.096 μs   (3.070 μs .. 3.125 μs)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 3.086 μs   (3.059 μs .. 3.114 μs)
std dev              91.30 ns   (77.41 ns .. 110.3 ns)
variance introduced by outliers: 37% (moderately inflated)

benchmarking pext-single/pext64
time                 73.06 ns   (72.35 ns .. 73.84 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 73.65 ns   (73.02 ns .. 74.35 ns)
std dev              2.368 ns   (1.974 ns .. 2.918 ns)
variance introduced by outliers: 50% (severely inflated)

benchmarking pext-single/pext32
time                 74.07 ns   (73.46 ns .. 74.62 ns)
                     0.999 R²   (0.999 R² .. 1.000 R²)
mean                 74.18 ns   (73.41 ns .. 74.83 ns)
std dev              2.550 ns   (2.161 ns .. 3.134 ns)
variance introduced by outliers: 54% (severely inflated)

benchmarking pext-vector/pext64
time                 72.78 μs   (71.21 μs .. 74.87 μs)
                     0.996 R²   (0.993 R² .. 0.999 R²)
mean                 72.17 μs   (71.38 μs .. 73.41 μs)
std dev              3.105 μs   (2.302 μs .. 4.741 μs)
variance introduced by outliers: 46% (moderately inflated)

benchmarking pext-vector/pext32
time                 72.69 μs   (72.04 μs .. 73.28 μs)
                     0.999 R²   (0.999 R² .. 0.999 R²)
mean                 73.00 μs   (72.25 μs .. 73.85 μs)
std dev              2.634 μs   (2.237 μs .. 3.137 μs)
variance introduced by outliers: 37% (moderately inflated)

benchmarking plus-vector/plus64
time                 1.856 μs   (1.833 μs .. 1.876 μs)
                     0.999 R²   (0.999 R² .. 0.999 R²)
mean                 1.861 μs   (1.844 μs .. 1.880 μs)
std dev              63.39 ns   (54.74 ns .. 75.83 ns)
variance introduced by outliers: 46% (moderately inflated)

benchmarking plus-vector/plus32
time                 1.547 μs   (1.532 μs .. 1.562 μs)
                     0.999 R²   (0.998 R² .. 0.999 R²)
mean                 1.554 μs   (1.538 μs .. 1.578 μs)
std dev              62.25 ns   (48.22 ns .. 91.40 ns)
variance introduced by outliers: 54% (severely inflated)

Benchmark bench: FINISH

Reference

Acknowledgement

A lot of people helped in the writing of this library and the underlying GHC patch: