Awesome
bits-extra
Useful bitwise operations.
This library exposes support for some BMI2 CPU instructions on some x86 based CPUs.
Currently support operations include:
pdep
- Parallel Depositpext
- Parallel Extract
These operations are useful for high performance bit manipulation for applications such as succinct data structures.
In the case where the target CPU architectures do not support these instruction set, a very slow emulated version of these operations is provided instead.
Note ghc-8.4.1
is required to target the BMI2 CPU instructions.
Compilation
It is sufficient to build, test and benchmark the library as follows for emulated behaviour:
stack build
stack test
stack bench
To target the BMI2 instruction set, add the bmi2
flag:
stack build --flag bits-extra:bmi2
stack test --flag bits-extra:bmi2
stack bench --flag bits-extra:bmi2
Benchmark results
Benchmarks with bmi2
flag defined on ghc-8.4.1
on Intel Core i7 2.9 GHz
Benchmark bench: RUNNING...
Fast pdep enabled: True
Fast pext enabled: True
benchmarking popCount-single/popCount64
time 6.698 ns (6.634 ns .. 6.766 ns)
0.999 R² (0.999 R² .. 1.000 R²)
mean 6.750 ns (6.692 ns .. 6.841 ns)
std dev 243.5 ps (195.3 ps .. 347.1 ps)
variance introduced by outliers: 60% (severely inflated)
benchmarking popCount-single/popCount32
time 5.543 ns (5.511 ns .. 5.574 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 5.596 ns (5.546 ns .. 5.650 ns)
std dev 177.0 ps (144.8 ps .. 215.8 ps)
variance introduced by outliers: 54% (severely inflated)
benchmarking pdep-vector/popCount64
time 1.083 μs (1.074 μs .. 1.097 μs)
0.998 R² (0.997 R² .. 1.000 R²)
mean 1.089 μs (1.075 μs .. 1.104 μs)
std dev 50.20 ns (37.87 ns .. 70.22 ns)
variance introduced by outliers: 62% (severely inflated)
benchmarking pdep-vector/popCount32
time 1.074 μs (1.065 μs .. 1.083 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 1.079 μs (1.071 μs .. 1.090 μs)
std dev 31.57 ns (23.09 ns .. 49.88 ns)
variance introduced by outliers: 40% (moderately inflated)
benchmarking pdep-single/pdep64
time 4.920 ns (4.856 ns .. 5.007 ns)
0.996 R² (0.990 R² .. 1.000 R²)
mean 4.945 ns (4.874 ns .. 5.125 ns)
std dev 361.5 ps (150.9 ps .. 703.7 ps)
variance introduced by outliers: 86% (severely inflated)
benchmarking pdep-single/pdep32
time 5.203 ns (5.163 ns .. 5.244 ns)
1.000 R² (0.999 R² .. 1.000 R²)
mean 5.229 ns (5.193 ns .. 5.268 ns)
std dev 126.2 ps (109.9 ps .. 148.6 ps)
variance introduced by outliers: 41% (moderately inflated)
benchmarking pdep-vector/pdep64
time 1.627 μs (1.617 μs .. 1.641 μs)
0.999 R² (0.998 R² .. 0.999 R²)
mean 1.679 μs (1.652 μs .. 1.723 μs)
std dev 107.2 ns (74.61 ns .. 155.4 ns)
variance introduced by outliers: 75% (severely inflated)
benchmarking pdep-vector/pdep32
time 1.626 μs (1.621 μs .. 1.634 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 1.637 μs (1.624 μs .. 1.651 μs)
std dev 44.27 ns (35.00 ns .. 58.61 ns)
variance introduced by outliers: 35% (moderately inflated)
benchmarking pext-single/pext64
time 5.182 ns (5.133 ns .. 5.239 ns)
0.999 R² (0.998 R² .. 0.999 R²)
mean 5.255 ns (5.181 ns .. 5.371 ns)
std dev 314.1 ps (211.7 ps .. 441.7 ps)
variance introduced by outliers: 81% (severely inflated)
benchmarking pext-single/pext32
time 5.269 ns (5.222 ns .. 5.340 ns)
0.997 R² (0.993 R² .. 1.000 R²)
mean 5.315 ns (5.248 ns .. 5.513 ns)
std dev 388.9 ps (145.5 ps .. 713.4 ps)
variance introduced by outliers: 86% (severely inflated)
benchmarking pext-vector/pext64
time 1.647 μs (1.632 μs .. 1.667 μs)
0.998 R² (0.996 R² .. 0.999 R²)
mean 1.673 μs (1.649 μs .. 1.719 μs)
std dev 107.5 ns (67.19 ns .. 171.9 ns)
variance introduced by outliers: 76% (severely inflated)
benchmarking pext-vector/pext32
time 1.626 μs (1.616 μs .. 1.637 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 1.630 μs (1.617 μs .. 1.649 μs)
std dev 53.18 ns (33.93 ns .. 80.86 ns)
variance introduced by outliers: 44% (moderately inflated)
benchmarking plus-vector/plus64
time 1.641 μs (1.628 μs .. 1.657 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 1.647 μs (1.635 μs .. 1.666 μs)
std dev 50.67 ns (38.54 ns .. 76.92 ns)
variance introduced by outliers: 41% (moderately inflated)
benchmarking plus-vector/plus32
time 1.943 μs (1.932 μs .. 1.953 μs)
1.000 R² (0.999 R² .. 1.000 R²)
mean 1.948 μs (1.934 μs .. 1.961 μs)
std dev 45.86 ns (37.10 ns .. 59.21 ns)
variance introduced by outliers: 29% (moderately inflated)
Benchmark bench: FINISH
Benchmarks with bmi2
flag NOT defined on ghc-8.4.1
on Intel Core i7 2.9 GHz
Benchmark bench: RUNNING...
Fast pdep enabled: False
Fast pext enabled: False
benchmarking popCount-single/popCount64
time 6.572 ns (6.433 ns .. 6.765 ns)
0.996 R² (0.992 R² .. 0.999 R²)
mean 6.637 ns (6.551 ns .. 6.758 ns)
std dev 346.9 ps (252.8 ps .. 542.9 ps)
variance introduced by outliers: 76% (severely inflated)
benchmarking popCount-single/popCount32
time 5.196 ns (5.152 ns .. 5.253 ns)
0.999 R² (0.999 R² .. 1.000 R²)
mean 5.232 ns (5.186 ns .. 5.293 ns)
std dev 186.6 ps (152.8 ps .. 237.0 ps)
variance introduced by outliers: 60% (severely inflated)
benchmarking pdep-vector/popCount64
time 5.409 μs (5.344 μs .. 5.476 μs)
0.998 R² (0.997 R² .. 0.999 R²)
mean 5.458 μs (5.391 μs .. 5.549 μs)
std dev 258.1 ns (193.1 ns .. 374.6 ns)
variance introduced by outliers: 60% (severely inflated)
benchmarking pdep-vector/popCount32
time 3.849 μs (3.814 μs .. 3.885 μs)
0.999 R² (0.999 R² .. 0.999 R²)
mean 3.860 μs (3.818 μs .. 3.919 μs)
std dev 166.6 ns (131.0 ns .. 236.7 ns)
variance introduced by outliers: 56% (severely inflated)
benchmarking pdep-single/pdep64
time 10.46 ns (10.35 ns .. 10.58 ns)
0.999 R² (0.999 R² .. 0.999 R²)
mean 10.53 ns (10.42 ns .. 10.65 ns)
std dev 385.2 ps (321.8 ps .. 480.9 ps)
variance introduced by outliers: 60% (severely inflated)
benchmarking pdep-single/pdep32
time 10.57 ns (10.49 ns .. 10.68 ns)
0.999 R² (0.999 R² .. 1.000 R²)
mean 10.69 ns (10.57 ns .. 10.80 ns)
std dev 404.1 ps (341.0 ps .. 501.8 ps)
variance introduced by outliers: 62% (severely inflated)
benchmarking pdep-vector/pdep64
time 3.113 μs (3.091 μs .. 3.140 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 3.130 μs (3.106 μs .. 3.158 μs)
std dev 82.60 ns (65.72 ns .. 103.1 ns)
variance introduced by outliers: 32% (moderately inflated)
benchmarking pdep-vector/pdep32
time 3.096 μs (3.070 μs .. 3.125 μs)
0.999 R² (0.999 R² .. 1.000 R²)
mean 3.086 μs (3.059 μs .. 3.114 μs)
std dev 91.30 ns (77.41 ns .. 110.3 ns)
variance introduced by outliers: 37% (moderately inflated)
benchmarking pext-single/pext64
time 73.06 ns (72.35 ns .. 73.84 ns)
0.999 R² (0.999 R² .. 1.000 R²)
mean 73.65 ns (73.02 ns .. 74.35 ns)
std dev 2.368 ns (1.974 ns .. 2.918 ns)
variance introduced by outliers: 50% (severely inflated)
benchmarking pext-single/pext32
time 74.07 ns (73.46 ns .. 74.62 ns)
0.999 R² (0.999 R² .. 1.000 R²)
mean 74.18 ns (73.41 ns .. 74.83 ns)
std dev 2.550 ns (2.161 ns .. 3.134 ns)
variance introduced by outliers: 54% (severely inflated)
benchmarking pext-vector/pext64
time 72.78 μs (71.21 μs .. 74.87 μs)
0.996 R² (0.993 R² .. 0.999 R²)
mean 72.17 μs (71.38 μs .. 73.41 μs)
std dev 3.105 μs (2.302 μs .. 4.741 μs)
variance introduced by outliers: 46% (moderately inflated)
benchmarking pext-vector/pext32
time 72.69 μs (72.04 μs .. 73.28 μs)
0.999 R² (0.999 R² .. 0.999 R²)
mean 73.00 μs (72.25 μs .. 73.85 μs)
std dev 2.634 μs (2.237 μs .. 3.137 μs)
variance introduced by outliers: 37% (moderately inflated)
benchmarking plus-vector/plus64
time 1.856 μs (1.833 μs .. 1.876 μs)
0.999 R² (0.999 R² .. 0.999 R²)
mean 1.861 μs (1.844 μs .. 1.880 μs)
std dev 63.39 ns (54.74 ns .. 75.83 ns)
variance introduced by outliers: 46% (moderately inflated)
benchmarking plus-vector/plus32
time 1.547 μs (1.532 μs .. 1.562 μs)
0.999 R² (0.998 R² .. 0.999 R²)
mean 1.554 μs (1.538 μs .. 1.578 μs)
std dev 62.25 ns (48.22 ns .. 91.40 ns)
variance introduced by outliers: 54% (severely inflated)
Benchmark bench: FINISH
Reference
Acknowledgement
A lot of people helped in the writing of this library and the underlying GHC patch:
- Ben Gamari
- Moritz Angermann
- Alex Mason
- Moritz Kiefer