Awesome
The most commonly used binary search variant was first published by Hermann Bottenbruch in 1962 and hasn't notably changed since. Below I'll describe several novel variants with improved performance. The most notable variant, the monobound binary search, executes two to four times faster than the standard binary search on arrays smaller than 1 million 32 bit integers.
A source code implementation in C is available in the binary_search.c file which also contains a bench marking routine. A graph with performance results is included at the bottom of this page. Keep in mind performance will vary depending on hardware and compiler optimizations.
I'll briefly describe each variant and notable optimizations below, followed by some performance graphs.
Deferred Detection of Equality
By skipping the detection of equality until the binary search has finished (which does not allow for early termination) each loop contains 1 key check, 1 integer check and 2 integer assignments. This is pretty much the standard algorithm that has been used since 1962.
Pointer Optimizations
You can get another 10% performance boost by using pointer operations. I forgo such optimizations in the C implementation to keep things as readable as possible.
Unsigned Integer Optimization
You can get a further performance boost by using unsigned instead of signed integers.
Stability
All the implementations in binary_search.c should be stable. If you search an array containing the elements [1][4][7][7][7][9]
and you search for the number 7
, it should return the right most index. This is needed if you want to use a binary search in a stable sorting algorithm. The binary search being stable shouldn't notably slow down performance.
Zero length array
All the implementations in binary_search.c should correctly handle the case where the search function is called with 0 as the array length.
Compilation
For the monobound binary search variant to perform well the source code must be compiled with the -O1, -O2, or -O3 optimization flag.
Boundless Binary Search
The boundless binary search is faster than the standard binary search since the loop contains 1 key check, 1 integer check, and (on average) 1.5 integer assignments. The performance gain will vary depending on various factors, but should be around 20% when comparing 32 bit integers.
Doubletapped Binary Search
When you get to the end of a binary search and there are 2 elements left it takes exactly 2 if checks to finish. By doing two equality checks at the end you can finish up in either 1 or 2 if checks. Subsequently, on average, the doubletapped binary search performs slightly fewer key checks.
Monobound Binary Search
The monobound binary search is similar to the boundless binary search but uses an extra variable to simplify calculations and performs slightly more keychecks. It's up to 60% faster than the standard binary search when comparing 32 bit integers. On small arrays the performance difference is even greater.
The performance gain is due to dynamic loop unrolling, which the traditional binary search (by trying to minimize the number of key checks) does not allow. Loop unrolling in turn allows various other potential optimizations at the compiler and cpu level.
Tripletapped Binary Search
When you get to the end of a binary search and there are 3 elements left it takes 2.5 if checks to finish. The monobound binary search, however, takes 3 if checks. Subsequently the tripletapped variant performs 3 equality checks at the end with early termination, resulting in slightly fewer key checks and if the data aligns properly, slightly improved performance.
Quaternary Binary Search
The dynamic unrolling of loops is often limited to 16 iterations. By narrowing down the search range by a fourth each loop, instead of a half, it takes 16 iteriations to search 4294967296 elements, instead of 65536. This optimizations slows things down slightly for smaller arrays, but can give a notable gain on larger arrays.
Monobound Interpolated Binary Search
When you have an even distribution you can make an educated guess as to the location of the index. Due to the expense of the initial check and exponential search, the interpolated binary search is unlikely to outperform other binary searches on arrays with less than 1000 elements. When the distribution is uneven performance will drop, but not significantly.
A practical application for an interpolated binary search would be looking up authorization keys.
Adaptive Binary Search
The adaptive binary search is optimized for repeated binary searches on the same array. When it observes a pattern it switches from a binary search to an exponential search. Unlike the interpolated search the adaptive search works on uneven distributions as well.
A practical application for an adaptive binary search would be accessing a unicode lookup table.
Small array benchmark graph
The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 binary-search.c
. Each test was ran 1,000 times with the time (in seconds) reported of the best run.
The graph below shows the execution speed on arrays with 1, 2, 4, 8, 16, 32, 64, and 128 elements on an Intel i3 quad-core processor.
<details><summary><b>data table</b></summary>Name | Items | Hits | Misses | Checks | Time |
---|---|---|---|---|---|
linear_search | 1 | 806 | 9194 | 10000 | 0.000029 |
standard_binary_search | 1 | 806 | 9194 | 10000 | 0.000031 |
monobound_binary_search | 1 | 806 | 9194 | 10000 | 0.000033 |
linear_search | 2 | 1034 | 8966 | 19495 | 0.000039 |
standard_binary_search | 2 | 1034 | 8966 | 20000 | 0.000074 |
monobound_binary_search | 2 | 1034 | 8966 | 20000 | 0.000036 |
linear_search | 4 | 775 | 9225 | 38862 | 0.000046 |
standard_binary_search | 4 | 775 | 9225 | 30000 | 0.000122 |
monobound_binary_search | 4 | 775 | 9225 | 30000 | 0.000041 |
linear_search | 8 | 822 | 9178 | 77133 | 0.000064 |
standard_binary_search | 8 | 822 | 9178 | 40000 | 0.000177 |
monobound_binary_search | 8 | 822 | 9178 | 40000 | 0.000050 |
linear_search | 16 | 1141 | 8859 | 151154 | 0.000116 |
standard_binary_search | 16 | 1141 | 8859 | 50000 | 0.000219 |
monobound_binary_search | 16 | 1141 | 8859 | 50000 | 0.000064 |
linear_search | 32 | 1145 | 8855 | 302324 | 0.000218 |
standard_binary_search | 32 | 1145 | 8855 | 60000 | 0.000270 |
monobound_binary_search | 32 | 1145 | 8855 | 60000 | 0.000074 |
linear_search | 64 | 1096 | 8904 | 605248 | 0.000409 |
standard_binary_search | 64 | 1096 | 8904 | 70000 | 0.000321 |
monobound_binary_search | 64 | 1096 | 8904 | 70000 | 0.000084 |
linear_search | 128 | 1046 | 8954 | 1214120 | 0.000749 |
standard_binary_search | 128 | 1046 | 8954 | 80000 | 0.000386 |
monobound_binary_search | 128 | 1046 | 8954 | 80000 | 0.000097 |
Large array benchmark graph
The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 binary-search.c
. Each test was ran 10,000 times with the time (in seconds) reported of the best run.
The graph below shows the execution speed on arrays with 10, 100, 1000, 10000, 100000, and 1000000 elements on an Intel i3 quad-core processor.
<details><summary><b>data table</b></summary>Name | Items | Hits | Misses | Checks | Time |
---|---|---|---|---|---|
standard_binary_search | 10 | 910 | 9090 | 43646 | 0.000182 |
boundless_binary_search | 10 | 910 | 9090 | 43646 | 0.000156 |
monobound_binary_search | 10 | 910 | 9090 | 50000 | 0.000060 |
monobound_interpolated_search | 10 | 910 | 9090 | 64027 | 0.000203 |
standard_binary_search | 100 | 1047 | 8953 | 77085 | 0.000361 |
boundless_binary_search | 100 | 1047 | 8953 | 77085 | 0.000292 |
monobound_binary_search | 100 | 1047 | 8953 | 80000 | 0.000096 |
monobound_interpolated_search | 100 | 1047 | 8953 | 92421 | 0.000234 |
standard_binary_search | 1000 | 1041 | 8959 | 109808 | 0.000610 |
boundless_binary_search | 1000 | 1041 | 8959 | 109808 | 0.000489 |
monobound_binary_search | 1000 | 1041 | 8959 | 110000 | 0.000137 |
monobound_interpolated_search | 1000 | 1041 | 8959 | 108509 | 0.000147 |
standard_binary_search | 10000 | 1024 | 8976 | 143580 | 0.000804 |
boundless_binary_search | 10000 | 1024 | 8976 | 143580 | 0.000651 |
monobound_binary_search | 10000 | 1024 | 8976 | 150000 | 0.000204 |
monobound_interpolated_search | 10000 | 1024 | 8976 | 109353 | 0.000202 |
standard_binary_search | 100000 | 1040 | 8960 | 176860 | 0.001087 |
boundless_binary_search | 100000 | 1040 | 8960 | 176860 | 0.000903 |
monobound_binary_search | 100000 | 1040 | 8960 | 180000 | 0.000360 |
monobound_interpolated_search | 100000 | 1040 | 8960 | 123144 | 0.000290 |
standard_binary_search | 1000000 | 993 | 9007 | 209529 | 0.001570 |
boundless_binary_search | 1000000 | 993 | 9007 | 209529 | 0.001369 |
monobound_binary_search | 1000000 | 993 | 9007 | 210000 | 0.000691 |
monobound_interpolated_search | 1000000 | 993 | 9007 | 124870 | 0.000374 |
monobound_bsearch() vs bsearch()
The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 monobound_bsearch.c
. Each test was ran 1,000 times with the time (in seconds) reported of the best run.
The graph below shows the execution speed on arrays with 10, 100, 1K, 10K, 100K, 1M, and 10M elements on an Intel i3 quad-core processor. The bsearch function is the one provided by stdlib.h.
<details><summary><b>data table</b></summary>Name | Items | Hits | Misses | Checks | Time |
---|---|---|---|---|---|
monobound | 10 | 930 | 9070 | 48149 | 0.000136 |
bsearch | 10 | 930 | 9070 | 34677 | 0.000202 |
monobound | 100 | 1103 | 8897 | 77539 | 0.000189 |
bsearch | 100 | 1103 | 8897 | 66470 | 0.000410 |
monobound | 1000 | 1033 | 8967 | 107845 | 0.000265 |
bsearch | 1000 | 1033 | 8967 | 98703 | 0.000623 |
monobound | 10000 | 1033 | 8967 | 147232 | 0.000357 |
bsearch | 10000 | 1033 | 8967 | 132342 | 0.000820 |
monobound | 100000 | 1014 | 8986 | 177576 | 0.000539 |
bsearch | 100000 | 1014 | 8986 | 165785 | 0.001111 |
monobound | 1000000 | 998 | 9002 | 207938 | 0.001124 |
bsearch | 1000000 | 998 | 9002 | 198443 | 0.001603 |
monobound | 10000000 | 974 | 9026 | 247324 | 0.002641 |
bsearch | 10000000 | 974 | 9026 | 232174 | 0.003784 |