Home

Awesome

The most commonly used binary search variant was first published by Hermann Bottenbruch in 1962 and hasn't notably changed since. Below I'll describe several novel variants with improved performance. The most notable variant, the monobound binary search, executes two to four times faster than the standard binary search on arrays smaller than 1 million 32 bit integers.

A source code implementation in C is available in the binary_search.c file which also contains a bench marking routine. A graph with performance results is included at the bottom of this page. Keep in mind performance will vary depending on hardware and compiler optimizations.

I'll briefly describe each variant and notable optimizations below, followed by some performance graphs.

Deferred Detection of Equality

By skipping the detection of equality until the binary search has finished (which does not allow for early termination) each loop contains 1 key check, 1 integer check and 2 integer assignments. This is pretty much the standard algorithm that has been used since 1962.

Pointer Optimizations

You can get another 10% performance boost by using pointer operations. I forgo such optimizations in the C implementation to keep things as readable as possible.

Unsigned Integer Optimization

You can get a further performance boost by using unsigned instead of signed integers.

Stability

All the implementations in binary_search.c should be stable. If you search an array containing the elements [1][4][7][7][7][9] and you search for the number 7, it should return the right most index. This is needed if you want to use a binary search in a stable sorting algorithm. The binary search being stable shouldn't notably slow down performance.

Zero length array

All the implementations in binary_search.c should correctly handle the case where the search function is called with 0 as the array length.

Compilation

For the monobound binary search variant to perform well the source code must be compiled with the -O1, -O2, or -O3 optimization flag.

Boundless Binary Search

The boundless binary search is faster than the standard binary search since the loop contains 1 key check, 1 integer check, and (on average) 1.5 integer assignments. The performance gain will vary depending on various factors, but should be around 20% when comparing 32 bit integers.

Doubletapped Binary Search

When you get to the end of a binary search and there are 2 elements left it takes exactly 2 if checks to finish. By doing two equality checks at the end you can finish up in either 1 or 2 if checks. Subsequently, on average, the doubletapped binary search performs slightly fewer key checks.

Monobound Binary Search

The monobound binary search is similar to the boundless binary search but uses an extra variable to simplify calculations and performs slightly more keychecks. It's up to 60% faster than the standard binary search when comparing 32 bit integers. On small arrays the performance difference is even greater.

The performance gain is due to dynamic loop unrolling, which the traditional binary search (by trying to minimize the number of key checks) does not allow. Loop unrolling in turn allows various other potential optimizations at the compiler and cpu level.

Tripletapped Binary Search

When you get to the end of a binary search and there are 3 elements left it takes 2.5 if checks to finish. The monobound binary search, however, takes 3 if checks. Subsequently the tripletapped variant performs 3 equality checks at the end with early termination, resulting in slightly fewer key checks and if the data aligns properly, slightly improved performance.

Quaternary Binary Search

The dynamic unrolling of loops is often limited to 16 iterations. By narrowing down the search range by a fourth each loop, instead of a half, it takes 16 iteriations to search 4294967296 elements, instead of 65536. This optimizations slows things down slightly for smaller arrays, but can give a notable gain on larger arrays.

Monobound Interpolated Binary Search

When you have an even distribution you can make an educated guess as to the location of the index. Due to the expense of the initial check and exponential search, the interpolated binary search is unlikely to outperform other binary searches on arrays with less than 1000 elements. When the distribution is uneven performance will drop, but not significantly.

A practical application for an interpolated binary search would be looking up authorization keys.

Adaptive Binary Search

The adaptive binary search is optimized for repeated binary searches on the same array. When it observes a pattern it switches from a binary search to an exponential search. Unlike the interpolated search the adaptive search works on uneven distributions as well.

A practical application for an adaptive binary search would be accessing a unicode lookup table.

Small array benchmark graph

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 binary-search.c. Each test was ran 1,000 times with the time (in seconds) reported of the best run.

The graph below shows the execution speed on arrays with 1, 2, 4, 8, 16, 32, 64, and 128 elements on an Intel i3 quad-core processor.

binary search graph

<details><summary><b>data table</b></summary>
NameItemsHitsMissesChecksTime
linear_search18069194100000.000029
standard_binary_search18069194100000.000031
monobound_binary_search18069194100000.000033
linear_search210348966194950.000039
standard_binary_search210348966200000.000074
monobound_binary_search210348966200000.000036
linear_search47759225388620.000046
standard_binary_search47759225300000.000122
monobound_binary_search47759225300000.000041
linear_search88229178771330.000064
standard_binary_search88229178400000.000177
monobound_binary_search88229178400000.000050
linear_search16114188591511540.000116
standard_binary_search1611418859500000.000219
monobound_binary_search1611418859500000.000064
linear_search32114588553023240.000218
standard_binary_search3211458855600000.000270
monobound_binary_search3211458855600000.000074
linear_search64109689046052480.000409
standard_binary_search6410968904700000.000321
monobound_binary_search6410968904700000.000084
linear_search1281046895412141200.000749
standard_binary_search12810468954800000.000386
monobound_binary_search12810468954800000.000097
</details>

Large array benchmark graph

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 binary-search.c. Each test was ran 10,000 times with the time (in seconds) reported of the best run.

The graph below shows the execution speed on arrays with 10, 100, 1000, 10000, 100000, and 1000000 elements on an Intel i3 quad-core processor.

binary search graph

<details><summary><b>data table</b></summary>
NameItemsHitsMissesChecksTime
standard_binary_search109109090436460.000182
boundless_binary_search109109090436460.000156
monobound_binary_search109109090500000.000060
monobound_interpolated_search109109090640270.000203
standard_binary_search10010478953770850.000361
boundless_binary_search10010478953770850.000292
monobound_binary_search10010478953800000.000096
monobound_interpolated_search10010478953924210.000234
standard_binary_search1000104189591098080.000610
boundless_binary_search1000104189591098080.000489
monobound_binary_search1000104189591100000.000137
monobound_interpolated_search1000104189591085090.000147
standard_binary_search10000102489761435800.000804
boundless_binary_search10000102489761435800.000651
monobound_binary_search10000102489761500000.000204
monobound_interpolated_search10000102489761093530.000202
standard_binary_search100000104089601768600.001087
boundless_binary_search100000104089601768600.000903
monobound_binary_search100000104089601800000.000360
monobound_interpolated_search100000104089601231440.000290
standard_binary_search100000099390072095290.001570
boundless_binary_search100000099390072095290.001369
monobound_binary_search100000099390072100000.000691
monobound_interpolated_search100000099390071248700.000374
</details>

monobound_bsearch() vs bsearch()

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 monobound_bsearch.c. Each test was ran 1,000 times with the time (in seconds) reported of the best run.

The graph below shows the execution speed on arrays with 10, 100, 1K, 10K, 100K, 1M, and 10M elements on an Intel i3 quad-core processor. The bsearch function is the one provided by stdlib.h.

binary search graph

<details><summary><b>data table</b></summary>
NameItemsHitsMissesChecksTime
monobound109309070481490.000136
bsearch109309070346770.000202
monobound10011038897775390.000189
bsearch10011038897664700.000410
monobound1000103389671078450.000265
bsearch100010338967987030.000623
monobound10000103389671472320.000357
bsearch10000103389671323420.000820
monobound100000101489861775760.000539
bsearch100000101489861657850.001111
monobound100000099890022079380.001124
bsearch100000099890021984430.001603
monobound1000000097490262473240.002641
bsearch1000000097490262321740.003784
</details>