Home

Awesome

Origin

Blitsort is an in-place rotate quick/merge sort based on the stable out-of-place merge sort quadsort, stable out-of-place quicksort fluxsort, and unstable in-place crumsort. This page primarily focuses on the in-place stable aspects of blitsort, the quadsort and fluxsort pages go into greater detail about mergesort and stable quicksort respectively.

Visualization

In the visualization below the upper half shows the swap memory (32 elements) and the bottom half shows the main memory (512 elements). Colors are used to differentiate between various operations. Partitioning is in light and dark blue, rotations are in yellow and violet.

blitsort benchmark

The blitsort visualization only shows the sorting of random data, the sorting of ordered data is identical to the quadsort visualization on the quadsort github page.

Analyzer

Blitsort uses the same analyzer as fluxsort. The distribution is scanned to determine how much of it is already sorted. Distributions that are fully in order or in reverse-order are handled immediately. Segments that are mostly sorted are sorted with rotate quadsort, otherwise they're sorted with rotate fluxsort.

Rotate merge sort

A rotate merge sort uses rotations to partition two sorted arrays until they're small enough to be merged using auxiliary memory. Blitsort does so by taking the center element of the first array, using a binary search to find all elements smaller than the center element in the second array, and performing an array rotation. It does so recursively until a partition becomes small enough to be merged.

Rotate stable quicksort

A stable rotate quicksort partitions the array in segments using auxiliary memory, the left and right partition of each segment is then moved to the proper location recursively using rotations. On random data it's faster than a rotate mergesort since no binary search is required and because the branchless optimizations are slightly more efficient.

Monobound binary search

Blitsort uses a monobound binary search, which is up to two times faster than the binary search in general use.

Trinity rotation

Blitsort uses a trinity rotation, a new and significantly faster array rotation algorithm.

Memory

By default blitsort uses 512 elements worth of stack memory.

The minimum memory requirement for blitsort is 32 elements of stack memory, it can be configured to use sqrt(n) memory.

Blitsort partitions recursively, requiring an additional log(n) memory. It's possible to make this O(1) through the implementation of a stack, which makes the rotate quick/merge sort algorithm in-place from a theoretical perspective.

There is currently no clear consensus on what constitutes as an in-place sort, it boils down to what someone considers a small enough memory footprint to be considered negligable. This typically ranges from the size of a cache line to the size of the L1 cache.

Performance

Blitsort has exceptional performance due to the use of trinity / bridge rotations. It is likely the fastest in-place stable sort written so far and is about 15-50% faster than octosort, which is a block merge sort.

Blitsort's performance is similar to that of fluxsort as long as it has sqrt(n) auxiliary memory. Performance on larger arrays degrades steadily but will still beat most traditional sorts at 10 million elements.

To take full advantage of branchless operations the cmp macro needs to be uncommented in bench.c, which will double the performance on primitive types. The blitsort_prim function can be used to access primitive comparisons directly for 32 and 64 bit integers.

Data Types

Blitsort supports long doubles and 8, 16, 32, and 64 bit data types. By using 32 or 64 bit pointers it's possible to sort any other data type, like strings. Custom data sizes can be added in blitsort.h. For any practical use where stability matters you'll likely be using pointers however.

Interface

The interface is the same one as qsort, which is described in man qsort.

Big O

                 ┌───────────────────────┐┌───────────────────────┐
                 │comparisons            ││swap memory            │
┌───────────────┐├───────┬───────┬───────┤├───────┬───────┬───────┤┌──────┐┌─────────┐┌─────────┐
│name           ││min    │avg    │max    ││min    │avg    │max    ││stable││partition││adaptive │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│blitsort       ││n      │n log n│n log n││1      │1      │1      ││yes   ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│crumsort       ││n      │n log n│n log n││1      │1      │1      ││no    ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│fluxsort       ││n      │n log n│n log n││1      │n      │n      ││yes   ││yes      ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│mergesort      ││n log n│n log n│n log n││n      │n      │n      ││yes   ││no       ││no       │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quadsort       ││n      │n log n│n log n││1      │n      │n      ││yes   ││no       ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quicksort      ││n      │n log n│n²     ││1      │1      │1      ││no    ││yes      ││no       │
└───────────────┘└───────┴───────┴───────┘└───────┴───────┴───────┘└──────┘└─────────┘└─────────┘

Benchmark: blitsort vs std::stable_sort vs gfx::timsort

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. Stablesort is g++'s std:stable_sort function.

The graph shows the relative performance on 100,000 32 bit integers. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
stablesort1000001280.0109130.0110470100random order
blitsort1000001280.0108800.0114230100random order
timsort1000001280.0127910.0129010100random order
NameItemsTypeBestAverageComparesSamplesDistribution
stablesort100000640.0061250.0062070100random order
blitsort100000640.0023970.0025180100random order
timsort100000640.0077870.0078550100random order
NameItemsTypeBestAverageLoopsSamplesDistribution
stablesort100000320.0059910.0060250100random order
blitsort100000320.0022450.0023410100random order
timsort100000320.0076200.0076560100random order
stablesort100000320.0038150.0038380100random % 100
blitsort100000320.0010170.0010850100random % 100
timsort100000320.0056410.0056740100random % 100
stablesort100000320.0006730.0007030100ascending order
blitsort100000320.0000440.0000440100ascending order
timsort100000320.0000450.0000450100ascending order
stablesort100000320.0013630.0013990100ascending saw
blitsort100000320.0006280.0006360100ascending saw
timsort100000320.0008440.0008490100ascending saw
stablesort100000320.0008100.0008400100pipe organ
blitsort100000320.0003580.0003630100pipe organ
timsort100000320.0001690.0001720100pipe organ
stablesort100000320.0009050.0009210100descending order
blitsort100000320.0000550.0000550100descending order
timsort100000320.0000880.0000920100descending order
stablesort100000320.0013610.0014020100descending saw
blitsort100000320.0006370.0006410100descending saw
timsort100000320.0008420.0008500100descending saw
stablesort100000320.0020330.0020900100random tail
blitsort100000320.0009560.0009610100random tail
timsort100000320.0019990.0020190100random tail
stablesort100000320.0034930.0035410100random half
blitsort100000320.0013960.0014040100random half
timsort100000320.0040330.0040590100random half
stablesort100000320.0009210.0009480100ascending tiles
blitsort100000320.0013830.0013970100ascending tiles
timsort100000320.0008360.0008800100ascending tiles
stablesort100000320.0011710.0013550100bit reversal
blitsort100000320.0020520.0021600100bit reversal
timsort100000320.0022500.0028320100bit reversal
</details>

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. It measures the performance on random data with array sizes ranging from 10 to 10,000,000. It's generated by running the benchmark using 1024 0 0 as the argument. The benchmark is weighted, meaning the number of repetitions halves each time the number of items doubles. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
stablesort10320.1262630.1264980.010random 10
blitsort10320.0501690.0509640.010random 10
timsort10320.1404030.1422170.010random 10
stablesort100320.2453180.2455920.010random 100
blitsort100320.1132150.1137170.010random 100
timsort100320.3442900.3449270.010random 100
stablesort1000320.3629810.3635140.010random 1000
blitsort1000320.1411620.1416630.010random 1000
timsort1000320.4977170.4978710.010random 1000
stablesort10000320.4767460.4774060.010random 10000
blitsort10000320.1758330.1761540.010random 10000
timsort10000320.6396700.6402210.010random 10000
stablesort100000320.6017400.6022950.010random 100000
blitsort100000320.2336710.2345030.010random 100000
timsort100000320.7657750.7663470.010random 100000
stablesort1000000320.7304290.7321320.010random 1000000
blitsort1000000320.3136030.3160550.010random 1000000
timsort1000000320.8986290.9007610.010random 1000000
stablesort10000000320.8898120.894025010random 10000000
blitsort10000000320.4253430.433828010random 10000000
timsort10000000321.0831891.089713010random 10000000
</details>

Benchmark: blitsort vs qsort (mergesort) vs quadsort

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 bench.c. The graph shows the relative performance on 100,000 32 bit integers. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000640.0166790.0168741536381100random string
blitsort100000640.0106010.0109051782500100random string
quadsort100000640.0107590.0109261684673100random string
qsort100000640.0145110.0146681536491100random double
blitsort100000640.0080660.0082971781674100random double
quadsort100000640.0085690.0087131684633100random double
qsort100000640.0110170.0111791536491100random long
blitsort100000640.0054320.0056031781674100random long
quadsort100000640.0060170.0061091684633100random long
qsort100000640.0107130.0108701536634100random int
blitsort100000640.0052270.0053551790104100random int
quadsort100000640.0053680.0054091684734100random int
NameItemsTypeBestAverageComparesSamplesDistribution
qsort1000001280.0190640.0199201536491100random order
blitsort1000001280.0144910.0149051781674100random order
quadsort1000001280.0118360.0119201684633100random order
NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000640.0094030.0095211536491100random order
blitsort100000640.0045970.0047871781674100random order
quadsort100000640.0040970.0041351684633100random order
NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000320.0089480.0090541536634100random order
blitsort100000320.0038670.0040061790104100random order
quadsort100000320.0033650.0034121684734100random order
qsort100000320.0068900.0070881532465100random % 100
blitsort100000320.0017460.001816897246100random % 100
quadsort100000320.0027440.0028251415417100random % 100
qsort100000320.0026100.002730815024100ascending order
blitsort100000320.0001650.00016699999100ascending order
quadsort100000320.0001590.00016199999100ascending order
qsort100000320.0033960.003577915020100ascending saw
blitsort100000320.0008860.000896322647100ascending saw
quadsort100000320.0008910.000905379624100ascending saw
qsort100000320.0026630.002709884462100pipe organ
blitsort100000320.0005500.000555234795100pipe organ
quadsort100000320.0004610.000464277113100pipe organ
qsort100000320.0024560.002586853904100descending order
blitsort100000320.0001760.00017999999100descending order
quadsort100000320.0001630.00018099999100descending order
qsort100000320.0032870.003400953892100descending saw
blitsort100000320.0009080.000922322881100descending saw
quadsort100000320.0009130.000924391547100descending saw
qsort100000320.0042320.0043401012073100random tail
blitsort100000320.0015140.001528613378100random tail
quadsort100000320.0011900.001209592061100random tail
qsort100000320.0059630.0061631200713100random half
blitsort100000320.0023310.0023631040201100random half
quadsort100000320.0020460.0020661006728100random half
qsort100000320.0043590.0046241209200100ascending tiles
blitsort100000320.0013300.001353371638100ascending tiles
quadsort100000320.0021790.002243671244100ascending tiles
qsort100000320.0059820.0063761553378100bit reversal
blitsort100000320.0036490.0038251795037100bit reversal
quadsort100000320.0031800.0032531727134100bit reversal
</details>

Benchmark: blitsort vs pdqsort vs crumsort

The following benchmark was on WSL gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. The bar graph shows the best run out of 100 on 100,000 32 bit integers. Comparisons for pdqsort, blitsort, and crumsort are inlined.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
pdqsort1000001280.0059530.0061050100random order
blitsort1000001280.0109870.0116490100random order
crumsort1000001280.0082770.0084990100random order
NameItemsTypeBestAverageComparesSamplesDistribution
pdqsort100000640.0026460.0026940100random order
blitsort100000640.0023820.0025150100random order
crumsort100000640.0018940.0019040100random order
NameItemsTypeBestAverageLoopsSamplesDistribution
pdqsort100000320.0026830.0027200100random order
blitsort100000320.0022490.0023770100random order
crumsort100000320.0017950.0018250100random order
pdqsort100000320.0008020.0008070100random % 100
blitsort100000320.0010180.0010780100random % 100
crumsort100000320.0005600.0005630100random % 100
pdqsort100000320.0000990.0000990100ascending order
blitsort100000320.0000440.0000440100ascending order
crumsort100000320.0000440.0000440100ascending order
pdqsort100000320.0034590.0035010100ascending saw
blitsort100000320.0006270.0006320100ascending saw
crumsort100000320.0006280.0006330100ascending saw
pdqsort100000320.0028270.0028630100pipe organ
blitsort100000320.0003580.0003620100pipe organ
crumsort100000320.0003580.0003600100pipe organ
pdqsort100000320.0002030.0002040100descending order
blitsort100000320.0000550.0000550100descending order
crumsort100000320.0000560.0000560100descending order
pdqsort100000320.0032320.0032620100descending saw
blitsort100000320.0006370.0006410100descending saw
crumsort100000320.0006370.0006460100descending saw
pdqsort100000320.0025590.0025790100random tail
blitsort100000320.0009550.0009590100random tail
crumsort100000320.0008800.0008860100random tail
pdqsort100000320.0026630.0026790100random half
blitsort100000320.0013960.0014070100random half
crumsort100000320.0011990.0012110100random half
pdqsort100000320.0023090.0023440100ascending tiles
blitsort100000320.0013840.0014030100ascending tiles
crumsort100000320.0015170.0015310100ascending tiles
pdqsort100000320.0026620.0026850100bit reversal
blitsort100000320.0020520.0021750100bit reversal
crumsort100000320.0017870.0018050100bit reversal
</details>