Home

Awesome

Intro

This document describes a stable bottom-up adaptive branchless merge sort named quadsort. A visualisation and benchmarks are available at the bottom.

The quad swap analyzer

Quadsort starts out with an analyzer that has the following tasks:

  1. Detect ordered data with minimal comparisons.
  2. Detect reverse order data with minimal comparisons.
  3. Do the above without impacting performance on random data.
  4. Exit the quad swap analyzer with sorted blocks of 8 elements.

Ordered data handling

Quadsort's analyzer examines the array 8 elements at a time. It performs 4 comparisons on elements (1,2), (3,4), (5,6), and (7,8) of which the result is stored and a bitmask is created with a value between 0 and 15 for all the possible combinations. If the result is 0 it means the 4 comparisons were all in order.

What remains is 3 more comparisons on elements (2,3), (4,5), and (6,7) to determine if all 8 elements are in order. Traditional sorts would do this with 7 branched individual comparisons, which should result in 3.2 branch mispredictions on random data on average. Using quadsort's method should result in 0.2 branch mispredictions on random data on average.

If the data is in order quadsort moves on to the next 8 elements. If the data turns out to be neither in order or in reverse order, 4 branchless swaps are performed using the stored comparison results, followed by a branchless parity merge. More on that later.

Reverse order handling

Reverse order data is typically moved using a simple reversal function, as following.

int reverse(int array[], int start, int end, int swap)
{
    while (start < end)
    {
        swap = array[start];
        array[start++] = array[end];
        array[end--] = swap;
    }
}

While random data can only be sorted using n log n comparisons and n log n moves, reverse-order data can be sorted using n comparisons and n moves through run detection and a reversal routine. Without run detection the best you can do is sort it in n comparisons and n log n moves.

Run detection, as the name implies, comes with a detection cost. Thanks to the laws of probability a quad swap can cheat however. The chance of 4 separate pairs of elements being in reverse order is 1 in 16. So there's a 6.25% chance quadsort makes a wasteful run check.

What about run detection for in-order data? While we're turning n log n moves into n moves with reverse order run detection, we'd be turning 0 moves into 0 moves with forward run detection. So there's no point in doing so.

The next optimization is to write the quad swap analyzer in such a way that we can perform a simple check to see if the entire array was in reverse order, if so, the sort is finished.

At the end of the loop the array has been turned into a series of ordered blocks of 8 elements.

Ping-Pong Quad Merge

Most textbook mergesort examples merge two blocks to swap memory, then copy them back to main memory.

main memory ┌────────┐┌────────┐
            └────────┘└────────┘
                  ↓ merge ↓
swap memory ┌──────────────────┐
            └──────────────────┘
                  ↓ copy ↓
main memory ┌──────────────────┐
            └──────────────────┘

This doubles the amount of moves and we can fix this by merging 4 blocks at once using a quad merge / ping-pong merge like so:

main memory ┌────────┐┌────────┐┌────────┐┌────────┐
            └────────┘└────────┘└────────┘└────────┘
                  ↓ merge ↓           ↓ merge ↓
swap memory ┌──────────────────┐┌──────────────────┐
            └──────────────────┘└──────────────────┘
                            ↓ merge ↓
main memory ┌──────────────────────────────────────┐
            └──────────────────────────────────────┘

It is possible to interleave the two merges to swap memory for increased memory-level parallelism, but this can both increase and decrease performance.

Skipping

Just like with the quad swap it is beneficial to check whether the 4 blocks are in-order.

In the case of the 4 blocks being in-order the merge operation is skipped, as this would be pointless. Because reverse order data is handled in the quad swap there is no need to check for reverse order blocks.

This allows quadsort to sort in-order sequences using n comparisons instead of n * log n comparisons.

Parity merge

A parity merge takes advantage of the fact that if you have two n length arrays, you can fully merge the two arrays by performing n merge operations on the start of each array, and n merge operations on the end of each array. The arrays must be of equal length. Another way to describe the parity merge would be as a bidirectional unguarded merge.

The main advantage of a parity merge over a traditional merge is that the loop of a parity merge can be fully unrolled.

If the arrays are not of equal length a hybrid parity merge can be performed. One way to do so is using n parity merges where n is the size of the smaller array, before switching to a traditional merge.

Branchless parity merge

Since the parity merge can be unrolled it's very suitable for branchless optimizations to speed up the sorting of random data. Another advantage is that two separate memory regions are accessed in the same loop, allowing memory-level parallelism. This makes the routine up to 2.5 times faster for random data on most hardware.

Increasing the memory regions from two to four can result in both performance gains and performance losses.

The following is a visualization of an array with 256 random elements getting turned into sorted blocks of 32 elements using ping-pong parity merges.

quadsort visualization

Cross merge

While a branchless parity merge sorts random data faster, it sorts ordered data slower. One way to solve this problem is by using a method with a resemblance to the galloping merge concept first introduced by timsort.

The cross merge works in a similar way to the quad swap. Instead of merging two arrays two items at a time, it merges four items at a time.

┌───┐┌───┐┌───┐    ┌───┐┌───┐┌───┐            ╭───╮  ┌───┐┌───┐┌───┐
│ A ││ B ││ C │    │ X ││ Y ││ Z │        ┌───│B<X├──┤ A ││ B ││C/X│
└─┬─┘└─┬─┘└───┘    └─┬─┘└─┬─┘└───┘        │   ╰─┬─╯  └───┘└───┘└───┘
  └────┴─────────────┴────┴───────────────┘     │  ╭───╮  ┌───┐┌───┐┌───┐
                                                └──│A>Y├──┤ X ││ Y ││A/Z│
                                                   ╰─┬─╯  └───┘└───┘└───┘
                                                     │    ┌───┐┌───┐┌───┐
                                                     └────│A/X││X/A││B/Y│
                                                          └───┘└───┘└───┘

When merging ABC and XYZ it first checks if B is smaller or equal to X. If that's the case A and B are copied to swap. If not, it checks if A is greater than Y. If that's the case X and Y are copied to swap.

If either check is false it's known that the two remaining distributions are A X and X A. This allows performing an optimal branchless merge. Since it's known each block still has at least 1 item remaining (B and Y) and there is a high probability of the data to be random, another (sub-optimal) branchless merge can be performed.

In C this looks as following:

while (l < l_size - 1 && r < r_size - 1)
{
    if (left[l + 1] <= right[r])
    {
        swap[s++] = left[l++];
        swap[s++] = left[l++];
    }
    else if (left[l] > right[r + 1])
    {
        swap[s++] = right[r++];
        swap[s++] = right[r++];
    }
    else
    {
        a = left[l] > right[r];
        x = !a;
        swap[s + a] = left[l++];
        swap[s + x] = right[r++];
        s += 2;
    }
}

Overall the cross merge gives a decent performance gain for both ordered and random data, particularly when the two arrays are of unequal length. When two arrays are of near equal length quadsort looks 8 elements ahead, and performs an 8 element parity merge if it can't skip ahead.

Merge strategy

Quadsort will merge blocks of 8 into blocks of 32, which it will merge into blocks of 128, 512, 2048, 8192, etc.

For each ping-pong merge quadsort will perform two comparisons to see if it will be faster to use a parity merge or a cross merge, and pick the best option.

Tail merge

When sorting an array of 33 elements you end up with a sorted array of 32 elements and a sorted array of 1 element in the end. If a program sorts in intervals it should pick an optimal array size (32, 128, 512, etc) to do so.

To minimalize the impact the remainder of the array is sorted using a tail merge.

Big O

                 ┌───────────────────────┐┌───────────────────────┐
                 │comparisons            ││swap memory            │
┌───────────────┐├───────┬───────┬───────┤├───────┬───────┬───────┤┌──────┐┌─────────┐┌─────────┐
│name           ││min    │avg    │max    ││min    │avg    │max    ││stable││partition││adaptive │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│mergesort      ││n log n│n log n│n log n││n      │n      │n      ││yes   ││no       ││no       │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quadsort       ││n      │n log n│n log n││1      │n      │n      ││yes   ││no       ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quicksort      ││n log n│n log n│n²     ││1      │1      │1      ││no    ││yes      ││no       │
└───────────────┘└───────┴───────┴───────┘└───────┴───────┴───────┘└──────┘└─────────┘└─────────┘

Quadsort makes n comparisons when the data is fully sorted or reverse sorted.

Data Types

The C implementation of quadsort supports long doubles and 8, 16, 32, and 64 bit data types. By using pointers it's possible to sort any other data type, like strings.

Interface

Quadsort uses the same interface as qsort, which is described in man qsort.

In addition to supporting (l - r) and ((l > r) - (l < r)) for the comparison function, (l > r) is valid as well. Special note should be taken that C++ sorts use (l < r) for the comparison function, which is incompatible with the C standard. When porting quadsort to C++ or Rust, switch (l, r) to (r, l) for every comparison.

Quadsort comes with the quadsort_prim(void *array, size_t nmemb, size_t size) function to perform primitive comparisons on arrays of 32 and 64 bit integers. Nmemb is the number of elements, while size should be either sizeof(int) or sizeof(long long) for signed integers, and sizeof(int) + 1 or sizeof(long long) + 1 for unsigned integers. Support for the char, short, float, double, and long double types can be easily added in quadsort.h.

Quadsort comes with the quadsort_size(void *array, size_t nmemb, size_t size, CMPFUNC *cmp) function to sort elements of any given size. The comparison function needs to be by reference, instead of by value, as if you are sorting an array of pointers.

Memory

By default quadsort uses n swap memory. If memory allocation fails quadsort will switch to sorting in-place through rotations. The minimum memory requirement is 32 elements of stack memory.

Rotations can be performed with minimal performance loss by using branchless binary searches and trinity / bridge rotations.

Sorting in-place through rotations will increase the number of moves from n log n to n log² n. The overall impact on performance is minor on array sizes below 1M elements.

Performance

Quadsort is one of the fastest merge sorts written to date. It is faster than quicksort for most data distributions, with the notable exception of generic data. Data type is important as well, and overall quadsort is faster for sorting referenced objects.

Compared to Timsort, Quadsort has similar overall adaptivity while being much faster on random data, even without branchless optimizations.

Quicksort has a slight advantage on random data as the array size increases beyond the L1 cache. For small arrays quadsort has a significant advantage due to quicksort's inability to cost effectively pick a reliable pivot. Subsequently, the only way for quicksort to rival quadsort is to cheat and become a hybrid sort, by using branchless merges to sort small partitions.

When using the clang compiler quadsort can use branchless ternary comparisons. Since most programming languages only support ternary merges ? : and not ternary partitions : ? this gives branchless mergesorts an additional advantage over branchless quicksorts. However, since the gcc compiler doesn't support branchless ternary merges, and the hack to perform branchless merges is less efficient than the hack to perform branchless partitions, branchless quicksorts have an advantage for gcc.

To take full advantage of branchless operations the cmp macro needs to be uncommented in bench.c, which will increase the performance by 30% on primitive types. The quadsort_prim function can be used to access primitive comparisons directly.

Variants

Credits

I personally invented the quad swap analyzer, cross merge, parity merge, branchless parity merge, monobound binary search, bridge rotation, and trinity rotation.

The ping-pong quad merge had been independently implemented in wikisort prior to quadsort, and likely by others as well.

The monobound binary search has been independently implemented, often referred to as a branchless binary search. I published a working concept in 2014, which appears to pre-date others.

Special kudos to Musiccombo and Co for getting me interested in rotations and branchless logic.

Visualization

In the visualization below nine tests are performed on 256 elements.

  1. Random order
  2. Ascending order
  3. Ascending Saw
  4. Generic random order
  5. Descending order
  6. Descending Saw
  7. Random tail
  8. Random half
  9. Ascending tiles.

The upper half shows the swap memory and the bottom half shows the main memory. Colors are used to differentiate various operations. Quad swaps are in cyan, reversals in magenta, skips in green, parity merges in orange, bridge rotations in yellow, and trinity rotations are in violet.

quadsort benchmark

The visualization is available on YouTube and there's also a YouTube video of a java port of quadsort in ArrayV on a wide variety of data distributions.

Benchmark: quadsort vs std::stable_sort vs timsort

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. Stablesort is g++'s std:stablesort function. Each test was ran 100 times on 100,000 elements. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
stablesort1000001280.0109580.0112150100random order
fluxsort1000001280.0085890.0088370100random order
timsort1000001280.0127990.0131850100random order
NameItemsTypeBestAverageComparesSamplesDistribution
stablesort100000640.0061340.0062320100random order
fluxsort100000640.0019450.0019940100random order
timsort100000640.0078240.0080700100random order
NameItemsTypeBestAverageLoopsSamplesDistribution
stablesort100000320.0059950.0060680100random order
fluxsort100000320.0018410.0018900100random order
timsort100000320.0075930.0077730100random order
stablesort100000320.0038150.0038410100random % 100
fluxsort100000320.0006550.0006800100random % 100
timsort100000320.0056080.0056660100random % 100
stablesort100000320.0006720.0007330100ascending order
fluxsort100000320.0000440.0000450100ascending order
timsort100000320.0000450.0000450100ascending order
stablesort100000320.0013600.0014100100ascending saw
fluxsort100000320.0003280.0003300100ascending saw
timsort100000320.0008400.0008520100ascending saw
stablesort100000320.0011210.0011540100pipe organ
fluxsort100000320.0002050.0002070100pipe organ
timsort100000320.0004650.0004690100pipe organ
stablesort100000320.0009040.0009200100descending order
fluxsort100000320.0000550.0000550100descending order
timsort100000320.0000880.0000920100descending order
stablesort100000320.0016030.0016410100descending saw
fluxsort100000320.0004180.0004270100descending saw
timsort100000320.0007880.0008160100descending saw
stablesort100000320.0020290.0020950100random tail
fluxsort100000320.0006230.0006270100random tail
timsort100000320.0019960.0020410100random tail
stablesort100000320.0034910.0035390100random half
fluxsort100000320.0010710.0010780100random half
timsort100000320.0040250.0040560100random half
stablesort100000320.0009180.0009400100ascending tiles
fluxsort100000320.0002930.0002960100ascending tiles
timsort100000320.0008500.0009310100ascending tiles
stablesort100000320.0011680.0014310100bit reversal
fluxsort100000320.0017000.0017310100bit reversal
timsort100000320.0022610.0029400100bit reversal
</details>

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. It measures the performance on random data with array sizes ranging from 1 to 1024. It's generated by running the benchmark using 1024 0 0 as the argument. The benchmark is weighted, meaning the number of repetitions halves each time the number of items doubles. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
stablesort4320.0055690.0058990.050random 1-4
quadsort4320.0011440.0011890.050random 1-4
timsort4320.0023010.0024910.050random 1-4
stablesort8320.0057310.0059500.050random 5-8
quadsort8320.0020640.0022000.050random 5-8
timsort8320.0049580.0051650.050random 5-8
stablesort16320.0063600.0064150.050random 9-16
quadsort16320.0018620.0019270.050random 9-16
timsort16320.0065780.0066630.050random 9-16
stablesort32320.0078090.0078850.050random 17-32
quadsort32320.0031770.0032580.050random 17-32
timsort32320.0085970.0086980.050random 17-32
stablesort64320.0088460.0089180.050random 33-64
quadsort64320.0041440.0041950.050random 33-64
timsort64320.0114590.0115600.050random 33-64
stablesort128320.0100650.0101310.050random 65-128
quadsort128320.0051310.0051840.050random 65-128
timsort128320.0139170.0140220.050random 65-128
stablesort256320.0112170.0113050.050random 129-256
quadsort256320.0049370.0050100.050random 129-256
timsort256320.0157850.0159120.050random 129-256
stablesort512320.0125440.0126370.050random 257-512
quadsort512320.0055450.0056180.050random 257-512
timsort512320.0175330.0176520.050random 257-512
stablesort1024320.0138710.0139790.050random 513-1024
quadsort1024320.0056640.0057550.050random 513-1024
timsort1024320.0191760.0192700.050random 513-1024
stablesort2048320.0109610.0110180.050random 1025-2048
quadsort2048320.0045270.0045800.050random 1025-2048
timsort2048320.0152890.0153380.050random 1025-2048
stablesort4096320.0108540.0109170.050random 2049-4096
quadsort4096320.0039740.0040180.050random 2049-4096
timsort4096320.0150510.0151320.050random 2049-4096
</details>

Benchmark: quadsort vs qsort (mergesort)

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 bench.c. Each test was ran 100 times. It's generated by running the benchmark using 100000 100 1 as the argument. In the benchmark quadsort is compared against glibc qsort() using the same general purpose interface and without any known unfair advantage, like inlining. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000640.0168810.0170521536381100random string
quadsort100000640.0106150.0107561655772100random string
qsort100000640.0153870.0155501536491100random double
quadsort100000640.0086480.0087511655904100random double
qsort100000640.0111650.0113751536491100random long
quadsort100000640.0060240.0060991655904100random long
qsort100000640.0107750.0109281536634100random int
quadsort100000640.0053130.0053751655948100random int
NameItemsTypeBestAverageComparesSamplesDistribution
qsort1000001280.0182140.0188431536491100random order
quadsort1000001280.0110980.0111851655904100random order
NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000640.0095220.0097481536491100random order
quadsort100000640.0040730.0041181655904100random order
NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000320.0089460.0091491536634100random order
quadsort100000320.0033420.0033911655948100random order
qsort100000320.0068680.0070591532324100random % 100
quadsort100000320.0026900.0027401381730100random % 100
qsort100000320.0026120.002845815024100ascending order
quadsort100000320.0001600.00016199999100ascending order
qsort100000320.0033960.003622915020100ascending saw
quadsort100000320.0009040.000925368457100ascending saw
qsort100000320.0026720.002803884462100pipe organ
quadsort100000320.0004660.000469277443100pipe organ
qsort100000320.0024690.002587853904100descending order
quadsort100000320.0001640.00016599999100descending order
qsort100000320.0033020.003453953892100descending saw
quadsort100000320.0009290.000941380548100descending saw
qsort100000320.0042500.0045011012003100random tail
quadsort100000320.0011880.001208564953100random tail
qsort100000320.0059600.0061331200707100random half
quadsort100000320.0020470.002078980778100random half
qsort100000320.0039030.0043521209200100ascending tiles
quadsort100000320.0020720.002170671191100ascending tiles
qsort100000320.0051650.0061681553378100bit reversal
quadsort100000320.0031460.0031971711215100bit reversal
</details>

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 bench.c. Each test was ran 100 times. It's generated by running the benchmark using 10000000 0 0 as the argument. The benchmark is weighted, meaning the number of repetitions halves each time the number of items doubles. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
qsort10320.2183100.2245052210random 10
quadsort10320.0917500.0923122910random 10
qsort100320.3919620.39663954110random 100
quadsort100320.1739280.17779464610random 100
qsort1000320.5580550.566364870710random 1000
quadsort1000320.2203950.222146981710random 1000
qsort10000320.7355280.74135312045410random 10000
quadsort10000320.2678600.26992413166810random 10000
qsort100000320.9071610.910446153642110random 100000
quadsort100000320.3395410.340942165570310random 100000
qsort1000000321.0852751.0890681867453210random 1000000
quadsort1000000320.4017150.4038601981627010random 1000000
qsort10000000321.3139221.31991122010592110random 10000000
quadsort10000000320.5993930.60163523151313110random 10000000
</details>

Benchmark: quadsort vs pdqsort vs fluxsort

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04) using the wolfsort benchmark. The source code was compiled using g++ -O3 -w -fpermissive bench.c. Pdqsort is a branchless quicksort/heapsort/insertionsort hybrid. Fluxsort is a branchless quicksort/mergesort hybrid. Each test was ran 100 times on 100,000 elements. Comparisons are fully inlined. A table with the best and average time in seconds can be uncollapsed below the bar graph.

Graph

<details><summary><b>data table</b></summary>
NameItemsTypeBestAverageComparesSamplesDistribution
pdqsort1000001280.0057730.0058590100random order
quadsort1000001280.0098130.0098820100random order
fluxsort1000001280.0086030.0087040100random order
NameItemsTypeBestAverageComparesSamplesDistribution
pdqsort100000640.0026710.0026860100random order
quadsort100000640.0025160.0025340100random order
fluxsort100000640.0019780.0020030100random order
NameItemsTypeBestAverageLoopsSamplesDistribution
pdqsort100000320.0025890.0026070100random order
quadsort100000320.0024470.0024660100random order
fluxsort100000320.0018510.0018730100random order
pdqsort100000320.0007800.0007880100random % 100
quadsort100000320.0017880.0018120100random % 100
fluxsort100000320.0006750.0006880100random % 100
pdqsort100000320.0000840.0000850100ascending order
quadsort100000320.0000510.0000510100ascending order
fluxsort100000320.0000420.0000430100ascending order
pdqsort100000320.0033780.0034020100ascending saw
quadsort100000320.0006150.0006180100ascending saw
fluxsort100000320.0003270.0003370100ascending saw
pdqsort100000320.0027720.0027920100pipe organ
quadsort100000320.0002710.0002710100pipe organ
fluxsort100000320.0002140.0002150100pipe organ
pdqsort100000320.0001870.0001920100descending order
quadsort100000320.0000590.0000590100descending order
fluxsort100000320.0000530.0000530100descending order
pdqsort100000320.0031480.0031650100descending saw
quadsort100000320.0006140.0006260100descending saw
fluxsort100000320.0003270.0003310100descending saw
pdqsort100000320.0024980.0025200100random tail
quadsort100000320.0008130.0008420100random tail
fluxsort100000320.0006240.0006270100random tail
pdqsort100000320.0025730.0025900100random half
quadsort100000320.0014510.0014620100random half
fluxsort100000320.0010640.0010750100random half
pdqsort100000320.0022560.0022810100ascending tiles
quadsort100000320.0008150.0008230100ascending tiles
fluxsort100000320.0003130.0003150100ascending tiles
pdqsort100000320.0025700.0025890100bit reversal
quadsort100000320.0022300.0022590100bit reversal
fluxsort100000320.0017180.0017440100bit reversal
</details>

The following benchmark was on WSL clang version 10 (10.0.0-4ubuntu1~18.04.2) using rhsort's wolfsort benchmark. The source code was compiled using clang -O3. The bar graph shows the best run out of 100 on 131,072 32 bit integers. Comparisons for quadsort, fluxsort and glidesort are inlined.

Some additional context is required for this benchmark. Glidesort is written and compiled in Rust which supports branchless ternary operations, subsequently fluxsort and quadsort are compiled using clang with branchless ternary operations in place for the merge and small-sort routines. Since fluxsort and quadsort are optimized for gcc there is a performance penalty, with some of the routines running 2-3x slower than they do in gcc.

fluxsort vs glidesort

<details><summary>data table</summary>
NameItemsTypeBestAverageLoopsSamplesDistribution
quadsort131072320.0021740.0022090100random order
fluxsort131072320.0021890.0022050100random order
glidesort131072320.0030650.0031250100random order
quadsort131072320.0016230.0016460100random % 100
fluxsort131072320.0008370.0008560100random % 100
glidesort131072320.0010310.0010370100random % 100
quadsort131072320.0000610.0000630100ascending order
fluxsort131072320.0000580.0000600100ascending order
glidesort131072320.0000910.0000930100ascending order
quadsort131072320.0003450.0003530100ascending saw
fluxsort131072320.0003410.0003490100ascending saw
glidesort131072320.0003510.0003580100ascending saw
quadsort131072320.0002310.0002450100pipe organ
fluxsort131072320.0002220.0002280100pipe organ
glidesort131072320.0002280.0002350100pipe organ
quadsort131072320.0000740.0000760100descending order
fluxsort131072320.0000730.0000760100descending order
glidesort131072320.0001060.0001100100descending order
quadsort131072320.0003730.0003800100descending saw
fluxsort131072320.0003550.0003710100descending saw
glidesort131072320.0003630.0003690100descending saw
quadsort131072320.0006850.0006970100random tail
fluxsort131072320.0007200.0007260100random tail
glidesort131072320.0009530.0009660100random tail
quadsort131072320.0011920.0012040100random half
fluxsort131072320.0012510.0012660100random half
glidesort131072320.0016500.0016790100random half
quadsort131072320.0014720.0015070100ascending tiles
fluxsort131072320.0005780.0005890100ascending tiles
glidesort131072320.0025590.0025760100ascending tiles
quadsort131072320.0022100.0022310100bit reversal
fluxsort131072320.0020420.0020530100bit reversal
glidesort131072320.0027870.0028070100bit reversal
quadsort131072320.0012370.0012780100random % 2
fluxsort131072320.0002270.0002330100random % 2
glidesort131072320.0004490.0004550100random % 2
quadsort131072320.0011230.0011530100signal
fluxsort131072320.0012690.0012850100signal
glidesort131072320.0037600.0037760100signal
quadsort131072320.0019110.0019560100exponential
fluxsort131072320.0011340.0011420100exponential
glidesort131072320.0023550.0023730100exponential
</details>