Home

Awesome

Intro

This document describes a stable top-down adaptive branchless merge sort named piposort. It is intended as a simplified quadsort with reduced adaptivity, but a great reduction in lines of code and overall complexity. The name stands for ping-pong.

Quad partitioning

During the partitioning phase the array is split into 4 segments (instead of the traditional 2) until the segment size drops below 8 elements.

Adaptive oddeven sort

Segments ranging in size from 2 to 7 elements are sorted with oddeven sort, which is a bubblesort variant with a better memory access pattern, which improves performance when combined with branchless operations.

The best case is 6 comparisons to sort 7 elements, the worst case is 21. Its overall performance is very good.

Branchless parity merge

Segments sorted with oddeven sort are ping-pong merged with branchless parity merges, 4 segments at a time, until the array is fully sorted.

Adaptiveness

Between parity merges it is checked whether the four segments are in order or in reverse order. Reverse order segments are placed in their proper position using a simple auxiliary rotation.

Big O

                 ┌───────────────────────┐┌───────────────────────┐
                 │comparisons            ││swap memory            │
┌───────────────┐├───────┬───────┬───────┤├───────┬───────┬───────┤┌──────┐┌─────────┐┌─────────┐
│name           ││min    │avg    │max    ││min    │avg    │max    ││stable││partition││adaptive │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│mergesort      ││n log n│n log n│n log n││n      │n      │n      ││yes   ││no       ││no       │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│quadsort       ││n      │n log n│n log n││1      │n      │n      ││yes   ││no       ││yes      │
├───────────────┤├───────┼───────┼───────┤├───────┼───────┼───────┤├──────┤├─────────┤├─────────┤
│piposort       ││n      │n log n│n log n││n      │n      │n      ││yes   ││no       ││semi     │
└───────────────┘└───────┴───────┴───────┘└───────┴───────┴───────┘└──────┘└─────────┘└─────────┘

Data Types

The C implementation of piposort supports long doubles and 8, 16, 32, and 64 bit data types. By using pointers it's possible to sort any other data type, like strings.

Interface

Piposort uses the same interface as qsort, which is described in man qsort.

Memory

Piposort requires n memory.

Performance

For random data piposort is one of the fastest sorts available. There is no adaptivity for generic data, so fluxsort would outperform piposort for random data with low cardinality.

To take full advantage of branchless operations the cmp macro needs to be uncommented in bench.c, which will increase the performance by 30% on primitive types. This mode only allows sorting arrays of signed primitives by default.

The code is optimized for gcc and needs to be compiled with the -O3 option for proper performance.

Benchmark: piposort vs qsort (mergesort)

The following benchmark was on WSL 2 gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04). The source code was compiled using gcc -O3 bench.c. Each test was ran 100 times. It's generated by running the benchmark using 100000 100 1 as the argument. In the benchmark piposort is compared against glibc qsort() using the same general purpose interface and without any known unfair advantage, like inlining.

NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000640.0168550.0170321536381100random string
piposort100000640.0108920.0110191638620100random string
NameItemsTypeBestAverageComparesSamplesDistribution
qsort1000001280.0190670.0199591536491100random order
piposort1000001280.0118620.0119261638209100random order
NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000640.0093770.0094481536491100random order
piposort100000640.0040360.0040531638209100random order
NameItemsTypeBestAverageComparesSamplesDistribution
qsort100000320.0089470.0090871536634100random order
piposort100000320.0033340.0033921638867100random order
qsort100000320.0068760.0070381532324100random % 100
piposort100000320.0033370.0034521637919100random % 100
qsort100000320.0026080.002667815024100ascending order
piposort100000320.0002790.00029299999100ascending order
qsort100000320.0033910.003574915020100ascending saw
piposort100000320.0006480.000671299998100ascending saw
qsort100000320.0026650.002681884462100pipe organ
piposort100000320.0009320.000975388889100pipe organ
qsort100000320.0024550.002555853904100descending order
piposort100000320.0008790.000903277780100descending order
qsort100000320.0032920.003357953892100descending saw
piposort100000320.0012140.001300477778100descending saw
qsort100000320.0042340.0044211012003100random tail
piposort100000320.0013180.001352634685100random tail
qsort100000320.0059660.0060831200707100random half
piposort100000320.0019910.002015969386100random half
qsort100000320.0043500.0044341209200100ascending tiles
piposort100000320.0051770.0053681573354100ascending tiles
qsort100000320.0052210.0058431553378100bit reversal
piposort100000320.0032570.0033671626250100bit reversal