Home

Awesome

text-builder-linear Hackage Stackage LTS Stackage Nightly

Linear types for linear times!

Builder for strict Text and ByteString, based on linear types. It consistently outperforms lazy Builder from text as well as a strict builder from text-builder, and scales better.

Example

> :set -XOverloadedStrings
> import Data.Text.Builder.Linear
> fromText "foo" <> fromChar '_' <> fromDec (42 :: Int)
"foo_42"

Design

String builders in Haskell serve the same purpose as StringBuilder in Java to prevent quadratic slow down in concatenation.

Classic builders such as Data.Text.Lazy.Builder are lazy and fundamentally are dlist with bells and whistles: instead of actually concatenating substrings we compose actions, which implement concatenation, building a tree of thunks. The tree can be forced partially, left-to-right, producing chunks of strict Text, combined into a lazy one. Neither input, nor output need to be materialized in full, which potentially allows for fusion. Such builders allow linear time complexity, but constant factors are relatively high, because thunks are expensive. To a certain degree this is mitigated by inlining, which massively reduces number of nodes.

Strict builders such as text-builder offer another design: they first inspect their input in full to determine output length, then allocate a buffer of required size and fill it in one go. If everything inlines nicely, the length may be known in compile time, which gives blazingly fast runtime. In more complex cases it still builds a tree of thunks and forces all inputs to be materialized.

This package offers two interfaces. One is a mutable Buffer with linear API, which operates very similar to StringBuilder in Java. It allocates a buffer with extra space at the ends to append new strings. If there is not enough free space to insert new data, it allocates a twice larger buffer and copies itself there. The dispatch happens in runtime, so we do not need to inspect and materialize all inputs beforehand; and inlining is mostly irrelevant. Exponential growth provides for amortized linear time. Such structure can be implemented without linear types, but that would greatly affect user experience by polluting everything with ST monad. Users are encouraged to use Buffer API, and built-in benchmarks refer to it.

The second interface is more traditional newtype Builder = Builder (Buffer ⊸ Buffer) with Monoid instance. This type provides easy migration from other builders, but may suffer from insufficient inlining, allocating a tree of thunks. It is still significantly faster than Data.Text.Lazy.Builder, as witnessed by benchmarks for blaze-builder below.

Case study

Let's benchmark builders, which concatenate all Char from minBound to maxBound, producing a large Text:

#!/usr/bin/env cabal
{- cabal:
build-depends: base, tasty-bench, text, text-builder, text-builder-linear
ghc-options: -O2
-}

import qualified Data.Text as T
import qualified Data.Text.Lazy as TL
import qualified Data.Text.Lazy.Builder as TLB
import qualified Text.Builder as TB
import qualified Data.Text.Builder.Linear as TBL
import System.Environment (getArgs)
import Test.Tasty.Bench

mkBench :: Monoid a => String -> (Char -> a) -> (a -> Int) -> Benchmark
mkBench s f g = bench s $ nf (g . foldMap f . enumFromTo minBound) maxBound
{-# INLINE mkBench #-}

main :: IO ()
main = defaultMain
  [ mkBench "text, lazy" TLB.singleton (fromIntegral . TL.length . TLB.toLazyText)
  , mkBench "text, strict" TLB.singleton (T.length . TL.toStrict . TLB.toLazyText)
  , mkBench "text-builder" TB.char (T.length . TB.run)
  , mkBench "text-builder-linear" TBL.fromChar (T.length . TBL.runBuilder)
  ]

Running this program with cabal run Main.hs -- +RTS -T yields following results:

text, lazy:
  4.25 ms ± 107 μs,  11 MB allocated, 912 B  copied
text, strict:
  7.18 ms ± 235 μs,  24 MB allocated,  10 MB copied
text-builder:
  80.1 ms ± 3.0 ms, 218 MB allocated, 107 MB copied
text-builder-linear:
  5.37 ms ± 146 μs,  44 MB allocated,  78 KB copied

The first result seems the best both in time and memory and corresponds to the usual Text builder, where we do not materialize the entire result at all. It builds chunks of lazy Text lazily and consumes them at once by TL.length. Thus there are 11 MB of allocations in nursery, none of which survive generation 0 garbage collector, so nothing is copied.

The second result is again the usual Text builder, but emulates a strict consumer: we materialize a strict Text before computing length. Allocation are doubled, and half of them (corresponding to the strict Text) survive to the heap. Time is also almost twice longer, but still quite good.

The third result is for text-builder and demonstrates how bad things could go with strict builders, aiming to precompute the precise length of the buffer: allocating a thunk per char is tremendously slow and expensive.

The last result corresponds to the current package. We generate a strict Text by growing and reallocating the buffer, thus allocations are quite high. Nevertheless, it is already faster than the usual Text builder with strict consumer and does not strain the garbage collector.

Things get very different if we remove {-# INLINE mkBench #-}:

text, lazy:
  36.9 ms ± 599 μs, 275 MB allocated,  30 KB copied
text, strict:
  44.7 ms ± 1.3 ms, 287 MB allocated,  25 MB copied
text-builder:
  77.6 ms ± 2.2 ms, 218 MB allocated, 107 MB copied
text-builder-linear:
  5.35 ms ± 212 μs,  44 MB allocated,  79 KB copied

Builders from text package degrade rapidly, 6-8x slower and 10-20x more allocations. That's because their constant factors rely crucially on everything getting inlined, which makes their performance fragile and unreliable in large-scale applications. On the bright side of things, our builder remains as fast as before and now is a clear champion.

Benchmarks for Text

Measured with GHC 9.6 on aarch64:

Group / sizetexttext-builderThis package
Text
147.4 ns24.2 ns0.51x35.2 ns0.74x
10509 ns195 ns0.38x197 ns0.39x
1004.94 μs1.74 μs0.35x1.66 μs0.34x
100052.6 μs17.0 μs0.32x15.0 μs0.28x
10000646 μs206 μs0.32x155 μs0.24x
10000012.2 ms3.34 ms0.27x2.60 ms0.21x
1000000159 ms55.3 ms0.35x16.1 ms0.10x
Char
146.9 ns21.1 ns0.45x22.3 ns0.48x
10229 ns152 ns0.66x79.9 ns0.35x
1002.00 μs1.23 μs0.61x618 ns0.31x
100021.9 μs10.3 μs0.47x6.28 μs0.29x
10000285 μs153 μs0.54x68.5 μs0.24x
1000007.70 ms4.08 ms0.53x992 μs0.13x
1000000110 ms106 ms0.96x9.19 ms0.08x
Decimal
197.7 ns872 ns8.92x80.2 ns0.82x
10864 ns8.72 μs10.09x684 ns0.79x
1009.07 μs93.5 μs10.32x7.25 μs0.80x
100092.4 μs1.06 ms11.44x67.5 μs0.73x
100001.13 ms13.4 ms11.88x667 μs0.59x
10000018.7 ms141 ms7.57x7.57 ms0.41x
1000000229 ms1.487 s6.48x67.8 ms0.30x
Hexadecimal
1403 ns749 ns1.86x43.9 ns0.11x
103.94 μs7.66 μs1.94x308 ns0.08x
10042.8 μs89.0 μs2.08x2.88 μs0.07x
1000486 μs986 μs2.03x27.7 μs0.06x
100007.10 ms12.6 ms1.77x283 μs0.04x
10000080.1 ms133 ms1.65x3.53 ms0.04x
1000000867 ms1.340 s1.55x28.9 ms0.03x
Double
17.56 μs18.3 μs2.42x414 ns0.05x
1076.5 μs188 μs2.46x4.23 μs0.06x
100754 μs2.35 ms3.11x44.4 μs0.06x
10007.94 ms25.8 ms3.25x436 μs0.05x
1000079.1 ms285 ms3.60x4.90 ms0.06x
100000796 ms2.938 s3.69x45.1 ms0.06x
10000008.003 s32.411 s4.05x436 ms0.05x

If you are not convinced by synthetic data, here are benchmarks for blaze-markup after migration to Data.Text.Builder.Linear:

bigTable
  992  μs ±  80 μs, 49% less than baseline
basic
  4.35 μs ± 376 ns, 47% less than baseline
wideTree
  1.26 ms ±  85 μs, 53% less than baseline
wideTreeEscaping
  217  μs ± 7.8 μs, 58% less than baseline
deepTree
  242  μs ±  23 μs, 48% less than baseline
manyAttributes
  811  μs ±  79 μs, 58% less than baseline
customAttribute
  1.68 ms ± 135 μs, 56% less than baseline

Benchmarks for ByteString

Somewhat surprisingly, text-builder-linear now offers rendering to strict ByteString as well. It is consistently faster than bytestring when a string gets over 32k (which is defaultChunkSize for bytestring builder). For mid-sized strings bytestring is slightly faster in certain disciplines, mostly by virtue of using cbits via FFI, while this package remains 100% native Haskell.

Benchmarks below were measured with GHC 9.6 on aarch64 and include comparison to bytestring-strict-builder:

Group / sizebytestring…-strict-builderThis package
Text
1106 ns33.5 ns0.32x35.2 ns0.33x
10322 ns217 ns0.68x197 ns0.61x
1002.49 μs1.89 μs0.76x1.66 μs0.67x
100021.8 μs18.5 μs0.85x15.0 μs0.69x
10000231 μs212 μs0.92x155 μs0.67x
1000003.97 ms3.54 ms0.89x2.60 ms0.66x
100000081.2 ms51.5 ms0.63x16.1 ms0.20x
Char
199.0 ns19.4 ns0.20x22.3 ns0.23x
10270 ns82.9 ns0.31x79.9 ns0.30x
1001.77 μs723 ns0.41x618 ns0.35x
100020.4 μs8.37 μs0.41x6.28 μs0.31x
10000322 μs129 μs0.40x68.5 μs0.21x
10000010.4 ms2.50 ms0.24x992 μs0.10x
1000000143 ms67.4 ms0.47x9.19 ms0.06x
Decimal
1152 ns174 ns1.14x80.2 ns0.53x
10685 ns1.55 μs2.26x684 ns1.00x
1005.88 μs17.2 μs2.93x7.25 μs1.23x
100060.3 μs196 μs3.25x67.5 μs1.12x
10000648 μs4.25 ms6.57x667 μs1.03x
10000011.2 ms62.8 ms5.62x7.57 ms0.68x
1000000150 ms655 ms4.37x67.8 ms0.45x
Hexadecimal
194.7 ns43.9 ns0.46x
10255 ns308 ns1.21x
1001.72 μs2.88 μs1.67x
100018.9 μs27.7 μs1.46x
10000250 μs283 μs1.13x
1000006.94 ms3.53 ms0.51x
100000093.2 ms28.9 ms0.31x
Double
1457 ns414 ns0.91x
103.94 μs4.23 μs1.07x
10040.3 μs44.4 μs1.10x
1000398 μs436 μs1.10x
100005.65 ms4.90 ms0.87x
10000063.3 ms45.1 ms0.71x
1000000673 ms436 ms0.65x