StringFormatter

A zero-allocation* string formatting library for .NET applications.

Motivation

The built-in string formatting facilities in .NET are robust and quite usable. Unfortunately, they also perform a ridiculous number of GC allocations. Mostly these are short-lived, and on the desktop GC they generally aren't noticeable. On more constrained systems, however, they can be painful. Additionally, if you're trying to track your GC usage via live reporting in your program, you might quickly notice that attempts to print out the current GC state cause additional allocations, defeating the entire attempt at instrumentation.

Thus the existence of this library. It's not completely allocation-free; there are several one-time setup costs. The steady state, though, is entirely allocation-free. You can freely use the string formatting utilities in the main loop of a game without causing a steady churn of garbage.

Quick Start

The library requires no installation, package management, or any other complicated distribution mechanism. Simply copy the StringFormatter.cs file into your project and start using it.

At its simplest, you can make use of the static StringBuffer.Format convenience method. The StringBuffer formatting methods accept all of the formatting features supported by the .NET BCL.

string result = StringBuffer.Format("{0,-8:x} some text -- {1:C11} {2} more text here {3:G}", -15, 13.4512m, true, double.MaxValue);
// output:
// "-15      some text -- 13.4512 True more text here 1.79769313486232E+308"

Allocation Analysis

Let's look at the allocations performed by the previous example and compare them to the BCL's string.Format.

| | Mine | BCL | Explanation |
|---|---|---|---|
| Parameters | 0 | 1+4 | Boxing value types plus params[] array allocation |
| static Format() cache | 1 | 1 | Allocating a new StringBuffer / StringBuilder (will be cached in both cases) |
| Constructor | 1 | 1 | Allocation of the backing char[] array. |
| Format specifiers | 0 | 3*3 | In the BCL, each specifier in the format string results in a new StringBuilder allocation, an underlying buffer allocation, and then a ToString() call. |
| Each argument | 0 | 4 | The BCL calls ToString() on each argument. |
| ToString | 1 | 1 | No way around it, if you want a string instance you need to allocate. |

Tally them up and we get the following totals:

| | Mine | BCL |
|---|---|---|
| First Time | 3 | 21 |
| Each Additional | 1 | 19 |

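At the steady state, StringBuffer requires 1 allocation per format call, regardless of the number of arguments. The BCL's string.Format requires up to 2 + 5n, where n is the number of arguments: one params[] array and one final ToString(), plus, for each argument, one boxing allocation, one ToString() call, and three allocations for its format specifier. The example above comes to 19 rather than 22 because only three of its four arguments carry a format specifier. There is an additional cost not mentioned in the above table: each type reallocates its internal buffer when the size of the resulting string grows too large. If you set your capacity properly and Clear() your buffer between format operations (as the static Format() methods do) you can avoid this cost entirely.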

Note: that single allocation performed by StringBuffer, calling ToString() on the result, can be avoided by using additional library features described below.
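If you want to check figures like these on your own runtime, one option is to measure the bytes allocated by a single call. The sketch below does this for the BCL's string.Format only, using GC.GetAllocatedBytesForCurrentThread (available on .NET Core 3.0+ and .NET Framework 4.8). AllocationProbe, Measure, and MeasureBclFormat are hypothetical names used for illustration; they are not part of StringFormatter.

```csharp
using System;

public static class AllocationProbe
{
    // Returns the number of bytes the current thread allocated while running 'action'.
    public static long Measure(Action action)
    {
        action(); // warm-up call so one-time setup costs are not counted
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }

    // Formats two value-type arguments with the BCL. The arguments are boxed
    // and a new result string is created, so this reports a non-zero value.
    public static long MeasureBclFormat()
    {
        return Measure(() => string.Format("{0:x} and {1:G}", 255, 1.5));
    }
}
```

The same Measure helper could wrap a StringBuffer call to compare the two; the exact byte counts depend on the runtime and platform, so none are quoted here.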

Features

StringBuffer has a similar API to StringBuilder. You can create an instance and set a capacity and then reuse that buffer for many operations, avoiding any allocations in the process.

var buffer = new StringBuffer(128);
buffer.Append(32.53);
buffer.Clear();
buffer.AppendFormat("{0}", "Foo");
var result = buffer.ToString();

StringBuffer is fully culture-aware. Unlike the BCL APIs which require you to pass the desired CultureInfo around all over the place, StringBuffer caches the culture during initialization and all subsequent formatting calls use it automatically. If for some reason you want to mix and match strings for different cultures in the same buffer, you'll have to manage that yourself.
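For contrast, here is a minimal sketch of the BCL pattern referred to above, where the culture has to be passed to each call that should honor it. It uses only BCL APIs; CultureContrast and its methods are hypothetical names for illustration. The culture is built by cloning the invariant culture so the example does not depend on which cultures are installed.

```csharp
using System;
using System.Globalization;

public static class CultureContrast
{
    // Builds a culture that uses ',' as the decimal separator.
    public static CultureInfo CommaDecimalCulture()
    {
        var culture = (CultureInfo)CultureInfo.InvariantCulture.Clone();
        culture.NumberFormat.NumberDecimalSeparator = ",";
        return culture;
    }

    // BCL style: the culture is an explicit argument on every format call.
    public static string FormatWith(CultureInfo culture, double value)
    {
        return string.Format(culture, "{0:F1}", value);
    }
}
```

Calling CultureContrast.FormatWith(CultureContrast.CommaDecimalCulture(), 12.5) returns "12,5", while passing CultureInfo.InvariantCulture returns "12.5".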

(*) If you want to avoid even the one allocation incurred by calling ToString() on the result of the StringBuffer, you can make use of the CopyTo methods. These copy the internal data to either a managed buffer or an arbitrary char pointer. You can allocate stack memory or native heap memory and avoid any GC overhead entirely on a per-string basis:

buffer.Append("Hello");

// stackalloc into a char* requires an unsafe context
var output = stackalloc char[buffer.Count];
buffer.CopyTo(output, 0, buffer.Count);

Limitations

Unlike in the BCL, each argument to StringBuffer.AppendFormat must either be one of the known built-in types or be a type implementing IStringFormattable. This new interface is the analogue to the BCL's IFormattable. This restriction is part of how StringBuffer is able to avoid boxing arguments.

If you need to work with an existing type that you don't own, you can get around the restriction by using a custom formatter:

StringBuffer.SetCustomFormatter<MyType>(FormatMyType);

void FormatMyType(StringBuffer buffer, MyType value, StringView formatSpecifier) {
    // Append the textual form of value to buffer here, honoring formatSpecifier if needed.
}

Once that call has been made, you may pass instances of MyType to any of the format methods.

Another limitation of StringBuffer is that there only exist AppendFormat methods taking up to 8 arguments. Adding additional ones is trivial from a development perspective, but there does exist a statically compiled limit. Thus if you want to provide more, you need to make use of the AppendArgSet method. This takes an instance of an IArgSet, which you must implement, and formats it according to the given format string. Whether or not this results in allocations is up to your implementation.

The format specifier for each argument is passed to the format routines via a StringView, which is a pointer to stack allocated temporary memory. There is an upper limit to the size of this memory, so format specifiers are capped at a hard upper length. Currently that length is 32, though it's easily changed in the source. Most format strings never have specifiers nearly that long; if you're doing something crazy with specifiers though that might become a concern.
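To make the cap concrete: the specifier is the part of a format item after the colon, so "{0,-8:x}" has the one-character specifier "x". The sketch below pulls that part out of a single format item and checks it against the 32-character limit. SpecifierCheck and its members are hypothetical helpers, not library API; the constant simply mirrors the number quoted above, and treating exactly 32 characters as acceptable is an assumption.

```csharp
using System;

public static class SpecifierCheck
{
    // Assumed to mirror the cap described above; the real value lives in the library source.
    public const int MaxSpecifierLength = 32;

    // Returns the specifier of a single format item, e.g. "{0,-8:x}" -> "x".
    // Returns an empty string when the item has no ':' section.
    // Escaped braces ("{{" and "}}") are not handled in this sketch.
    public static string ExtractSpecifier(string formatItem)
    {
        int colon = formatItem.IndexOf(':');
        int close = formatItem.LastIndexOf('}');
        if (colon < 0 || close < colon) return string.Empty;
        return formatItem.Substring(colon + 1, close - colon - 1);
    }

    // True when the specifier would fit within the assumed cap.
    public static bool FitsInSpecifierBuffer(string formatItem)
    {
        return ExtractSpecifier(formatItem).Length <= MaxSpecifierLength;
    }
}
```

Typical items such as "{1:C11}" pass easily; something like a 40-character custom date pattern would not.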

Performance

I need to do more in-depth performance analysis and comparisons, but so far my implementation is roughly on par with the BCL versions. Their formatting routines tend to be faster thanks to having hand-coded assembly routines in the CLR, but they also allocate a lot more so it generally ends up being a wash.

There are a few cases where I know I'm significantly slower; for example, denormalized doubles aren't great. If your application needs to format millions of denormalized numbers per second, you might want to consider sticking with the BCL.

Here are some results obtained using BenchmarkDotNet for generating a fully formatted string:

Machine info:

BenchmarkDotNet=v0.9.6.0
OS=Microsoft Windows NT 6.2.9200.0
Processor=Intel(R) Core(TM) i7-4770 CPU @ 3.40GHz, ProcessorCount=8
Frequency=3312644 ticks, Resolution=301.8737 ns, Timer=TSC
HostCLR=MS.NET 4.0.30319.42000, Arch=64-bit RELEASE [RyuJIT]
JitModules=clrjit-v4.6.1078.0

The following test results compare StringBuilder.AppendFormat (Baseline) with StringBuffer.AppendFormat, each returning a newly allocated BCL string:

Type=StringFormatBenchmark Mode=Throughput Platform=X64

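| Method | Jit | Median | StdDev | Scaled | Min | Max | Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | LegacyJit | 932.3745 ns | 6.5379 ns | 1.00 | 911.7104 ns | 941.6221 ns | 610.00 | - | - | 230.71 |
| Baseline | RyuJit | 936.3304 ns | 6.2145 ns | 1.00 | 929.3742 ns | 950.8991 ns | 629.69 | - | - | 238.11 |
| StringBuffer | LegacyJit | 824.8445 ns | 4.3413 ns | 0.88 | 817.6467 ns | 834.4629 ns | 133.87 | - | - | 52.75 |
| StringBuffer | RyuJit | 887.2168 ns | 8.8965 ns | 0.95 | 869.2266 ns | 910.2819 ns | 143.00 | - | - | 56.35 |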

The following test results compare StringBuilder.AppendFormat (Baseline) with StringBuffer.AppendFormat without allocating a result string, instead reusing a target buffer. The main point of this test is to confirm that StringFormatter is indeed completely allocation-free when such behavior is desired:

Type=NoAllocationBenchmark Mode=Throughput Platform=X64

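| Method | Jit | Median | StdDev | Scaled | Min | Max | Gen 0 | Gen 1 | Gen 2 | Bytes Allocated/Op |
|---|---|---|---|---|---|---|---|---|---|---|
| Baseline | LegacyJit | 913.6370 ns | 10.0141 ns | 1.00 | 898.5009 ns | 934.0547 ns | 410.37 | - | - | 157.63 |
| Baseline | RyuJit | 920.6765 ns | 7.4793 ns | 1.00 | 903.9691 ns | 930.3559 ns | 401.64 | - | - | 154.28 |
| NoAllocation | LegacyJit | 824.3390 ns | 5.8531 ns | 0.90 | 806.5347 ns | 833.2877 ns | - | - | - | 0.11 |
| NoAllocation | RyuJit | 886.5322 ns | 3.7329 ns | 0.96 | 880.4307 ns | 895.9284 ns | - | - | - | 0.11 |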

To Do

There is still some work to be done:

Feedback

If you have any comments, questions, or want to help out, feel free to get in touch or file an issue.