Home

Awesome

Lee-STMX

Lee-STMX is a benchmark for STMX, a high-performance software transactional memory library for Common Lisp.

STMX is available from GitHub, more information about it can be found in its README.

A general introduction on software transactional memory is available in many places, including at least Wikipedia and several research papers. One of the most introductory, but still quite technical is Composable Memory Transaction from Microsoft research.

Lee-STMX is a porting of Lee-TM to STMX. It is a non-trivial benchmark suite for transactional memory developed in 2007 by the University of Manchester (UK), then ported to STMX by the STMX author. It aims at implementing a realistic workload and uses longer transactions than what found in common "artificial" micro-benchmarks.

Quite unfortunately, a straightforward porting of Lee-TM algorithm was found to have a strongly unbalanced workload. More than 99.5% of the CPU time was spent in reading transactional memory from outside any transaction; only less than 0.5% of the CPU time was actually spent inside transactions, including writing to transactional memory.

For this reason, Lee-STMX benchmark has been modified to use a variant of Hadlock routing algorithm which, beyond being much faster, spends a smaller percentage of time in the read-only phase. The modified algorithm is better, but still quite unbalanced: 92-94% of the CPU time is still spent in reading transactional memory from outside any transaction; less than 1% of the CPU time is spent in atomic transactions that also write into transactional memory.

Status

As of May 2013, Lee-STMX is being written by Massimiliano Ghilardi and it is considered BETA quality by the author.

At the same date, STMX is considered to be stable.

Benchmark results

What follows are some timings obtained on the authors's system, and by no means they claim to be exact, absolute or reproducible: your mileage may vary.

Date: 15 June 2013

Hardware: Intel Core-i5 750 @4.0 GHz (quad-core), 16GB RAM

Software: Debian GNU/Linux 7.0 (x86_64), SBCL 1.1.8 (x86_64), STMX 1.3.2

Results for MAINBOARD benchmark: nil

Results for MEMBOARD benchmark: nil

Analysis and comments

With 4+ threads on a quad-core machine, STMX delivers up to 95-105% performance improvement with respect to single-threaded code on this benchmark (the peak performance ratio is 1257.0 / 614.8 = 2.044 = 204.4% on the MEMBOARD workload with 5 threads). Good, but still far from the theorical maximum of 300% improvement, i.e. quadruple speed. Quite unexpectedly, the peak performance is not reached by using one thread per core, but depends on the workload. In the two workloads shown above, the peak is reached respectively at 10 and 5 threads.

This is encouraging. After many research papers discussing and analyzing the overhead imposed by software transactional memory, and looking for optimizations to reduce such overhead, the performance penalty of STMX transactions is quite small - at least on this benchmark.

The optimized locking code running on the same quad-code machine delivers up to 125-155% performance improvement with respect to single-threaded code (the peak performance ratio is 996.0 / 388.1 = 2.566 = 256.6% on the MEMBOARD workload with 10 threads). Better than transactions, but still far from quadruple speed.

This means that Lee-STMX benchmark, even after the modifications that introduced the Hadlock routing algorithm, is almost read-only: writes to shared/transactional memory are very few and grouped together, and it spends large amounts of time without writing at all to shared/transactional memory. For this reason the performance results risk having limited significance, since most of the time is actualy spent outside transactions. On the other hand, also the optimized locking code takes advantage heavily of this unbalance, to the point of being implemented with unlocked reads and a global write lock (plus some tricks borrowed by transactions to handle inconsistent reads and failures).

Summary

Lee-STMX is a benchmark for software transactional memory aiming at implementing a realistic workload. STMX shows good performance on it: 95-105% better than single-threaded code on a quad-core machine, yet behind highly optimized locking code, which shows 125-155% better performance than single-threaded code.

Quite unfortunately, Lee-STMX was found to have a strongly unbalanced workload: it contains a high number of transactioanl memory reads (actually executed outside transactions), but very few and grouped transactional memory writes.

Since there are no obviously tunable parameters to change the ratio of reads and writes, nor to change the ratio of time spent inside and outside transactions, STMX author recommends further benchmarking to validate STMX performance on workloads with more transactional writes and more time spent inside transactions.

Legal

Lee-STMX is released under the terms of the BSD license.

STMX is released under the terms of the Lisp Lesser General Public License, known as the LLGPL.