API Reference · Benchmarks · Stdlib Benchmarks
Saturn — Parallelism-Safe Data Structures for Multicore OCaml
This repository is a collection of concurrent-safe data structures for OCaml 5. It aims to provide an industrial-strength, well-tested (and possibly model-checked and verified in the future), well-documented, and maintained library of concurrent-safe data structures. We want to make it easier for Multicore OCaml users to find the right data structures for their needs.
You can learn more about the motivation behind Saturn through the implementation of a lock-free stack here.
Saturn is published on opam and is distributed under the ISC license.
Contents
- Saturn — Parallelism-Safe Data Structures for Multicore OCaml
- Contents
- Installation
- Provided data structures
- About the Unsafe Data Structures
- Usage
- Testing
- Benchmarks
- Contributing
Installation
Getting OCaml 5.2.0
To use Saturn, you need OCaml 5.2.0 or later. Saturn also builds with OCaml 4.14, but only for compatibility purposes, since parallelism-safe data structures are not needed without OCaml 5. Versions of OCaml 5 prior to 5.2 are not supported because of bugs in the Atomic module that affect the functionality of some data structures.
To install OCaml 5.2.0 yourself, first make sure you have opam 2.1 or later. You can run this command to check:
opam --version
Then use opam to install OCaml 5.2.0:
opam switch create 5.2.0
If you want a later version, you can run the following line to get a list of all available compiler versions:
opam switch list-available
Getting Saturn
saturn can be installed from opam:
opam install saturn
Provided data structures
Treiber Lock-free Stack
- Module: Stack
- Description: A classic multi-producer, multi-consumer, lock-free stack, known for robustness and flexibility.
- Recommendation: It's a recommended starting point when a LIFO structure is needed.
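To make the idea behind a Treiber stack concrete, here is a minimal sketch built only on the standard library's Atomic and Domain modules. It is an illustration of the technique and not Saturn's implementation. The names create, push and pop_opt are chosen for this example and are not guaranteed to match the Stack module's signature.

```ocaml
(* A minimal Treiber-style stack built on Stdlib.Atomic (OCaml 5).
   Illustrative only: this is not Saturn's implementation. *)
type 'a t = 'a list Atomic.t

let create () : 'a t = Atomic.make []

(* Retry the compare-and-set until no other domain has changed the head
   between our read and our write. *)
let rec push (s : 'a t) (x : 'a) : unit =
  let old = Atomic.get s in
  if not (Atomic.compare_and_set s old (x :: old)) then push s x

let rec pop_opt (s : 'a t) : 'a option =
  match Atomic.get s with
  | [] -> None
  | x :: rest as old ->
      if Atomic.compare_and_set s old rest then Some x else pop_opt s

(* Two domains push 1_000 distinct integers each; every value must come
   back exactly once. *)
let () =
  let s = create () in
  let producer base () =
    for i = 1 to 1_000 do
      push s (base + i)
    done
  in
  let d1 = Domain.spawn (producer 0) in
  let d2 = Domain.spawn (producer 1_000) in
  Domain.join d1;
  Domain.join d2;
  let rec drain acc =
    match pop_opt s with
    | None -> acc
    | Some x -> drain (x :: acc)
  in
  let popped = List.sort compare (drain []) in
  assert (popped = List.init 2_000 (fun i -> i + 1))
```

Because the whole stack is a single immutable list behind one atomic reference, any number of domains can push and pop. A failed compare-and-set only means another domain made progress first.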
Lock-free Bounded Stack
- Module: Bounded_stack
- Description: A stack based on the Treiber stack algorithm, with a limited capacity and a length function. This ensures that the stack is memory-bounded.
- Recommendation: Adding a capacity introduces a general overhead to the operations. It is recommended to use the unbounded stack if neither the capacity nor the length function is needed.
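One plausible way to see where such overhead can come from is the sketch below, which uses only the standard library. It assumes the length is stored next to the list in a single atomic pair, so every operation allocates a new pair and checks the capacity. The record layout and the names create, try_push, pop_opt and length are assumptions made for this example, not a description of Bounded_stack's actual code or API.

```ocaml
(* Sketch of a capacity-bounded Treiber-style stack (not Saturn's code).
   The length travels with the list in one immutable pair, so a single
   compare-and-set updates both consistently. *)
type 'a t = { capacity : int; state : (int * 'a list) Atomic.t }

let create ~capacity () = { capacity; state = Atomic.make (0, []) }

(* O(1): the length is stored, not recomputed. *)
let length t = fst (Atomic.get t.state)

(* Returns false instead of pushing when the stack is full. *)
let rec try_push t x =
  let (len, items) as old = Atomic.get t.state in
  if len >= t.capacity then false
  else if Atomic.compare_and_set t.state old (len + 1, x :: items) then true
  else try_push t x

let rec pop_opt t =
  match Atomic.get t.state with
  | _, [] -> None
  | (len, x :: rest) as old ->
      if Atomic.compare_and_set t.state old (len - 1, rest) then Some x
      else pop_opt t

let () =
  let s = create ~capacity:2 () in
  assert (try_push s "a");
  assert (try_push s "b");
  assert (not (try_push s "c")); (* full: the push is refused *)
  assert (length s = 2);
  assert (pop_opt s = Some "b");
  assert (length s = 1)
```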
Michael-Scott Lock-free Queue
- Module: Queue
- Description: A multi-producer, multi-consumer lock-free queue that is both robust and flexible.
- Recommendation: This structure is ideal when a FIFO setup is required.
- Sources: Simple, Fast, and Practical Non-Blocking and Blocking Concurrent Queue Algorithms
Lock-free Bounded Queue
- Module: Bounded_queue
- Description: A queue based on the Michael-Scott queue algorithm, with a limited capacity and a length function. This ensures that the queue is memory-bounded.
- Recommendation: Adding a capacity introduces a general overhead to the operations. It is recommended to use the unbounded queue if neither the capacity nor the length function is needed.
Lock-free Chase-Lev Work-Stealing Deque
- Module: Work_stealing_deque
- Description: Single-producer, multi-consumer dynamic-size deque (double-ended queue).
- Recommendation: Designed for high-throughput scheduling using per-core work distribution. Note that pop and steal operations follow different orderings (LIFO and FIFO) with distinct linearization constraints. It is a role-oriented data structure: most functions can't be used by all domains.
- Sources:
Lock-free Single Producer Single Consumer Queue
- Module: Single_prod_single_cons_queue
- Description: A single-producer, single-consumer fixed-size queue. This specific configuration enables strong optimizations but also makes the data structure unsafe if used improperly, i.e., with more than one producer or more than one consumer at any time.
- Recommendation: It's concurrent-safe as long as only one thread acts as producer and one as consumer at any time.
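The sketch below illustrates why this restriction enables optimizations, using a fixed-size ring buffer built only on the standard library. It is a simplified illustration and not Saturn's implementation. The type layout and the names create, try_push and pop_opt are assumptions made for the example.

```ocaml
(* Sketch of a fixed-size single-producer single-consumer queue
   (illustrative, not Saturn's implementation).
   Only the producer writes [tail]; only the consumer writes [head].
   That is why no compare-and-set loop is needed, and also why a second
   producer or consumer would break it. *)
type 'a t = {
  buf : 'a option array;
  head : int Atomic.t; (* next index to read; consumer-owned *)
  tail : int Atomic.t; (* next index to write; producer-owned *)
}

let create ~size =
  { buf = Array.make size None; head = Atomic.make 0; tail = Atomic.make 0 }

(* Producer side only. Returns false when the queue is full. *)
let try_push t x =
  let tail = Atomic.get t.tail in
  if tail - Atomic.get t.head >= Array.length t.buf then false
  else begin
    t.buf.(tail mod Array.length t.buf) <- Some x;
    Atomic.set t.tail (tail + 1); (* publish the slot to the consumer *)
    true
  end

(* Consumer side only. Returns None when the queue is empty. *)
let pop_opt t =
  let head = Atomic.get t.head in
  if head = Atomic.get t.tail then None
  else begin
    let i = head mod Array.length t.buf in
    let x = t.buf.(i) in
    t.buf.(i) <- None; (* let the GC reclaim the value *)
    Atomic.set t.head (head + 1); (* hand the slot back to the producer *)
    x
  end

let () =
  let q = create ~size:4 in
  let n = 10_000 in
  (* Exactly one producer domain. *)
  let producer =
    Domain.spawn (fun () ->
        for i = 1 to n do
          while not (try_push q i) do
            Domain.cpu_relax ()
          done
        done)
  in
  (* The main domain is the only consumer; values arrive in FIFO order. *)
  let rec consume expected =
    if expected <= n then begin
      match pop_opt q with
      | None ->
          Domain.cpu_relax ();
          consume expected
      | Some x ->
          assert (x = expected);
          consume (expected + 1)
    end
  in
  consume 1;
  Domain.join producer
```

If two domains called try_push at the same time, both could read the same tail and overwrite the same slot, which is exactly the misuse the description warns about.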
Lock-free Multiple Producers Single Consumer Queue
- Module: Single_consumer_queue
- Description: A multi-producer, single-consumer concurrent-safe queue with a closing mechanism to prevent further pushes.
- Recommendation: Designed for scheduler run queues. It is not concurrent-safe if used by multiple consumers simultaneously.
Lock-free Skip List
- Module: Skiplist
- Description: A skiplist is a probabilistic data structure that has an average logarithmic complexity for search and insertion operations. Like Stdlib.Map, it is an ordered collection.
- Recommendation: The skiplist is not resizable. It will, however, continue to work once the limit capacity is reached, but performance will decrease as the depth of the structure won't be enough to maintain logarithmic performance.
- Sources: See Chapter 14 in The Art of Multiprocessor Programming
Lock-free Hash Table
- Module: Htbl
- Description: A resizable lock-free hash table with a snapshot mechanism.
- Recommendation: Contains useful high-level operations designed to work as building blocks of non-blocking algorithms.
Lock-free Bag
- Module: Bag
- Description: A resizable lock-free bag based on the hash table. The pop functions return a random value contained in the bag.
About the Unsafe Data Structures
Some data structures are available in two versions: a normal version and a more optimized but unsafe version. The unsafe version utilizes Obj.magic in a way that may be unsafe with flambda2 optimizations.
The reason for providing the unsafe version is that certain optimizations require features that are currently not available in OCaml, such as arrays of atomics or atomic fields in records. We recommend using the normal version of a data structure unless its performance is not sufficient for your use case. In that case, you can try the unsafe version.
Currently, the following data structures have an unsafe version:
- Single_cons_single_prod_unsafe: a single-consumer single-producer lock-free queue
- Queue_unsafe: a Michael-Scott lock-free queue
- Bounded_queue_unsafe: a lock-free bounded queue based on the Michael-Scott queue algorithm
- Htbl_unsafe: a lock-free hash table
Usage
This part describes how to use the provided data structures and, more precisely, what not to do with them. Two main points are discussed:
- some data structures have restrictions on what operations can be performed in a single domain or a set of domains
- the currently provided data structures are non-composable
Data Structures with Domain Roles
Some provided data structures are designed to work with specific domain configurations. These restrictions optimize their implementation, but failing to respect them may compromise safety properties. These limitations are clearly indicated in the documentation and often reflected in the name of the data structure itself. For instance, a single-consumer queue must have only one domain performing pop operations at any given time.
To learn more about it, see this document.
Composability
Composability refers to the ability to combine functions while preserving their properties. For Saturn data structures, the expected properties include atomic consistency (or linearizability) and progress guarantees, such as lock-freedom. Unfortunately, Saturn's data structures are not composable.
To learn more about it, see this document.
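The loss of atomicity under composition can be illustrated with a small sketch. It uses two tiny lock-free stacks written with the standard library as stand-ins for any two Saturn structures. The move function and its between hook are hypothetical helpers introduced only for this example. The hook simulates another domain running exactly in the gap between the two operations, so the outcome is deterministic.

```ocaml
(* Two tiny lock-free stacks, standing in for any two Saturn structures
   (illustrative stand-ins, not Saturn's API). *)
let push s x =
  let rec loop () =
    let old = Atomic.get s in
    if not (Atomic.compare_and_set s old (x :: old)) then loop ()
  in
  loop ()

let rec pop_opt s =
  match Atomic.get s with
  | [] -> None
  | x :: rest as old ->
      if Atomic.compare_and_set s old rest then Some x else pop_opt s

(* What another domain could compute at any moment. *)
let total a b = List.length (Atomic.get a) + List.length (Atomic.get b)

(* "Move" composed from two individually linearizable operations.
   [between] is a hypothetical hook that simulates another domain running
   exactly in the gap between them. *)
let move ~between src dst =
  match pop_opt src with
  | None -> ()
  | Some x ->
      between ();
      push dst x

let () =
  let a = Atomic.make [ 1; 2; 3 ] and b = Atomic.make [] in
  let seen = ref (-1) in
  move ~between:(fun () -> seen := total a b) a b;
  assert (!seen = 2); (* mid-move, one element is in neither stack *)
  assert (total a b = 3) (* once the move is complete, the total is 3 again *)
```

Each pop_opt and push is atomic on its own, yet an observer can see a state in which the moved element belongs to neither stack. No sequential execution of an atomic move would produce that state, so the composed operation is not linearizable.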
Testing
One of the many difficulties of implementing parallelism-safe data structures is that, in addition to providing the same safety properties as sequential ones, they may also have to observe liveness properties as well as additional safety properties specific to concurrent programming, such as deadlock-freedom.
In addition to the expected safety properties, the main properties we want to test for are:
- linearisability,
- lock-freedom for all the lock-free data structures,
- no potentially harmful data races.
Here is a list of the tools we use to ensure them:
- safety: unit tests and qcheck tests check semantics and expected behaviors with one or more domains.
- safety and liveness: STM tests check linearisability for two domains (see the multicoretests library).
- liveness: dscheck checks the non-blocking property for as many domains as wanted (two domains most of the time). See dscheck.
- safety: tsan checks that there are no data races.
See test/README.md for more details.
Benchmarks
There are a number of benchmarks in the bench directory. You can run them with make bench. See bench/README.md for more details.
Contributing
Contributions are appreciated! If you intend to add a new data structure, please read this first.