Home

Awesome

checkedthreads

checkedthreads is a fork-join parallelism framework for C and C++ providing:

If you have dozens of developers working on millions of lines of multithreaded code, checkedthreads will let you painlessly parallelize the code - and then continuously ship new versions without worrying about parallelism bugs. It's based on a decade of experience in that kind of environment.

Contents

What race conditions will be found?

All of them! checkedthreads provides two verification methods:

There are more details below; the upshot is that every race condition will be found if:

Nice features

checkedthreads is a fork-join framework not unlike many others, such as Intel TBB, Microsoft PPL, Cilk, OpenMP or GNU libstdc++ parallel mode. How to choose a framework?

Here are some nice features of checkedthreads; you can compare other frameworks' features to this list as you shop around:

Details:

Another nice feature, at the moment, is simplicity and small size. However, these were known to transform into complexity and large size in the past. An effort will be made to avoid that.

There are also missing features - please tell if a feature you need is missing. Making a list of everything not there is a tad hard... One biggie, currently, is concurrency. No means are provided to wait for events except for issuing a blocking call (which "steals" a thread from the underlying thread pool, so it's not any good, really). Generally concurrency is not currently a use case: checkedthreads is a framework for parallelizing computational code which does not interact much with the external world.

API

In a nutshell:

Examples using the C++11 API:

ctx_for(100, [&](int i) {
    ctx_for(100, [&](int j) {
        array[i][j] = i*j;
    });
});

Absolutely boneheaded code, but you get the idea. i and j go from 0 to 99. Currently there's no way to specify a start other than 0 or an increment other than 1. There's also no way to control "grain size" - each index is a separately scheduled task. So a non-trivial amount of work should be done per index, or the scheduling overhead will dwarf any gains from running on several cores.

For a better example, here's parallel sorting:

template<class T>
void quicksort(T* beg, T* end) {
    if (end-beg >= MIN_PAR) {
        int piv = *beg, l = 1, r = end-beg;
        while (l < r) {
            if (beg[l] <= piv) 
                l++;
            else
                std::swap(beg[l], beg[--r]);
        }   
        std::swap(beg[--l], beg[0]);
        //sort the two parts in parallel:
        ctx_invoke( 
            [=] { quicksort(beg, beg+l); },
            [=] { quicksort(beg+r, end); }
        );  
    }
    else {
        std::sort(beg, end);
    }
}

Not unlike ctx_for, ctx_invoke calls all the functions it's passed in parallel.

No function call scheduled by ctx_invoke, nor any iteration of ctx_for, should ever access a memory address modified by any other call/iteration - that is, they should be completely independent. Once ctx_for/invoke returns, all the memory updates done by all the iterations/function calls can be used by the caller of ctx_for/invoke.

Now a C89 example:

void set_elem(int i, void* context) {
    int* array = (int*)context;
    array[i] = i;
}

void example(void) {
    int array[100];
    ct_for(100, set_elem, array, 0);
}

That last "0" is a null pointer to a canceller - we'll get to that in a moment. Meanwhile, parallel invoke in C89:

void a(void* context) { *(int*)context = 1; }
void b(void* context) { *(int*)context = 2; }

void example(void) {
    int first, second;
    ct_task tasks[] = {
        {a, &first},
        {b, &second},
        {0, 0}
    };
    ct_invoke(tasks, 0);
}

The tasks[] array should have {0,0} as its last element. Again, the "0" argument is the canceller.

Speaking of which - here's an example of actually using cancelling:

int pos_of_77 = -1;
ct_canceller* c = ct_alloc_canceller();
ctx_for(N, [&](int i) {
    if(arr[i] == 77) {
        pos_of_77 = i;
        ct_cancel(c);
    }
}, c);
ct_free_canceller(c);

(Again a silly piece of code doing way too little work per index, but no matter.) Notes on cancelling:

The last thing to note is that you need, before using checkedthreads, to call ct_init() - and then call ct_fini() when you're done. ct_init gets a single argument - the environment; for example:

ct_env_var env[] = {
    {"CT_SCHED", "shuffle"},
    {"CT_RAND_REV", "1"},
    {0, 0}
};
ct_init(env);

You can pass 0 instead of env; if you do that, $CT_SCHED and $CT_RAND_REV will be looked up using getenv(). Similarly, if you do pass an env[], all variables not mentioned in it will be getenv()d.

The available environment variables and their meaning are discussed in the next section.

Environment variables

$CT_SCHED is the scheduler to use, and can be one of:

$CT_THREADS is the worker pool size (relevant for the parallel schedulers); the default is a thread per core.

$CT_VERBOSE: at 2, all indexes are printed; at 1, loops/invokes; at 0 (default), nothing is printed.

$CT_RAND_SEED: a seed for order-randomizing schedulers (shuffle & valgrind).

$CT_RAND_REV: if non-zero, order-randomizing schedulers will reverse their random index permutations. When this is useful is explained in the next section.

How race detection works

As mentioned above, there are two verification methods - a fast one and a thorough one.

The fast one is, run the program twice using two different serial schedules - a random schedule and then the same schedule, reversed:

env CT_SCHED=shuffle CT_RAND_REV=0 your-program your-arguments
env CT_SCHED=shuffle CT_RAND_REV=1 your-program your-arguments

Now compare the results of the two runs. Different results indicate a bug, because results should not be affected by scheduling order (in production, a parallel scheduler is used and it can result in things running in any of the two orders you just tried - and many other orders).

Using this method, you can run the program on many inputs (the program runs serially with the shuffle scheduler, so you can spawn a process per core to fully utilize machines used for testing). Many inputs and no result differences give you a rather high confidence that your program is correct.

However, this method has two drawbacks:

Because of these drawbacks, a second, slower and more thorough verification method is available:

env CT_SCHED=valgrind CT_RAND_REV=0 valgrind --tool=checkedthreads your-program your-arguments
env CT_SCHED=valgrind CT_RAND_REV=1 valgrind --tool=checkedthreads your-program your-arguments

(If Valgrind says "failed to start tool 'checkedthreads'", perhaps $VALGRIND_LIB should be set to point to the right place.)

This runs Valgrind with the checkedthreads tool, which monitors every memory access. When a thread accesses a location that another thread concurrently wrote, the tool prints the offending call stack:

checkedthreads: error - thread 56 accessed 0x7FF000340 [0x7FF000340,4], owned by 55
==2919==    at 0x40202C: std::_Function_handler<void (int), main::{lambda(int)#1}>::_M_invoke(std::_Any_data const&, int) (bug.cpp:16)
==2919==    by 0x403293: ct_valgrind_for_loop (valgrind_imp.c:62)
==2919==    by 0x4031C8: ct_valgrind_for (valgrind_imp.c:82)
==2919==    by 0x40283C: ct_for (ct_api.c:177)
==2919==    by 0x401E9D: main (bug.cpp:20)

Note that there aren't any actual threads - like the run under CT_SCHED=shuffle, this run is serial. Rather, the Valgrind tool maps ct_for loop indexes and ct_invoke function calls to thread IDs, such that those IDs can fit into a single byte. So "two threads accessing the same location" means that the location was accessed from two loop indexes/function calls that could run in parallel.

This second method is slower, but it doesn't miss any bugs that could ever occur with the given inputs - and it pinpoints the bugs. So it's a good idea to run the program under Valgrind on a few inputs in case plain shuffling misses bugs. And it's also useful to run the program under Valgrind on those inputs where shuffling discovered bugs - to pinpoint those bugs.

This is all you strictly need to know to verify your code. If you want more details - for instance, if you want to be convinced that the bug coverage is indeed as thorough as claimed above - you can read a detailed explanation here.

Note that there can be no deadlocks, because there are no locks (the only way to synchronize threads is forking - the spawning of loops - or joining, the termination of loops). So there's no need for deadlock detection tools.

Building and installing

It is perhaps possible to download binaries for your platform - something I always prefer to do... It is recommended that you read the following explanations about building from sources, and then decide which parts you want to build and which parts you prefer to download in binary form.

So, after cloning/downloading sources:

cd checkedthreads
./configure

configure is a Python script producing a single output, include/checkedthreads_config.h. You can edit this file manually; all it has is #defines telling which of the optional features are enabled. Note that make then checks what's #defined in config.h and adjusts compiler flags, etc.

./configure enables each of these features if it auto-detects that it's supported on your machine. You might then want to disable some features (even though they're supported on your machine).

At this point, you can build the libraries with make build, but plain make won't work yet because things must be manually configured to build the Valgrind tool. To do that, edit defaults.mk and set the following variables:

Note that $VALGRIND_LIB is used by the tests which make runs after building everything - make sets this environment variable for all the processes it spawns. You may need to explicitly set $VALGRIND_LIB when running valgrind --tool=checkedthreads outside make.

Also note that setting, in the shell running make, any of the environment variables mentioned in defaults.mk overrides their values (hence the name "defaults.mk").

With defaults.mk edited, you can build everything and run tests with:

make

make help will list the available make targets and options (such as make clean and make VERBOSE=1). Currently every build rebuilds everything from scratch (there's no dependency checking), which is tolerable at the current size of things.

After a successful build, you get libraries at lib/ as follows:

This variety of libraries should hopefully make it easy to link with checkedthreads in any scenario...

At this point you can "install" checkedthreads - that is, copy the files in lib/ to wherever you keep your libraries, and copy the files in include/ to wherever you keep your header files.

If you want to download binaries instead of building, the available binaries are listed here. You can choose to download just the Valgrind binaries (the slightly gnarlier thing to build) but to build the libraries from sources, for example.

Planned features

Planned features not yet avaialable:

Coding style

Support/contact

Yossi.Kreinin@gmail.com/http://yosefk.com. Feel free to contact if you run into any sort of problem using checkedthreads.