atJIT: A just-in-time autotuning compiler for C++

About

atJIT is an early-phase experiment in online autotuning.

The code was originally based on the Easy::jit project.

Prerequisites

Before you can build atJIT, ensure that your system has these essentials:

- A C++ compiler with sufficient C++17 support. This likely means GCC >= 7 or Clang >= 4, but slightly older versions may work.
- cmake >= 3.5, and make
- The test suite (the check build target) requires the following:
  - Python 2.7
  - The Python lit package, installable with pip install lit
  - Valgrind, installable with sudo apt install valgrind on Ubuntu

Then, do the following:

Step 1

Install a compatible version of Clang and LLVM version 8 or newer. You have two options for this:

Option 1 — Vanilla

Obtaining Pre-built LLVM

There is currently an issue with the version of Clang 8.x on LLVM's nightly APT repository, and the pre-built version of LLVM 8 on the download page lacks RTTI support. Thus, for Ubuntu you'll want to build LLVM from source as described next.

Building LLVM

We have automated this process with a script, which you can use in the following way:

mkdir llvm
./get-llvm.sh ./llvm

The first argument is an empty directory in which to build LLVM. The resulting LLVM installation will be located at ./llvm/install.

Option 2 — Polly Knobs (deprecated)

In order for the tuner to make use of powerful loop transformations via Polly, you'll need to download and build an out-of-tree version of LLVM + Clang + Polly. Unfortunately, the maintenance of this out-of-tree version has not been kept up. If you would still like to try, you can follow the same instructions as in Option 1, but replace ./get-llvm.sh with ./get-llvm-with-polly.sh.

Step 2

Install Grand Central Dispatch, which on Ubuntu amounts to running:

sudo apt install libdispatch0 libdispatch-dev

Step 3

Obtain and build XGBoost by running the following command:

./xgboost/get.sh

Building atJIT

Once you have met the prerequisites, you can build atJIT. Starting from the root of the project, the general build steps are:

mkdir build install
cd build
cmake -DCMAKE_INSTALL_PREFIX=../install -DPOLLY_KNOBS=<ON/OFF> ..
cmake --build . --target install

By default, POLLY_KNOBS is set to OFF. If you were successful in building LLVM with Polly as described in the Polly Knobs section above, then you will want POLLY_KNOBS set to ON.

Once this completes, you can jump to the usage section. For special builds of atJIT, see below.

Build Options

If you are using a custom-built LLVM that is not installed system-wide, you'll need to add -DLLVM_ROOT=<absolute-path-to-LLVM-install> to the first CMake command above.

For example you could use this flag:

-DLLVM_ROOT=`pwd`/../llvm/install

To build the examples, install the OpenCV library, and add the flag -DATJIT_EXAMPLE=1 to the cmake command.

To enable benchmarking, first install the Google Benchmark framework. You can do this by running ../benchmark/setup.sh from the build directory, which will install Google Benchmark under <build dir>/benchmark/install. Then, you would add the following flags to cmake when configuring:

-DBENCHMARK=ON -DBENCHMARK_DIR=`pwd`/benchmark/install

After building, the benchmark executable will be located at <build dir>/bin/atjit-benchmark. See the Google Benchmark documentation for instructions on using other tools in the suite to help analyze the results.

Regression Testing

The test suite (check target) can be run after the install target has been built:

cmake --build . --target install
cmake --build . --target check

No test should report an unexpected failure or an unexpected success.

Basic usage

Look in your install directory for the bin/atjitc executable, which is a thin wrapper around clang++ with the correct arguments to run the clang plugin and dynamically link in the atJIT runtime system. You can use atjitc as if it were clang++, as it forwards its arguments to clang++. Here's an example:

➤ install/bin/atjitc -Wall -O2 tests/simple/int_a.cpp -o int_a
➤ ./int_a
inc(4) is 5
inc(5) is 6
inc(6) is 7
inc(7) is 8

Using atJIT in my project

The C++ library interface to atJIT is quite minimal. To get started, construct the driver for the autotuner:

#include <tuner/driver.h>

... {
  tuner::ATDriver AT;
  // ...
}

A single driver can handle the tuning of multiple functions, each with their own unique partial argument applications. The driver only exposes one, do-it-all, variadic method reoptimize. Given a tuner AT, reoptimize has the following generic usage:

  /* (1) return type */ F = AT.reoptimize(
                           /* (2) function to reoptimize */
                           /* (3) arguments to the function */
                           /* (4) options for the tuner */
                          );

(1) The return type of the function is some variant of easy::FunctionWrapper<> const&, which is a C++ function object that can be called like an ordinary function. The type depends on (2) and (3), and you can typically just write auto const& in its place.
(2) The function to be optimized, which can be a template function if the type is specified.
(3) A list of arguments that must match the arity of the original function. The following types of values are interpreted as arguments:
  - A placeholder (i.e., from std::placeholders) representing a standard, unfilled function parameter.
  - A runtime value. Providing a runtime value will allow the JIT compiler to specialize based on the actual, possibly dynamic, runtime value given to reoptimize.
  - A tuned parameter. This is a special value that represents constraints on the allowed arguments to the function, and leaves it up to the tuner to fill in an "optimal" value as a constant before JIT compilation. This can be used for algorithmic selection, among other things.

Here's an example:

#include <chrono>
#include <cstdio>
#include <functional>
#include <thread>
#include <tuner/driver.h>

using namespace std::placeholders;
using namespace tuned_param;

float fsub(float a, float b) { return a-b; }
void wait(int ms) { std::this_thread::sleep_for(std::chrono::milliseconds(ms)); }

int main () {
  tuner::ATDriver AT;
  // returns a function computing fsub(a, 1.0)
  easy::FunctionWrapper<float(float)> const& decrement = AT.reoptimize(fsub, _1, 1.0);

  // returns a function computing fsub(0.0, b)
  auto const& negate = AT.reoptimize(fsub, 0.0, _1);

  // returns a function with a fixed `wait` period in the range [1, 500]
  auto const& pause = AT.reoptimize(wait, IntRange(1, 500));

  printf("dec(5) == %f\n", decrement(5));
  printf("neg(3) == %f\n", negate(3));
  pause();
  // ...
}

The main option for the tuner is what algorithm to use during the search. If no option is specified, the tuner currently will not perform any search. To use the random search, we would specify tuner::AT_Random like so:

using namespace easy::options;

// returns a function equivalent to fsub(a, b)
auto const& fsubJIT = AT.reoptimize(fsub, _1, _2,
                           tuner_kind(tuner::AT_Random));

printf("fsubJIT(3, 2) == %f\n", fsubJIT(3.0, 2.0));

The current tuning options (namespaces omitted) are:

- tuner_kind(x) — where x is one of AT_None, AT_Random, AT_Bayes, AT_Anneal.
- pct_err(x) — where x is a double representing the percentage of tolerated time-measurement error during tuning. If x < 0 then the first measurement is always accepted. The default is currently 2.0.
- blocking(x) — where x is a bool indicating whether reoptimize should wait on concurrent compile jobs when it is not required. The default is false.

Autotuning a Function

To actually drive the online autotuning process for some function F, you must repeatedly reoptimize F and call the newly returned version F' at least once. Ideally, you would ask the tuner for a reoptimized version of F before every call. For example:

for (int i = 0; i < 100; ++i) {
  auto const& tunedSub7 =
      AT.reoptimize(fsub, _1, 7.0, tuner_kind(tuner::AT_Random));

  printf("8 - 7 == %f\n", tunedSub7(8));
}

Don't worry about calling reoptimize too often. Sometimes the tuner will JIT compile a new version, but often it will return a ready-to-go version that needs more runtime measurements to determine its quality.

See doc/readme/simple_at.cpp for the complete example we have walked through in this section.

License

See file LICENSE at the top-level directory of this project.

Acknowledgements

Special thanks to:

Hal Finkel & Michael Kruse (Argonne National Laboratory)
John Reppy (University of Chicago)
Serge Guelton & Juan Manuel Martinez Caamaño (originally developed Easy::jit)