Home

Awesome

The Effective Type Sanitizer --- Dynamically Typed C/C++

EffectiveSan is a compiler tool that automatically inserts dynamic (i.e., runtime) type and bounds checking into C/C++ programs. The aim of EffectiveSan is to detect memory errors and type bugs in arbitrary C/C++ code.

Background

C and C++ are examples of statically typed programming languages, meaning that types are checked at compile time and not at runtime. Furthermore, C and C++ are weakly typed programming languages that allow the type system to be bypassed, including:

Weak static typing is primarily motivated by flexibility and efficiency (dynamic type and bounds checking is expensive). However, this also means that the programmer is responsible for ensuring that types are not violated at runtime. In practice, the programmer does not always get it right, and bugs relating to type violations are common and potentially serious. For example, consider the following "benign" code snippet:

    struct S {int a[3]; char *p;};
    struct T {float f; struct S s;};

    int get(struct T *t, int idx)
    {
        return t->s.a[idx];
    }

This snippet is well-typed according to standard C/C++ static type checking. However, at runtime, a lot can go wrong:

In practice, type and memory errors can be a lot more subtle, and are a common source of security vulnerabilities, program bugs, and other undefined behavior. For example, such errors are commonly exploited for control flow hijacking attacks, e.g., by overwriting the virtual function table pointer (vptr) of C++ objects. This can be achieved in several ways using the runtime errors described above, including:

Assuming an attacker can overwrite the vptr with a suitable value, control flow can then be hijacked using a call to a virtual function.

Beyond security, it is often useful to detect and eliminate deliberate type-based undefined behavior---so-called type abuse---since it can harm code quality/portability. For example, one idiom we have observed in the wild is to implement C++-style inheritance using structures with overlapping members, e.g.:

    struct Base { int x; float y; };
    struct Derived { int x; float y; char z; };

We have observed such idioms in SPEC2006's perlbench and povray benchmarks (despite povray being a C++ program). Such idioms may violate the compiler's Type Base Aliasing Analysis (TBAA) assumptions, causing code to be miscompiled, else requiring special compiler options such as -fno-strict-aliasing. Type abuse may also mask dangerous (security critical) type errors as well.

Dynamic Typing for C and C++

The Effective Type Sanitizer (EffectiveSan) is a tool for instrumenting C/C++ programs with dynamic type checks---effectively transforming C/C++ into a dynamically typed programming languages. The instrumented dynamic type check compares the runtime type of an object (a.k.a. the effective type using C standard terminology) against the static type declared in the code. An error will be logged if there is a mismatch.

For example, EffectiveSan will instrument the get() function by adding type and bounds checks:

    int get(struct T *t, int idx)
    {
        BOUNDS b = type_check(t, struct T);     // Inserted type check
        b = bounds_narrow(b, t->s.a);           // Inserted bounds narrow
        int *tmp = &t->s.a;
        bounds_check(tmp, b);                   // Inserted bounds check
        return tmp[idx];
    }

Here, three additional operations are inserted:

If either type_check or bounds_check fails then an error will be logged. By default, all logged errors are printed to stderr when the program exits (EffectiveSan does not stop execution, although this is configurable).

The inserted instrumentation can detect type and memory errors described above. For example, consider the type error:

    S *s = (S *)malloc(sizeof(struct S));
    get((T *)s, 2);

Then running this program results in a type error:

    TYPE ERROR:
            pointer  = 0x30a12d3740 (heap)
            expected = struct T
            actual   = struct S { int32_t[3]; /*0..12*/ int8_t *; /*16..24*/ } [+0]
                       >int32_t [+0]

Here:

Next, consider the use-after-free error:

    free(t);
    get(t, 2);

EffectiveSan considers "free" objects to have a special "<free memory>" type. This allows use-after-free errors to be detected as a special kind of type error:

    USE-AFTER-FREE ERROR:
            pointer  = 0x4034b5bfd0 (heap)
            expected = struct T
            actual   = <free memory> [+0]

Finally, consider the (sub-)object bounds error:

    get(t, 4)

EffectiveSan uses dynamic typing and bounds narrowing to detect sub-object bounds errors:

    SUBOBJECT BOUNDS ERROR:
            pointer  = 0x405efddc68 (heap)
            type     = struct T { float32_t; /*0..4*/ struct S; /*8..32*/ } [+8..+20]
                       >struct S { int32_t[3]; /*0..12*/ int8_t *; /*16..24*/ } [+0..+12]
                       >>int32_t [+0..+12]
            bounds   = 0..12 (8..20)
            access   = 16..20 (24..28)

Here:

Using the above instrumentation, EffectiveSan can detect multiple classes of errors, including type confusion, object bounds errors, sub-object bounds errors, and use-after-free errors---all using the same underlying methodology.

How EffectiveSan Works

We give a very brief overview on how some of the internals of EffectiveSan work. For more detailed information, please see our paper (see further reading below) or study the source code.

EffectiveSan consists of three main components:

  1. A "modified" clang front-end that preserves high-level C/C++ type information as LLVM IR meta-data.
  2. A LLVM-instrumentation pass that inserts type/bounds checks, as well as replaces memory allocation with the "typed" version.
  3. A run-time support library that implements the meta data tracking scheme.

The runtime system for tracking dynamic types is the main innovation. The basic idea is to build on top of low fat pointers which are a system for tracking the bounds (size and base) of allocated objects, which was originally developed for bounds checking. Instead, EffectiveSan uses low fat pointers to store type meta data at the base of allocated objects. For example, consider the memory allocation:

    q = (struct T *)malloc(sizeof(struct T));

Then, under EffectiveSan, the memory layout will be as follows:

<p align="center"> <img src="images/layout.png" width="50%" alt="EffectiveSan object layout."> </p>

Here (META) is the EffectiveSan object meta data comprising (1) a reference to the type meta data for (struct T), and (2) the size (array length) of the allocation. Note that META is stored in the memory immediately before the allocated object. The memory layout of the object itself is unchanged, which is critical for compatibility. EffectiveSan similarly transforms stack and global objects.

The combined META and object are allocated using the low-fat pointer allocator. This means that any interior pointer, e.g., (p) can be efficiently mapped to the base address (base(p)) which contains META. The EffectiveSan runtime therefore has access to the following information:

  1. The dynamic type of (q) (from META);
  2. The static type of (p) (from the source code); and
  3. The offset between (p) and (q) (calculated).

The EffectiveSan runtime maps the (dynamic-type, static-type, offset) triple to (sub-)object bounds (relative to p) using a layout hash table stored in the type meta data. For example, the layout hash table for (struct T) is as follows:

    (T, T, 0) ---> -oo..oo     (T, float, 0) ---> 0..4     (T, S, 4) ---> 0..20
    (T, int, 4) ---> 0..12     (T, int, 8) ---> -4..8      (T, int, 12) ---> −8..4
    (T, char *, 16) ---> 0..8

Each entry represents a (sub-)object for type (struct T).

For example, if p has the static type (int *), the corresponding triple will be (T, int, 12). This triple corresponds to the sub-object (T.s.a), and the sub-object bounds (in bytes) is therefore p-8..p+4.

In contrast, if p has the static type (float *), the corresponding triple will be (T, float, 12). This triple has no entry (i.e., a hash table miss), meaning that a type error is detected.

Benchmarks

We compare EffectiveSan against an uninstrumented baseline. For these tests, we use the standard SPEC2006 benchmark suite:

<p align="center"> <img src="images/results.png" width="75%" alt="EffectiveSan SPEC2006 timings."> </p>

We also compare two different versions of EffectiveSan:

Both versions use the EFFECTIVE_SINGLETHREADED runtime option (SPEC2006 is single-threaded), and the "no logging" version uses the additional EFFECTIVE_NOLOG runtime option. The former represents "normal" use, which includes both instrumentation and logging overheads, while the latter is a better estimate of the instrumentation overhead only. Some benchmarks, notably perlbench and gcc, generate many errors meaning the logging overhead is more significant.

Overall we see that EffectiveSan (logging) is 3.53x the uninstrumented baseline, and EffectiveSan (no logging) is slightly faster at 3.41x the baseline. For reference, a typical bounds checking sanitizer (e.g., AddressSanitizer), that does no type nor sub-object bounds checking, has a typical overhead of 1.5-2.0x. EffectiveSan is intended to be a trade-off: although it is generally more expensive it is also more comprehensive in the class and number of errors detected.

EffectiveSan exhibits a low memory overhead of 1.12x for SPEC2006.

EffectiveSan detects may type, (sub-)object bounds, and use-after-free errors in the SPEC2006 benchmarks, some known and some new. These include:

The type errors are summarized in the paper (see further reading below). Most type errors appear to be related to type abuse, i.e., deliberate type errors introduced by the programmer. Although sometimes it is hard to be sure without knowing the programmer's intent. Some examples of type errors include:

Further Reading

For more detailed information EffectiveSan, please see our PLDI'2018 paper:

EffectiveSan is built on top of our earlier work on low fat pointers. More information can be found here:

Usage

Installing

EffectiveSan releases can be downloaded from here: https://github.com/GJDuck/EffectiveSan/releases

EffectiveSan is implemented as a modified version of clang/LLVM for the x86_64/Linux architecture. To use, simply extract the distribution into your desired location, e.g.:

    $ tar xvfJ effectivesan-VERSION.tar.xz

No other special installation steps are required.

Running

To instrument a program using EffectiveSan, simply compile using the special modified clang/clang++ and the -fsanitize=effective -O2 options:

    $ effectivesan-VERSION/bin/clang -fsanitize=effective -O2 program.c
    $ effectivesan-VERSION/bin/clang++ -fsanitize=effective -O2 program.cpp

Note that EffectiveSan assumes -O2 optimization level in order to work correctly. Next, the resulting executable can be run as normal:

    $ ./a.out

A logged error messages should be printed to stderr when the program exits.

Note that it is common for the same type or bounds error to occur multiple times during program execution. By default, EffectiveSan will "group" similar errors, so as not to make the error log too long. Grouping behavior can be changed or disabled using the EFFECTIVE_VERBOSITY runtime option (see options below).

Options

EffectiveSan supports several compiler options listed below. To pass options to EffectiveSan, use the -mllvm clang option, e.g., -mllvm -effective-no-globals:

In addition to the compiler time options, EffectiveSan also supports several runtime options that can be set via environment variables:

Building

It is also possible to build EffectiveSan from source. To do so, simply extract the source distribution and run the build.sh script:

    $ ./build.sh

The entire build process should be automatic.

To also build the binary distribution, use the following command instead:

    $ ./build.sh release

FireFox

It is possible to build a version of the FireFox web browser using EffectiveSan. For this, run the following commands:

    $ cd firefox
    $ ./setup-firefox-build.sh

If this succeeds, then:

    $ cd firefox-52.2.1esr
    $ ./mach build
    $ ./mach run

Notes:

Features

In addition to the core features (type/bounds/UAF-checking) highlighted above, EffectiveSan also supports the following:

Limitations

EffectiveSan is a complex sanitizer and therefore has some limitations:

The EffectiveSan option -effective-warnings will enable warnings about meta data limitations, if any. The following explains the meaning of each possible warning message:

We have mainly focused on minimizing such warnings for the SPEC2006 benchmarks. Other software may yield different results.

FAQ

Q: Why do we need EffectiveSan when we already have AddressSanitizer?

AddressSanitizer is a popular tool for detecting memory errors such as bounds overflows and use-after-free errors. EffectiveSan can also detect these kinds of errors, as well as other classes of error that AddressSanitizer cannot detect, such as:

In addition to AddressSanitizer, there are a whole bunch of dynamic memory and type error detection tools. The main difference is that EffectiveSan attempts to be as comprehensive as possible, i.e., detecting all type/memory errors using a single underlying methodology.

Q: Why does EffectiveSan report so many type errors? Most type errors are harmless.

EffectiveSan reports anything deemed to be a type violation. Programmers sometimes introduce "deliberate" type errors for various reasons, such as convenience, efficiency, "cool hacks", etc. This is so-called "type abuse". EffectiveSan does not (and cannot) distinguish between accidental and deliberate type errors.

Although type abuse may seem harmless, it is nevertheless undefined behavior under the C/C++ standards. One common problem is the interaction of type abuse and Type Based Alias Analysis (TBAA), which can result in the program being "mis-compiled".

Q: Why does EffectiveSan report type errors for std::map and std::set?

For example, one typical type error is something like the following:

    TYPE ERROR:
            pointer  = 0x1a00d02adb0 (heap)
            expected = struct std::_Rb_tree_node<std::pair<xalanc_1_8::XalanQNameByReference const, xalanc_1_8::ElemTemplate const*> >
            actual   = ...
                       ...
                       >>>>>struct std::_Rb_tree_node_base { int32_t; /*0..4*/ struct std::_Rb_tree_node_base *; /*8..16*/ struct std::_Rb_tree_node_base *; /*16..24*/ struct std::_Rb_tree_node_base *; /*24..32*/ } [+0]

These errors are caused by a combination of (1) bad casts in C++ standard library header files, and (2) EffectiveSan may report type errors on never-executed paths. For example, the end() method contains a bad cast from a _Rb_tree_node_base to a _Rb_tree_node (a.k.a. _Link_type):

    _Link_type
    _M_end() _GLIBCXX_NOEXCEPT
    { return reinterpret_cast<_Link_type>(&this->_M_impl._M_header); }

This appears to be "type abuse" rather than an actual bug. These bad casts are also detected by other dynamic type checking tools such as CaVer.

Follow-up Work

Versions

EffectiveSan should be considered alpha quality software. Since EffectiveSan is relatively complex, there are probably a lot of bugs. Please report issues here: https://github.com/GJDuck/EffectiveSan/issues

The released version of EffectiveSan has been improved since the prototype evaluated in the paper. Generally, the released version is faster (3.41x alpha overhead versus 3.88x for the prototype), contains fewer bugs, and offers more comprehensive error detection. The featured SPEC2006 issues reported in the paper text should be reproducible using the released alpha version. The issue count (Figure 7) should be generally consistent with the released alpha version, although the exact counts have changed due to bug fixes and different C++ standard library versions.

Thanks

This research was partially supported by a grant from the National Research Foundation, Prime Minister's Office, Singapore under its National Cybersecurity R&D Program (TSUNAMi project, No. NRF2014NCR-NCR001-21) and administered by the National Cybersecurity R&D Directorate.