A more powerful C/C++ macro preprocessor

The venerable C preprocessor (cpp) – the part of the compilation process that interprets hash-prefixed directives such as #include and #define, and substitutes for the macros defined by the latter – is undoubtedly one of the backbones of the C/C++ ecosystem. However, its macro functionality suffers from known limitations: macros may not call themselves recursively, meaningfully work with mutable state or introduce syntax that does not obey the shape of either single keywords or function calls. This severely limits its utility for metaprogramming, necessitating the proliferation of idiosyncratic boilerplate-generation tools of high complexity but limited scope such as Yacc or Qt's moc.

This project is an attempt to create a Turing-complete general-purpose preprocessor for C and C++ that is powerful enough to subsume all of the above and more: indeed, we shall aim to be able to build on top of either C or C++ in the way the latter was originally built upon the former, while seamlessly blending in with existing code as the C preprocessor does.

We draw significant inspiration from Rust's macro system, which appears to be the most ambitious such effort this side of LISP, without binding ourselves to its sometimes curious choice of syntax or its demand of hygiene (this is C, after all!). Since all respectable programming language projects are self-hosting and this is not one of them respectable programming language projects, the preprocessor itself is written in Haskell. The usual warnings about alpha-quality software apply, and all syntax and semantics are subject to change.

Try it out online here!

Example

// the struct needs a tag so that it can refer to itself
struct LinkedList {
    int value;
    LinkedList *next;
};

// define recursive macro to create a linked list
@define MakeList {
    ( {@^[,]$head, @^$tail} ) => (
        new LinkedList( {$head, MakeList {$tail}} )
    )
    ( {@^[,]$singleton} ) => (
        new LinkedList( {$singleton, NULL} )
    )
    ( {} ) => ( NULL )
}

// create a linked list with 5 elements
LinkedList *l = MakeList {1,2,3,4,5};
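To make the recursion concrete, here is a hand-written C++ sketch of what the invocation above would expand to, assuming each application of the first rule peels off the head and recurses on the tail until the singleton rule applies. The function names makeListExpanded and length are hypothetical helpers for illustration, and nullptr stands in for NULL.

```cpp
// Tagged struct so that the type can refer to itself.
struct LinkedList {
    int value;
    LinkedList *next;
};

// Hand-expanded equivalent of `MakeList {1,2,3,4,5}` under the assumed
// expansion order: four applications of the head/tail rule, then the
// singleton rule for the last element.
LinkedList *makeListExpanded() {
    return new LinkedList({1,
           new LinkedList({2,
           new LinkedList({3,
           new LinkedList({4,
           new LinkedList({5, nullptr})})})})});
}

// Hypothetical helper: count the nodes of a list.
int length(const LinkedList *l) {
    int n = 0;
    for (; l != nullptr; l = l->next) ++n;
    return n;
}
```

Calling length(makeListExpanded()) walks the five nodes produced by the nested allocations.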

Usage

Using GNU make (preferred)

Make sure you have a recent version of ghc, mtl>=2.2.1 and parsec installed. The appropriate packages exist in the Debian testing repositories as ghc, libghc-mtl-dev and libghc-parsec3-dev respectively. Building under Windows or on other Unix-like OS families is currently untested; see INSTALL.windows.md for Windows installation instructions. Run make in the root directory of the repository to generate an executable file named macros. Running

./macros <input file>

will emit the processed output to stdout. If any errors are encountered, they will be printed to stderr. For convenience, a wrapper script with-macro is included. Running

./with-macro g++ <input file> -O2 -Dotheroptions

(with the input filename being in the second position, i.e. immediately following the compiler executable!) is equivalent to running g++ -O2 -Dotheroptions on the output of the preprocessor on <input file>.

Using Haskell Stack

If you don't have access to a system package manager or it does not provide the necessary GHC packages, you can instead choose to use Haskell Stack. This may be the easiest way to build the project on non-Linux systems. After installing Stack (if you have not already) and running stack build, you should be able to execute the preprocessor by running

stack exec macros <input file>

which will preprocess the file and emit output to stdout as above.

To produce human-readable output

Use the additional flag -n after the input file, which suppresses emission of source-line hints (which greatly help with debugging, but get in the way of comprehension), and pipe through clang-format, like so:

./macros <input file> -n | clang-format

Introduction

The fundamental principle of macros is keyword-triggered pattern-matching and substitution on token streams. A typical macro definition takes the following form:

@define macroname {
    ( pattern one ) => ( printf("first pattern encountered") )
    ( pattern two ) => ( printf("second pattern encountered") )
    ( pattern three) => ( printf("third pattern encountered") )
    // ...
}

This defines a macro that is triggered by encountering the keyword macroname anywhere in the program text following the definition. Each entry of the definition body is a pair of the form (stream) => (stream), and the token stream inside the left-hand parentheses is its pattern. When the keyword is encountered, the macro processor tries to match the token stream following it against the patterns in turn. For the first pattern that is matched in full by the tokens following the keyword, the corresponding right-hand stream is substituted for the keyword and the matched tokens; any tokens after the match are left in place. If no pattern matches, an error is thrown.

So for instance,

macroname pattern two;
macroname pattern one two three four;
/* will become:
 * printf("second pattern encountered");
 * printf("first pattern encountered") two three four; */
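The first-match rule can be modelled in a few lines of C++. This is only a sketch under simplifying assumptions: tokens are plain strings, patterns contain literal tokens only, and the names Rule and expandOnce are hypothetical. The actual preprocessor is written in Haskell and is not reproduced here.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// One entry of a macro body: ( pattern ) => ( replacement ).
struct Rule {
    Tokens pattern;
    Tokens replacement;
};

// Expand one macro invocation. `rest` holds the tokens that follow the
// keyword. Rules are tried in order; the first whose pattern is a prefix
// of `rest` wins. Its replacement takes the place of the matched tokens,
// and the unmatched tail is kept. Throws if no rule matches.
Tokens expandOnce(const std::vector<Rule> &rules, const Tokens &rest) {
    for (const Rule &r : rules) {
        if (r.pattern.size() <= rest.size() &&
            std::equal(r.pattern.begin(), r.pattern.end(), rest.begin())) {
            Tokens out = r.replacement;
            out.insert(out.end(), rest.begin() + r.pattern.size(), rest.end());
            return out;
        }
    }
    throw std::runtime_error("no pattern matched");
}
```

With rules for "pattern one" and "pattern two", passing the tokens pattern one two three four ; yields the first replacement followed by two three four ;, mirroring the second line of the example.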

The pattern of a macro can also capture tokens in compile-time variables, which are recognisable by the sigil $. The results of such a capture can be reused on the right-hand side. In the simplest case, we capture a single token:

@define greet {
    ( $name ) => ( printf("Hello %s!", $name) )
}

greet "world";
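Single-token capture can be pictured as binding a name to one token and then substituting it into the right-hand side. The following C++ sketch models that idea; the names Bindings, matchPrefix and substitute are hypothetical, tokens are plain strings, and the real implementation may work quite differently.

```cpp
#include <cstddef>
#include <map>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;
using Bindings = std::map<std::string, std::string>;

// Match `pattern` against the front of `input`. A pattern token starting
// with '$' captures exactly one input token; any other token must match
// literally. Returns false if the pattern does not match.
bool matchPrefix(const Tokens &pattern, const Tokens &input, Bindings &out) {
    if (pattern.size() > input.size()) return false;
    Bindings b;
    for (std::size_t i = 0; i < pattern.size(); ++i) {
        if (!pattern[i].empty() && pattern[i][0] == '$') {
            b[pattern[i]] = input[i];   // capture a single token
        } else if (pattern[i] != input[i]) {
            return false;               // literal token mismatch
        }
    }
    out = b;
    return true;
}

// Replace every captured variable in `replacement` by its bound token.
Tokens substitute(const Tokens &replacement, const Bindings &b) {
    Tokens out;
    for (const std::string &t : replacement) {
        Bindings::const_iterator it = b.find(t);
        out.push_back(it != b.end() ? it->second : t);
    }
    return out;
}
```

Under this model, matching the pattern $name against the tokens "world" ; binds $name to "world", and substituting into the right-hand side gives the tokens of printf("Hello %s!", "world").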

Multiple tokens can be captured in one variable by using the modifier @^, which is followed by any number of stop patterns, each written as [token-stream]. Capture terminates when any stop pattern is encountered or the token stream ends.

@define bracketless_printf {
    ( $pattern, @^[;]$args ) => (
        printf($pattern, $args)
    )
}
bracketless_printf "%d %d",1,2;
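The capture-until-stop behaviour can be sketched in C++ as follows. This is a simplified model, not the preprocessor's code: each stop pattern is assumed to be a single token, bracket nesting is ignored, and the stop token is assumed to be left unconsumed (which is what lets the trailing ; survive in the example above). The name captureUntil is hypothetical.

```cpp
#include <cstddef>
#include <string>
#include <vector>

using Tokens = std::vector<std::string>;

// Collect tokens starting at `pos` until a stop token or the end of the
// stream is reached. On return, `pos` indexes the stop token (which is
// not consumed) or equals input.size() if the stream ran out.
Tokens captureUntil(const Tokens &input, std::size_t &pos, const Tokens &stops) {
    Tokens captured;
    while (pos < input.size()) {
        bool isStop = false;
        for (const std::string &s : stops) {
            if (input[pos] == s) isStop = true;
        }
        if (isStop) break;
        captured.push_back(input[pos++]);
    }
    return captured;
}
```

For the tokens 1 , 2 ; with ; as the only stop token, the capture holds 1 , 2 and the position is left pointing at the semicolon. With an empty stop list, as in @^$tail, everything up to the end of the stream is captured.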

List of directives

Macros and pattern matching

Variable capture

The following are only valid in patterns.

Variables

Token stream operations

File management