clang-extract

A tool to extract code content from source files using the clang and LLVM infrastructure.

Getting started

Compiling clang-extract

clang-extract requires clang, LLVM, libelf, zlib, meson and ninja in order to build. On openSUSE, you can install them by running:

$ sudo zypper install clang18 clang18-devel libclang-cpp18 \
       clang-tools libLLVM18 llvm18 llvm18-devel libelf-devel meson ninja \
       zlib-devel libzstd-devel

LLVM 18 or higher is recommended, since it is well tested. LLVM 16 and 17 are also supported, but you might run into issues with them.

Once all those packages are installed, you must set up the meson build system in order to compile. You can run either build-debug.sh, for a debug build with no optimization and debug flags enabled for development, or build-release.sh, for a fully optimized build. Those scripts create a build folder that you can cd into and invoke ninja to build. Example:

$ ./build-release.sh
$ cd build
$ ninja

Then the clang-extract binary will be available for you in the build folder.

Testing clang-extract

clang-extract has automated testing. Running the testsuite is as easy as running:

$ ninja test

inside the build directory. Test results are written into *.log files in the build folder.

Using clang-extract

Clang-extract currently only supports C projects. Assuming clang-extract is compiled, it can be used to extract code content from projects using the following steps.

  1. Find, in the project, the function you want to extract, and which file it is in.
  2. Compile the project and grab the command line passed to the compiler.
  3. Replace gcc with clang-extract.
  4. Pass -DCE_NO_EXTERNALIZATION -DCE_EXTRACT_FUNCTIONS=function -DCE_OUTPUT_FILE=/tmp/output.c to clang-extract.
  5. Done. /tmp/output.c will contain everything necessary for function to compile without any external dependencies.

Trivial example

Let's show how clang-extract works with a trivial example. Save the following code as a.c:

#include <stdlib.h>
#include <stdio.h>

void *unused_function(void)
{
  return malloc(1024);
}

int main(int argc, char *argv[])
{
  puts("Hello, world!");
  return 0;
}

Compiling this code with clang would be:

$ clang a.c -O2 -o a

Note that the source code of a.c contains an unused function. In this case, clang-extract can be used to extract only the functions actually needed. Here, we extract the main function:

$ clang-extract a.c -O2 -o a -DCE_EXTRACT_FUNCTIONS=main -DCE_OUTPUT_FILE=out.c

In the output file out.c, you will see the following code:

/** clang-extract: from /usr/include/stdio.h:719:1  */
extern int puts (const char *__s);

/** clang-extract: from /tmp/a.c:9:1  */
int main(int argc, char *argv[])
{
  puts("Hello, world!");
  return 0;
}

Notice how any reference to unused_function is removed, and how all headers have been removed and replaced by a declaration of puts. The output code can be compiled with the same flags used to compile the original code:

$ clang out.c -O2 -o a

If you want to keep the includes, see the -DCE_KEEP_INCLUDES option and the Supported options chapter.

Symbol Externalization

Code transformation is very often needed when generating livepatches. For example, if a livepatch needs to call functions that are not exported by the program (i.e. private), we need a process called externalization.

Externalization works by redeclaring the original symbol as a pointer to that symbol. By doing that, we avoid linking issues that may come from using a private symbol.

Externalization is enabled by default and can be disabled by providing the -DCE_NO_EXTERNALIZATION option.

Manual externalization

For example, with the following input:

#include <stdio.h>

int function(void)
{
  return 0;
}

int main(int argc, char *argv[])
{
  puts("Hello, world!");
  return function();
}

calling clang-extract with:

$ clang-extract a.c -DCE_EXTRACT_FUNCTIONS=main -DCE_OUTPUT_FILE=out.c -DCE_EXPORT_SYMBOLS=function

will externalize the function function, as the following output shows:

/** clang-extract: from /usr/include/stdio.h:719:1  */
extern int puts (const char *__s);

/** clang-extract: from /tmp/a.c:3:1  */
static int (*klpe_function)(void);

/** clang-extract: from /tmp/a.c:8:1  */
int main(int argc, char *argv[])
{
  puts("Hello, world!");
  return (*klpe_function)();
}

As one can see, the function was replaced by a pointer to function, klpe_function. On livepatching, this pointer is filled with the address of the original function, bypassing any linking issues caused by symbol visibility.

Automatic externalization

clang-extract is able to automatically detect which symbols should be externalized if correct information is given to it. For that, three switches are available for the user to provide such information:

The precision of the automatic analysis depends on the amount of information the user provides. In any case, clang-extract will do its best to figure out the best option when certain information is not available.

Running on glibc project

Let's extract the function __libc_malloc from the glibc project. The steps are:

  1. Compile the glibc project until malloc.c is compiled: make -j8 | grep malloc.c
  2. Grab the command line:
gcc malloc.c -c -std=gnu11 -fgnu89-inline  -g -O2 -Wall -Wwrite-strings -Wundef -Werror -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -Wp,-U_FORTIFY_SOURCE -Wstrict-prototypes -Wold-style-definition -fmath-errno    -fPIE   -DMORECORE_CLEARS=2  -ftls-model=initial-exec     -I../include -I/home/giulianob/projects/glibc/build_glibc/malloc  -I/home/giulianob/projects/glibc/build_glibc  -I../sysdeps/unix/sysv/linux/x86_64/64  -I../sysdeps/unix/sysv/linux/x86_64  -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86  -I../sysdeps/x86/nptl  -I../sysdeps/unix/sysv/linux/wordsize-64  -I../sysdeps/x86_64/nptl  -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux  -I../sysdeps/nptl  -I../sysdeps/pthread  -I../sysdeps/gnu  -I../sysdeps/unix/inet  -I../sysdeps/unix/sysv  -I../sysdeps/unix/x86_64  -I../sysdeps/unix  -I../sysdeps/posix  -I../sysdeps/x86_64/64  -I../sysdeps/x86_64/fpu/multiarch  -I../sysdeps/x86_64/fpu  -I../sysdeps/x86/fpu  -I../sysdeps/x86_64/multiarch  -I../sysdeps/x86_64  -I../sysdeps/x86/include -I../sysdeps/x86  -I../sysdeps/ieee754/float128  -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32  -I../sysdeps/wordsize-64  -I../sysdeps/ieee754  -I../sysdeps/generic  -I.. -I../libio -I.  -D_LIBC_REENTRANT -include /home/giulianob/projects/glibc/build_glibc/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h  -DPIC  -DUSE_TCACHE=1   -DTOP_NAMESPACE=glibc -o /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o -MD -MP -MF /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o.dt -MT /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o
  3. Replace gcc with clang-extract and add the extra parameters (-Werror is removed, since clang treats some things as errors where gcc doesn't):
clang-extract malloc.c -c -std=gnu11 -fgnu89-inline  -g -O2 -Wall -Wwrite-strings -fmerge-all-constants -frounding-math -fno-stack-protector -fno-common -Wp,-U_FORTIFY_SOURCE -Wstrict-prototypes -Wold-style-definition -fmath-errno    -fPIE   -DMORECORE_CLEARS=2  -ftls-model=initial-exec     -I../include -I/home/giulianob/projects/glibc/build_glibc/malloc  -I/home/giulianob/projects/glibc/build_glibc  -I../sysdeps/unix/sysv/linux/x86_64/64  -I../sysdeps/unix/sysv/linux/x86_64  -I../sysdeps/unix/sysv/linux/x86/include -I../sysdeps/unix/sysv/linux/x86  -I../sysdeps/x86/nptl  -I../sysdeps/unix/sysv/linux/wordsize-64  -I../sysdeps/x86_64/nptl  -I../sysdeps/unix/sysv/linux/include -I../sysdeps/unix/sysv/linux  -I../sysdeps/nptl  -I../sysdeps/pthread  -I../sysdeps/gnu  -I../sysdeps/unix/inet  -I../sysdeps/unix/sysv  -I../sysdeps/unix/x86_64  -I../sysdeps/unix  -I../sysdeps/posix  -I../sysdeps/x86_64/64  -I../sysdeps/x86_64/fpu/multiarch  -I../sysdeps/x86_64/fpu  -I../sysdeps/x86/fpu  -I../sysdeps/x86_64/multiarch  -I../sysdeps/x86_64  -I../sysdeps/x86/include -I../sysdeps/x86  -I../sysdeps/ieee754/float128  -I../sysdeps/ieee754/ldbl-96/include -I../sysdeps/ieee754/ldbl-96  -I../sysdeps/ieee754/dbl-64  -I../sysdeps/ieee754/flt-32  -I../sysdeps/wordsize-64  -I../sysdeps/ieee754  -I../sysdeps/generic  -I.. -I../libio -I.  -D_LIBC_REENTRANT -include /home/giulianob/projects/glibc/build_glibc/libc-modules.h -DMODULE_NAME=libc -include ../include/libc-symbols.h  -DPIC  -DUSE_TCACHE=1   -DTOP_NAMESPACE=glibc -o /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o -MD -MP -MF /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o.dt -MT /home/giulianob/projects/glibc/build_glibc/malloc/malloc.o -DCE_NO_EXTERNALIZATION -DCE_OUTPUT_FILE=/tmp/out.c -DCE_EXTRACT_FUNCTIONS=__libc_malloc
  4. The output should be in /tmp/out.c and should compile on its own. Check it by running $ gcc -c /tmp/out.c. Here is the output for malloc: https://godbolt.org/z/6vrrTPoP9

Supported options

Clang-extract supports many options which control the output code:

For more switches, see:

$ clang-extract --help

Supported features

Currently we only support projects written in C. Clang-extract is extensively tested with the Linux kernel, glibc and OpenSSL source code. C++ support is planned and clang-extract has some tests for it, but it cannot handle libstdc++ headers yet.