Awesome
Testing Coreutils
This manual contains detailed instructions on testing GNU's Coreutils. The testing pipeline depends on LLVM 11.
Step 1: Building the file system linker
We need a lightweight linker to link the Coreutils program with the uClibc library and the POSIX file system. The linker's implementation extracts all linking and code optimization related code from KLEE.
Install dependencies
Install the basic dependencies:
$ sudo apt-get install build-essential cmake file g++-multilib gcc-multilib git python3-pip
If you are using a recent Ubuntu distribution (e.g. 21.10), install the LLVM-11 packages provided from the package manager.
$ sudo apt-get install clang-11 llvm-11 llvm-11-dev llvm-11-tools
Otherwise, you can build LLVM-11 from source.
Build the preprocessor fs-linker
- First, clone the linker repo.
$ git clone https://github.com/Generative-Program-Analysis/fs-linker.git
$ cd fs-linker
- Configure the linker
$ mkdir build
$ cd build
$ cmake -DLLVM_CONFIG_BINARY=<ABSOLUTE_PATH_TO_LLVM_CONFIG_11_BINARY> \
-DCMAKE_INSTALL_PREFIX=<YOUR_INSTALL_PATH_PREFIX> \
-DCMAKE_BUILD_TYPE=<Release/Debug> ..
Note: if you do not wish to install the linker binary, please omit the -DCMAKE_INSTALL_PREFIX
option.
- Build the linker
$ make # or make install
The linker binary fs-linker
will be generated under build/bin
and <YOUR_INSTALL_PATH_PREFIX>/bin
(if user specifies -DCMAKE_INSTALL_PREFIX
).
Usage
The linker combines Coreutils programs with KLEE's POSIX file system (Step 2) and the uClibc library (Step 3) into a single LLVM IR program. The linker also performs source code optimization and transformation on the combined program.
Below are some basic options to use fs-linker
:
--posix-path=<PATH_TO_POSIX_ARCHIVE>
: this option should be the absolute path of the POSIX file system's archive. If you omit this option, the source program will not be linked with KLEE's POSIX file system but use GENSYM's internal file system.--uclibc-path=<PATH_TO_UCLIBC_ARCHIVE>
: this option should be the absolute path of the uClibc library's archive. This option is required if you are linking Coreutils programs.--switch-type=simple
: this option will lower the switch instructions to a chain of branch instructions.--optimize
: optimize the code using a series of pre-selected optimization passes.-o
: this option specifies the name of the output LLVM IR program.- To see all options, run
fs-linker --help
Sample usage
$ fs-linker --posix-path=${POSIX_SOURCE_DIR}/build/runtime/lib/libgensymRuntimePOSIX64.bca \
--uclibc-path=${UCLIBC_SOURCE_DIR}/lib/libc.a \
--switch-type=simple \
./echo.bc -o echo_linked.ll
The command above links input program echo.bc
with POSIX file system
and the uClibc library, lowers switch instructions
and produces the linked LLVM IR program into echo_linked.ll
.
Step 2: Building the POSIX file system
This step builds a modified version of KLEE's POSIX file system model in order to test Coreutils programs (or any programs using POSIX file system APIs) with both KLEE and GENSYM.
- First, clone the
posix-runtime
repository:
$ git clone https://github.com/Generative-Program-Analysis/posix-runtime.git
$ cd posix-runtime
- Configure the POSIX file system
$ mkdir build
$ cd build
$ cmake -DLLVM_CONFIG_BINARY=<ABSOLUTE_PATH_TO_LLVM_CONFIG_11_BINARY> \
-DLLVMCC=<ABSOLUTE_PATH_TO_CLANG_11_BINARY> \
-DLLVMCXX=<ABSOLUTE_PATH_TO_CLANG++_11_BINARY> \
-DCMAKE_INSTALL_PREFIX=<YOUR_INSTALL_PATH_PREFIX> \
-DGENSYM_HEADER_DIR=<DIR_TO_YOUR_GENSYMCLIENT_HEADER> \
-DKLEE_HEADER_DIR=<DIR_TO_YOUR_KLEE_HEADER> \
-DRUNTIME_CFLAGS="-O0 -fno-discard-value-names" ..
Note: if you do not wish to install the posix file system library, please omit the -DCMAKE_INSTALL_PREFIX
option.
Note: we will build the POSIX file system model for both KLEE and our engine GENSYM. So the user should provide the engine's external API header file, i.e. -DGENSYM_HEADER_DIR
(the directory where gensym_client.h
is located) and -DKLEE_HEADER_DIR
(the directory where klee.h
is located).
Note: we use -DRUNTIME_CFLAGS
to pass optimization flags to clang
, which produces bitcode files of source programs.
By default, we disable optimization by passing -O0 -fno-discard-value-names
.
If you wish to disable assertions, you can pass -DNDEBUG
in addition.
- Build the POSIX runtime
$ make # or make install
The archived posix-filesystem libkleeRuntimePOSIX64.bca
(for KLEE) and libgensymRuntimePOSIX64.bca
(for GENSYM) will be generated under build/runtime/lib
, and <YOUR_INSTALL_PATH_PREFIX>/lib
(if user specifies -DCMAKE_INSTALL_PREFIX
).
Step 3: Building the uClibc library
- First, clone the
klee-uclibc
repository:
$ git clone https://github.com/Generative-Program-Analysis/klee-uclibc.git
$ cd klee-uclibc
$ git checkout evaluation_uclibc_version
- Configure the klee-uClibc library:
$ ./configure --make-llvm-lib \
--with-cc <ABSOLUTE_PATH_TO_CLANG_11_BINARY> \
--with-llvm-config <ABSOLUTE_PATH_TO_LLVM_CONFIG_11_BINARY>
Note: The assertions in uClibc library are disabled by default, you may add --enable-assertions
option to enable assertions.
- Build the klee-uClibc:
$ make KLEE_CFLAGS="-fno-discard-value-names -mlong-double-64"
Note: we use KLEE_CFLAGS
to pass additional compiler flags in the end, -mlong-double-64
is used to disable fp80 datatype.
The generated uClibc library archive is lib/libc.a
.
Step 4: Building Coreutils programs
- First, clone the Coreutils repository:
$ git clone git://git.sv.gnu.org/coreutils
$ cd coreutils
$ git checkout tags/v8.32 -b <branch_name>
Note: we will be testing on Coreutils v8.32 release.
- Install and setup
wllvm
:
Coreutils programs need to be compiled into LLVM bitcode in order to be executed on KLEE and our engine.
We also need to install wllvm
to compile multiple modules of source code into a
single LLVM whole-program bitcode file:
$ sudo pip3 install wllvm
To successfully execute wllvm
, it is necessary to set several environment variables:
$ export LLVM_COMPILER=clang // required, we use llvm bitcode compiler
$ export LLVM_CC_NAME=<your-clang-11-name> // optional: can be set if your clang compiler is not called clang but something like clang-11
$ export LLVM_CXX_NAME=<your-clang++-11-name> // optional: same as above
$ export LLVM_LINK_NAME=<your-llvm-link-11-name> // optional: same as above
$ export LLVM_AR_NAME=<your-llvm-ar-11-name> // optional: same as above
$ export LLVM_COMPILER_PATH=<YOUR_LLVM_BINARY_PATH> // optional: can be set to the absolute path to the folder that contains your LLVM 11 binary. This prevents searching for the binary in your PATH environment variable.
- Build coreutils program
Please refer to README-prereq
under the root directory of coreutils to
install the dependencies required for building coreutils.
Note: For coreutils v8.32, newer Bison versions will cause errors, please install Bison version 3.5.1 before building.
To compile coreutils:
$ ./bootstrap
$ mkdir obj-llvm
$ cd obj-llvm
$ CC=wllvm ../configure \
--disable-nls \
CFLAGS="-O0 -Xclang -disable-O0-optnone -fno-discard-value-names -D__NO_STRING_INLINES -D_FORTIFY_SOURCE=0 -U__OPTIMIZE__"
$ make
Note: CFLAGS
specifies the flags for compiling Coreutils. We use -O0 -Xclang -disable-O0-optnone
to disable more aggressive optimizations. You can also use -O1 -Xclang -disable-llvm-passes
to enable some optimizations, and the generated bitcode under this option is more suited for optimization later (in the linking stage).
Then we need to extract the LLVM bitcode using extract-bc
(a wllvm
utility):
cd src
find . -executable -type f | xargs -I '{}' extract-bc '{}'
cd ..
Step 5: Testing
After the last step, compiled native binary files (e.g. echo
, cat
) and their LLVM bitcode files (e.g. echo.bc
, cat.bc
) are located under obl-llvm/src
.
For testing, we can create a separate folder under obj-llvm
:
mkdir playground
cd playground
cp ../src/*.bc ./
- Using KLEE
If we want to test the coreutils bitcode (for example echo) on KLEE.
First refer to KLEE's document to build KLEE.
Then using the fs-linker
program to produce a single LLVM IR program
containing the POSIX file system (with KLEE's external API) and the uClibc
library.
Using echo
as example:
fs-linker --posix-path=${dir_to_posix_archive}/libkleeRuntimePOSIX64.bca \
--uclibc-path=${dir_to_uclibc_folder}/lib/libc.a \
./echo.bc -o echo_klee.ll
Since we have already produced the program with the POSIX/uClibc library, we
do not need to specify these options when running KLEE.
We need to use --switch-type=simple
to lower switch
instructions into
branches, to make sure that both KLEE and GENSYM treats switch
in the same way:
klee --switch-type=simple echo_klee.ll --sym-stdout --sym-arg 8
KLEE should explore 4971 paths after running the above command. For more details on testing Coreutils programs with KLEE, please refer to KLEE's document.
- Using GENSYM with KLEE's POSIX model
To test Coreutils bitcode programs using GENSYM, we need to link the program with
the POSIX file system (with GENSYM's external API) and the uClibc library again using
the fs-linker
program:
fs-linker --posix-path=${dir_to_posix_archive}/libgensymRuntimePOSIX64.bca \
--uclibc-path=${dir_to_uclibc_folder}/lib/libc.a \
--switch-type=simple \
./echo.bc -o echo_linked.ll
Then we generate the symbolic-execution code for echo_linked.ll
under the interactive sbt
console:
sbt:SAI> runMain sai.gensym.RunGenSym <path-to-echo_linked.ll> --entrance=main --output=echo_linked_posix --use-argv
And after running make
under gs_gen/echo_linked_posix
, we can invoke the compiled executable file:
# current directory sai/dev-clean/gs_gen/echo_linked_posix
make
./echo_linked_posix --cons-indep --argv="./echo.bc --sym-stdout --sym-arg 8"
You should observe same path number 4971.
- Using GENSYM without POSIX
To test Coreutils bitcode files using GENSYM with its the internal file system, we only need to link the program with the uClibc library.
fs-linker --uclibc-path=${dir_to_uclibc_folder}/lib/libc.a \
--switch-type=simple \
./echo.bc -o echo_gensym_linked.ll
Then we generate the symbolic-execution code for echo_gensym_linked.ll
under the interactive sbt
console:
sbt:SAI> runMain sai.gensym.RunGenSym <path-to-echo_gensym_linked.ll> --entrance=main --output=echo_gensym_fs --use-argv
And after running make
under gs_gen/echo_gensym_fs
, we can invoke the compiled executable file:
# current directory sai/dev-clean/gs_gen/echo_gensym_fs
make
./echo_gensym_fs --cons-indep --argv="./echo.bc #{8}"
You should observe same path number 4971.
Measuring coverage with gcov
This feature is still under development for GENSYM. Please refer to Testing Coreutils about testing with gcov
in KLEE.