Awesome
SeaDsa: A Points-to Analysis for Verification of Low-level C/C++
<a href="https://github.com/seahorn/sea-dsa/actions"><img src="https://github.com/seahorn/sea-dsa/workflows/CI/badge.svg" title="Ubuntu 22.04 LTS 64bit, clang++-14"/></a>
SeaDsa
is a context-, field-, and array-sensitive unification-based
points-to analysis for LLVM bitcode inspired
by DSA.
SeaDsa
is an order of magnitude more scalable and precise than Dsa
and a previous implementation of SeaDsa
thanks to improved handling
of context sensitivity, addition of partial flow-sensitivity, and type-awareness.
Although SeaDsa
can analyze arbitrary LLVM bitcode, it has been
tailored for use in program verification of C/C++ programs. It can be
used as a stand-alone tool or together with
the SeaHorn
verification framework and its analyses.
This branch supports LLVM 14.
Requirements
SeaDsa
is written in C++ and uses the Boost library. The main requirements
are:
- C++ compiler supporting c++14
- Boost >= 1.65
- LLVM 14
To run tests, install the following packages:
sudo pip install lit OutputCheck
sudo easy_install networkx
sudo apt-get install libgraphviz-dev
sudo easy_install pygraphviz
Project Structure
- The main Points-To Graph data structures,
Graph
,Cell
, andNode
, are defined ininclude/Graph.hh
andsrc/Graph.cc
. - The Local analysis is in
include/Local.hh
andsrc/DsaLocal.cc
. - The Bottom-Up analysis is in
include/BottomUp.hh
andsrc/DsaBottomUp.cc
. - The Top-Down analysis is in
include/TopDown.hh
andsrc/DsaTopDown.cc
. - The interprocedural node cloner is in
include/Cloner.hh
andsrc/Clonner.cc
. - Type handling code is in
include/FieldType.hh
,include/TypeUtils.hh
,src/FieldType.cc
, andsrc/TypeUtils.cc
. - The allocator function discovery is in
include/AllocWrapInfo.hh
andsrc/AllocWrapInfo.cc
.
Compilation and Usage
Program Verification benchmarks
Instructions on running program verification benchmarks, together with recipes for building real-world projects and our results, can be found in tea-dsa-extras.
Integration in other C++ projects (for users)
SeaDsa
contains two directories: include
and src
. Since SeaDsa
analyzes LLVM bitcode, LLVM header files and libraries must be
accessible when building with SeaDsa
.
If your project uses cmake
then you just need to add in your
project's CMakeLists.txt
:
include_directories(seadsa/include)
add_subdirectory(seadsa)
Standalone (for developers)
If you already installed llvm-14
on your machine:
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=run -DLLVM_DIR=__here_llvm-14__/share/llvm/cmake ..
cmake --build . --target install
Otherwise:
mkdir build && cd build
cmake -DCMAKE_INSTALL_PREFIX=run ..
cmake --build . --target install
To run tests:
cmake --build . --target test-sea-dsa
Visualizing Memory Graphs and Complete Call Graphs
Consider a C program called tests/c/simple.c
:
#include <stdlib.h>
typedef struct S {
int** x;
int** y;
} S;
int g;
int main(int argc, char** argv){
S s1, s2;
int* p1 = (int*) malloc(sizeof(int));
int* q1 = (int*) malloc(sizeof(int));
s1.x = &p1;
s1.y = &q1;
*(s1.x) = &g;
return 0;
}
-
Generate bitcode:
clang -O0 -c -emit-llvm -S tests/c/simple.c -o simple.ll
The option -O0
is used to disable clang optimizations. In general,
it is a good idea to enable clang optimizations. However, for trivial
examples like simple.c
, clang simplifies too much so nothing useful
would be observed. The options -c -emit-llvm -S
generate bitcode in
human-readable format.
-
Run
sea-dsa
on the bitcode and print memory graphs to dot format:seadsa -sea-dsa=butd-cs -sea-dsa-type-aware -sea-dsa-dot simple.ll
The options -sea-dsa=butd-cs -sea-dsa-type-aware
enable the analysis
implemented in our FMCAD'19 paper (see References). This command will
generate a FUN.mem.dot
file for each function FUN
in the bitcode
program. In our case, the only function is main
and thus, there is
one file named main.mem.dot
. The file is generated in the current
directory. If you want to store the .dot
files in a different
directory DIR
then add the option -sea-dsa-dot-outdir=DIR
-
Visualize
main.mem.dot
by transforming it to apdf
file:dot -Tpdf main.mem.dot -o main.mem.pdf open main.mem.pdf // replace with you favorite pdf viewer
In our memory model, a pointer value is represented by a cell which is a pair of a memory object and offset. Memory objects are represented as nodes in the memory graph. Edges are between cells.
Each node field represents a cell (i.e., an offset in the node). For
instance, the node fields <0,i32**>
and <8,i32**>
pointed by %6
and %15
, respectively are two different cells from the same memory
object. The field <8,i32**>
represents the cell at offset 8 in the
corresponding memory object and its type is i32**
. Black edges
represent points-to relationships between cells. They are labeled with
a number that represents the offset in the destination node. Blue
edges connect formal parameters of the function with a cell. Purple
edges connect LLVM pointer variables with cells. Nodes can have
markers such as S
(stack allocated memory), H
(heap allocate
memory), M
(modified memory), R
(read memory), E
(externally
allocated memory), etc. If a node is red then it means that the
analysis lost field sensitivity for that node. The label {void}
is
used to denote that the node has been allocated but it has not been
used by the program.
sea-dsa
can also resolve indirect calls. An indirect call is a
call where the callee is not known statically. sea-dsa
identifies
all possible callees of an indirect call and generates a LLVM call
graph as output.
Consider this example in tests/c/complete_callgraph_5.c
:
struct class_t;
typedef int (*FN_PTR)(struct class_t *, int);
typedef struct class_t {
FN_PTR m_foo;
FN_PTR m_bar;
} class_t;
int foo(class_t *self, int x)
{
if (x > 10) {
return self->m_bar(self, x + 1);
} else
return x;
}
int bar (class_t *self, int y) {
if (y < 100) {
return y + self->m_foo(self, 10);
} else
return y - 5;
}
int main(void) {
class_t obj;
obj.m_foo = &foo;
obj.m_bar = &bar;
int res;
res = obj.m_foo(&obj, 42);
return 0;
}
Type the commands:
clang -c -emit-llvm -S tests/c/complete_callgraph_5.c -o ex.ll
sea-dsa --sea-dsa-callgraph-dot ex.ll
It generates a .dot
file called callgraph.dot
in the current
directory. Again, the .dot
file can be converted to a .pdf
file
and opened with the commands:
dot -Tpdf callgraph.dot -o callgraph.pdf
open callgraph.pdf
sea-dsa
can also print some statistics about the call graph
resolution process (note that you need to call clang
with -g
to
print file,line, and column information):
sea-dsa --sea-dsa-callgraph-stats ex.ll
=== Sea-Dsa CallGraph Statistics ===
** Total number of indirect calls 0
** Total number of resolved indirect calls 3
%16 = call i32 %12(%struct.class_t* %13, i32 %15) at tests/c/complete_callgraph_5.c:14:12
RESOLVED
Callees:
i32 bar(%struct.class_t*,i32)
%15 = call i32 %13(%struct.class_t* %14, i32 10) at tests/c/complete_callgraph_5.c:23:16
RESOLVED
Callees:
i32 foo(%struct.class_t*,i32)
%11 = call i32 %10(%struct.class_t* %2, i32 42) at tests/c/complete_callgraph_5.c:36:9
RESOLVED
Callees:
i32 foo(%struct.class_t*,i32)
Dealing with C/C++ library and external calls
The pointer semantics of external calls can be defined by writing a
wrapper that calls any of these functions defined in
seadsa/seadsa.h
:
extern void seadsa_alias(const void *p, ...);
extern void seadsa_collapse(const void *p);
extern void seadsa_mk_seq(const void *p, unsigned sz);
seadsa_alias
unifies all argument's cells, seadsa_collapse
tells
sea-dsa
to collapse (i.e., loss of field-sensitivity) the cell
pointed by p
, and seadsa_mk_seq
tells sea-dsa
to mark as
sequence the node pointed by p
with size sz
.
For instance, consider an external call foo
defined as follows:
extern void* foo(const void*p1, void *p2, void *p3);
Suppose that the returned pointer should be unified to p2
but not to
p1
. In addition, we would like to collapse the cell corresponding to
p3
. Then, we can replace the above prototype of foo
with the
following definition:
#include "seadsa/seadsa.h"
void* foo(const void*p1, void *p2, void*p3) {
void* r = seadsa_new();
seadsa_alias(r,p2);
seadsa_collapse(p3);
return r;
}