Awesome

Learning llvm

Notes and code samples for learning LLVM.

Intermediate Representation (IR)

The contents of the files in this section represent a module in llvm which is the top level structure.

Local values are simliar to registers in assembly language and start with %.

bitcode

The following section will work on an example that looks like this:

int something(int nr) {
  return nr + 2;
}

To generate the bitcode representation (bc extension):

$ /usr/bin/clang -emit-llvm -c -o simple.bc simple.c

The output simple.bc can be converted into assembly representation using:

$ llvm-dis simple.bc -o simple.ll

So lets take a look at the output:

; ModuleID = 'simple.c'
source_filename = "simple.c"
target datalayout = "e-m:e-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-unknown-linux-gnu"

; Function Attrs: noinline nounwind optnone uwtable
define dso_local i32 @something(i32) #0 {
  %2 = alloca i32, align 4
  store i32 %0, i32* %2, align 4
  %3 = load i32, i32* %2, align 4
  %4 = add nsw i32 %3, 2
  ret i32 %4
}

attributes #0 = { noinline nounwind optnone uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="x86-64" "target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0}
!llvm.ident = !{!1}

!0 = !{i32 1, !"wchar_size", i32 4}
!1 = !{!"clang version 9.0.1 (Fedora 9.0.1-2.fc31)"}

The sections below will go through the above fields.

To generate the assemble representation:

$ /usr/bin/clang -emit-llvm -c -S -o simple.ll simple.c

This command will generate a file name simple.ll.

And a file in assembly representation can be converted into bitcode using:

$ llvm-as simple.ll -o simple.bc

Demangling

It can sometimes be difficult to read the IR when the front end uses name mangling. This can be demangeled using:

$ $ cat main.ll | llvm-cxxfilt

target datalayout

Contains information about the endienness and type sizes of the target triple This is used for some optimisation that require this information.

target datalayout: "e-m:e-i64:64-f80:128-n8:16:32:64-S128"

e stands for little endian. If it was big endian it would have been E. This is followed by types which have the format:

type:<size>:<abi>:<preferred>

m:e' stands for name mangling and especified thatELF` name mangling is used.

i64:64 means that 64 bit integer is 64 bits.

functions declarations

Are similar to c fuctions syntax:

define dso_local i32 @main(i32 %0, i8** %1) #0 {

}

dso_local means the compiler may assume that a function or variable marked as dso_local will resolve to a symbol within the same linkage unit. dso stands for dynamic shared object.

The following i32 is the return value, followed by the name of the function. Notice the @main specifies this as a global.

And main takes one i32 which will be stored in %0, and a pointer-to-pointer of i8 in %1.

Then we have the following in the body of the function:

; Function Attrs: noinline nounwind optnone uwtable                             
define dso_local i32 @something(i32) #0 {                                       
  %2 = alloca i32, align 4                                                      
  store i32 %0, i32* %2, align 4                                                
  %3 = load i32, i32* %2, align 4                                               
  %4 = add nsw i32 %3, 2                                                        
  ret i32 %4                                                                    
}

Notice #0 which specifies function attributes (listed further down in the file):

attributes #0 = { noinline nounwind optnone uwtable
"correctly-rounded-divide-sqrt-fp-math"="false" "disable-tail-calls"="false"
"less-precise-fpmad"="false" "min-legal-vector-width"="0"
"no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf"
"no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false"
"no-signed-zeros-fp-math"="false" "no-trapping-math"="false"
"stack-protector-buffer-size"="8" "target-cpu"="x86-64"
"target-features"="+cx8,+fxsr,+mmx,+sse,+sse2,+x87" "unsafe-fp-math"="false"
"use-soft-float"="false"
}

alloca will allocate (similar to man alloca?) on the stack. The type will determine the size of the allocation (sizeof(type)).

int something(int nr) {
  return nr + 2;
}

Will generate the following llvm IR:

; Function Attrs: noinline nounwind optnone uwtable                             
define dso_local i32 @something(i32) #0 {                                       
  %2 = alloca i32, align 4                                                      
  store i32 %0, i32* %2, align 4                                                
  %3 = load i32, i32* %2, align 4                                               
  %4 = add nsw i32 %3, 2                                                        
  ret i32 %4                                                                    
}

First space on the stack will be allocated (alloca) that can hold an i32. Next, we store the value in %0 which is the passed in nr, which is stored in the stack (allocated in the previous instruction) %2.

Next, the value stored in %2 will be loaded into %3, followed by calling add with produces a type of i32 and stores the result in %4. nsw stands for No Signed Wrap which means that if there signed overflow the result of this operation will be a poison value.

store instruction

Store takes two arguments, a value to store and an address where the value should be saved.

  store <type> <value>, <type>* <pointer>

For example:

  store i32 %0, i32* %2, align 4

So this is saying store the value that exists in register/local %0 which is of type i32 into %2 as a pointer to i32.

load instruction

  %3 = load i32, i32* %2, align 4

Load a i32 from %2 (which is a pointer) which I'm guessing is dereferenced?

arrays

Are declared like this:

%something = type [size/len x type]

For example:

%something = type [15 x i8]

types/object

Is used to represent a collection of data members in memory. Like a class or a struct in some programming languages.

For example, a type with two i32 values:

%something = type {i32, i32}

Another example with a field and a function pointer:

%"something_underscore" = type {i32, void (i32)* }

Taking a look at a more complicated example:

%"unwind::libunwind::_Unwind_Exception" = type { i64, void (i32, %"unwind::libunwind::_Unwind_Exception"*)*, [6 x i64] }

So the first part is just the name of the variable which needs quotes because of the characters used like ::, and we can see that this is a struct where the first member is a i64 value, the second is a function pointer which takes a i32, and then a pointer to a struct of this same time. The third argument is an array of size/len 6 and the types of the entries are i64.

Now, if we take a look in the rust source code we can find library/unwind/src/libunwind.rs:

#[repr(C)]
pub struct _Unwind_Exception {
    pub exception_class: _Unwind_Exception_Class,
    pub exception_cleanup: _Unwind_Exception_Cleanup_Fn,
    pub private: [_Unwind_Word; unwinder_private_data_size],
}

pub type _Unwind_Exception_Cleanup_Fn =
    extern "C" fn(unwind_code: _Unwind_Reason_Code, exception: *mut _Unwind_Exception);

So we can see that the _Unwind_Exception_Cleanup_Fn is the second parameter to the function pointer (second member of the struct). So that is only declaring the a type. And following that we have:

 %"unwind::libunwind::_Unwind_Context" = type { [0 x i8] }

Now, this is also a variable that is declared and the type is an array of size 0 and of type i8 (1 byte). I think this matched the definition of this enum found in library/unwind/src/libunwind.rs:

pub enum _Unwind_Context {}

And notice that this is an empty enum, so it does not have any values but it can have associated functions.

@vtable.0 = private unnamed_addr constant
     <{ i8*, [16 x i8], i8*, i8*, i8* }>
     <{ i8* bitcast (void (i64**)* @"core::ptr::drop_in_place$LT$std..rt..lang_start$LT$$LP$$RP$$GT$..$u7b$$u7b$closure$u7d$$u7d$$GT$::hfd0e41c7184d2d5c" to i8*), [16 x i8] c"\08\00\00\00\00\00\00\00\08\00\00\00\00\00\00\00", i8* bitcast (i32 (i64**)* @"core::ops::function::FnOnce::call_once$u7b$$u7b$vtable.shim$u7d$$u7d$::h6376c1087f8195c9" to i8*), i8* bitcast (i32 (i64**)* @"std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h4a18ec8e9cd2a4c5" to i8*), i8* bitcast (i32 (i64**)* @"std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h4a18ec8e9cd2a4c5" to i8*) }>, align 8

; Function Attrs: nonlazybind uwtable                                           
declare i32 @rust_eh_personality(i32,
    i32,
    i64,
    %"unwind::libunwind::_Unwind_Exception"*,
    %"unwind::libunwind::_Unwind_Context"*) unnamed_addr #1

Recall that a @ sign means that this is a global identifier and it is external unnamed_addr means that the address is not significant.

define internal void @std::sys_common::backtrace::__rust_begin_short_backtrace::hfb9be5a6e963bdda(void ()* %f)
   unnamed_addr #0 personality 
   i32 (i32, i32, i64, %"unwind::libunwind::_Unwind_Exception"*, %"unwind::libunwind::_Unwind_Context"*)* @rust_eh_personality {

#0 is the attribute index. personality constant allows a function to specify what function to use for exception handling.

Structures

Normal (non-packed/not padded)) struct:

%Foo = type {
  i32,           // no member name, uses index 0
  double }       // no member name, uses index 1

So notice that we use indexes instead of named members. But we don't use the indexes directly as in %someFoo.0, but instead we use getelementptr (GEP) (more on this instruction later).

Packed structs (padded):

%Foo = type <{
  i32,           // no member name, uses index 0
  double }>      // no member name, uses index 1

Casts

There are 9 different types of casts in llvm, Bitwise casts which are type casts, zero-extending casts, sign-extending casts, truncation-casts, floating-point extending casts, address-space casts (pointer casts), Pointer-to- integer casts, etc.

Bitwise casts (bitcast)

  Foo *foo = (Foo *) malloc(sizeof(Foo));

Would turn into the following LLVM IR:

  %1 = call i8* @malloc(i32 4)
  %foo = bitcast i8* %1 to %Foo*

Notice that we first have the call to @malloc and then the second instruction does the cast.

Types

i1 is a one-bit integer (which is used for booleans for example)
i32 is a 32-bit integer
iN is a N-bit integer

Pointer types:

[4 x i32]* A pointer to array of four i32 values
i32 addrspace(5)* A pointer to an i32 value that resides in address space 5.
i32 (i32*) * A pointer to a function that takes an i32*, returning an i32.
i8* can be used as pointer to void.

Metadata

One of the main motivations for adding metadata was for debugging information. Recall that ELF is the Excecutable and Linkable Format, and it contains all the info required to load an object file (very simplified but I've written about ELF before). It also contains debug information and the format most commonly used in ELF is DWARF (see the connection of names; elves and dwarfes).

In llvm IR you might find the following !3:

%_4 = load void ()*, void ()** %0, align 8, !nonnull !3, !noundef !3

The !3 refers to an index into the metadata section later inte file:

!0 = !{i32 7, !"PIC Level", i32 2}                                              
!1 = !{i32 7, !"PIE Level", i32 2}                                              
!2 = !{i32 2, !"RtLibUseGOT", i32 1}                                            
!3 = !{}                                                                        
!4 = !{i32 3310276}

Attributes

Are markers that define properties on functions, parameters and return values.

noalias

Means that memory access through a pointer is exclusive, so there are no other pointers to what this pointer points to.

nocapture

The pointer is not stored in a global or other memory location that outlives the function.

landingpad

This is an llvm instruction which specifies this basic block as a landingpad for exceptions. This corresponds to the code of catch region of a try-catch block. For example:

cleanup:                                          ; preds = %bb1
  %1 = landingpad { i8*, i32 }
          cleanup
  %2 = extractvalue { i8*, i32 } %1, 0
  %3 = extractvalue { i8*, i32 } %1, 1
  %4 = getelementptr inbounds { i8*, i32 }, { i8*, i32 }* %0, i32 0, i32 0
  store i8* %2, i8** %4, align 8
  %5 = getelementptr inbounds { i8*, i32 }, { i8*, i32 }* %0, i32 0, i32 1
  store i32 %3, i32* %5, align 8
  br label %bb3

In this case the optional cleanup flags indicates that this is a landing pad for a cleanup, which means that this landingpad should always be entered, whic I think can be used for running destructors/drop functions. In this case this landing pad can catch a struct with a i8* member and a 32 member.

Other possibilities for clauses, apart from cleanup are catch, and filter.

Invoke instruction

Lets take the following example of an invoke instruction and try to understand it;

%2 = invoke i32 @"std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h4a18ec8e9cd2a4c5"(i64** align 8 %_1)
          to label %bb1 unwind label %cleanup

i32 is the return type of the function being called.
@"std::rt::lang_start::_$u7b$$u7b$closure$u7d$$u7d$::h4a18ec8e9cd2a4c5" is the function pointer.
i64** align 8 %_1 are the arguments passed to the function.
to label %bb1 is the basic block to which control will be passed if the called function returns using ret.
unwind is a mandatory (non-optional) part of the invoke instruction
label %cleanup is the basic block to which control will be passed if the called functions returns using resume.

Resume instruction

TODO:

Basic blocks

These begin with an optional label, and end with a termination instruction like branch (br) or return (ret).

Branch (br)

The br instruction has two forms, one where there is a condition and one that is an unconditional branch.

br i1, label_if_true, label_if_false

br label_destination

i1 is a single bit integer and i32 would be an 32-bit integer.

For example using branch.c we have the following branch in main:

store i32 2, i32* %6, align 4
%8 = load i32, i32* %6, align 4
%9 = icmp eq i32 %8, 2
br i1 %9, label %10, label %11

10:                                               ; preds = %2
  call void @two()
  br label %12

11:                                               ; preds = %2
  call void @one()
  br label %12

12:
  ret i32 0
}

Note that the comment ; preds = %2 is saying that %2 is the predecessor for this basic block.

call

%7 = call i32 (i8*, ...) @printf(i8* getelementptr inbounds ([10 x i8], [10 x i8]* @.str.2, i64 0, i64 0))

The first argument is i32 which is the return value of printf. After that we have the signature of printf which is (i8*, ...) @prinf is pointer to the function which is followed by the arguments. Now, in this case the arguments are retrieved using getelementptr

int printf(const char *format, ...);

getelementptr

Is used to perform address calculations. It takes a a base type that it is going to operatate on. The second operand is the base pointer itself.

@.str.2 = private unnamed_addr constant [10 x i8] c"Bajja...\0A\00", align 1
...

getelementptr inbounds ([10 x i8], [10 x i8]* @.str.2, i64 0, i64 0)) 

base type: [10 x i8]
base ptr:  [10 x i8]* @.str.2
i64 0: base ptr offset, which is a pointer to [10 x i8]
i64 0: offset into above array

These last two indices are somewhat confusing. The first is an offset of the base type, and the second is a index into that type.

@.str.2 = private unnamed_addr constant [3 x i8] c"Ba\0A\00", align 1

getelementptr inbounds ([3 x i8], [3 x i8]* @.str.2, i64 0, i64 0)) 
                                          |            |     |
                       +------------------+            |     |
                       |-------------------------------|-----+
                       ↓                               |
                       +-+-+--+-+-+-+-+-+-+            |
                       |B|a|\n|x|x|x|y|y|y|            |
                       +-+-+--+-+-+-+-+-+-+            |
                       [  0  ][ 1  ][  2  ]            |
                          ^                            |
                          +----------------------------+

#.str.2 will be equal to 'B'

Notice that the type pointed to by the first offset is an array, in this case it is pointing to the same place as the base pointer. If we want to get a pointer to an element in this array we use the second index:

getelementptr inbounds ([3 x i8], [3 x i8]* @.str.2, i64 0, i64 1)) 
                                          |            |     |
                       +------------------+            |     |
                       | +-----------------------------|-----+
                       ↓ ↓                             |
                       +-+-+--+-+-+-+-+-+-+            |
                       |B|a|\n|x|x|x|y|y|y|            |
                       +-+-+--+-+-+-+-+-+-+            |
                       [  0  ][ 1  ][  2  ]            |
                          ^                            |
                          +----------------------------+
#.str.2 will be equal to 'a'

Notice what happens if we use a larger offset for the first index, this is like incrementing a pointer


getelementptr inbounds ([3 x i8], [3 x i8]* @.str.2, i64 1, i64 0)) 
                                          |            |     |
                       +------------------+            |     |
                       |      +------------------------|-----+
                       ↓      ↓                        |
                       +-+-+--+-+-+-+-+-+-+            |
                       |B|a|\n|x|x|x|y|y|y|            |
                       +-+-+--+-+-+-+-+-+-+            |
                       [  0  ][ 1  ][  2  ]            |
                                ^                      |
                                +----------------------+

The result is the address of the indexed element.

struct

%name = type {i8, [3 x i8], i32}

llc

I LLVM's static compiler which can take a llvm source, for example a .ll file and outputs assembly source for the target platform.

For example, lets take a simple c program and emit the llvm assembly for it:

$ /usr/bin/clang -emit-llvm -S -o print.ll print.c

This will generate a file name print.ll which can then be passed to llc:

$ /usr/bin/llc print.ll

Which will generate a print.s file which we can pass to an assembler:

$ /usr/bin/as print.s -o print.o

Which will generate an object file named print.o which then needs to be linked. Now, since print.c used the c library we need to make sure we link in the correct librarys which would normally be handled by the compiler tool chain, like clang or gcc:

$ /usr/bin/ld -e main -dynamic-linker /lib64/ld-linux-x86-64.so.2 \
       /lib64/crt1.o /lib64/crti.o -lc -arch x86_64 print.o  \
       /lib64/crtn.o -o print

For some background about the libraries specifed here please see learning-linux-kernel. If you want to see what gcc would use to link one can use the following command to have gcc show the link command options:

$ gcc -### -o print_out print.c

Now we can run our executable:

$ ./print
something....

One can use the opt command to generate some useful information in the assembly format, like named arguments and labels:

$ /usr/bin/opt -S -mem2reg -instnamer print.ll -o print_before_opt.ll

Link Time Optimizations (LTO)

As then name suggests these are optimizations that the linker can perform on all the object files that it is linking, so the linker puts all the objects files together and combines them into a single program.

So the linker would merge all the object files (which I think can also be llvm bit code files but not sure yet) into one big one and then do optimizations the one big merged file. This can allow for optimizations that where not discovered/possible in earlier stages but can also introduce scalablitlity or memory issues while linking.

So with LTO enabled, by using `-flto

$ make lto_a.o
/usr/bin/clang -flto -c -o lto_a.o lto_a.c

$ file lto_a.o
lto_a.o: LLVM IR bitcode

So we can indeed see that this is a LLVM IR bitcode file and we can inspect it using:

$ llvm-dis lto_a.o -o lto_a.ll

And we compile lto_main as a normal object file:

$ make lto_main.o 
/usr/bin/clang -c -o lto_main.o lto_main.c

And finally we compile the executable passing in the LLVM IR bitcode and the object code:

$ make lto_main
/usr/bin/clang -flto lto_a.o lto_main.o -o main

lld (LLVM portable)

lld is LLVM's linker and supports ELF, COFF, and Mach-O. Can be enabled by passing -fuse-ld=lld to clang/gcc:

-fuse-ld=lld

bfd linker (GNU)

Is written on top of the BFD library This is the traditional GNU binutils linker named ld.

-fuse-ld=bfd

gold linker (GNU)

Was written to remove the BFD library abstraction layer. This is the next/news GNU binutils linker named gold:

-fuse-ld=gold

LLVM Exception Handling (eh)

invoke

A call inside a try scope will will be replaces by the LLVM frontend with an invoke instruction. This instruction has two points of continuation, one for a normal successful call, and one if the call raises an exception. In the case of an exception the place where the code continues execution is called the landing pad. These are alternative function entry points where an exception structure ref and a type info index is passed as arguments.

invoke void @_Z9somethingi(i32 1)                                             
          to label %9 unwind label %10
      (if successful) (if throws)

(if successful)
9:                                                ; preds = %2                  
  br label %24                                                                  
                                                                                
(if throws)
10:                                               ; preds = %2                  
  %11 = landingpad { i8*, i32 }   // i8* = exception pointer part, i32=selector value part                                                
          catch i8* bitcast (i8** @_ZTIi to i8*)                                
  %12 = extractvalue { i8*, i32 } %11, 0                                        
  store i8* %12, i8** %6, align 8                                               
  %13 = extractvalue { i8*, i32 } %11, 1                                        
  store i32 %13, i32* %7, align 4                                               
  br label %14

LLVM tutorial example

This section follows the tutorial and contains notes around it.

Building:

$ make main
"clang++" "-g" -o main lang.o main.cc

Running the example:

$ ./main lang.example

libunwind

$ sudo yum install libunwind-devel

$ readelf --debug-dump=frames libunwind

poison value

TODO: