PINT
Pint is a PIN tool that exposes the PIN API to Lua scripts. Pint runs on 64-bit Ubuntu and 32-bit Windows XP. It should run on 32-bit Ubuntu and 64-bit XP as well, but this is as yet untested.
PIN is a dynamic instrumentation engine developed by Intel. Basically, it is a JIT compiler for binaries. It disassembles the binary, one basic block at a time, and recompiles it with additional instructions inserted at arbitrary positions. Pint makes it possible to add Lua code at these positions.
Installation
Under Windows
Make sure you have VC++ 2010 installed. Install Cygwin with the following packages: ruby, patch, wget, unzip. Add C:\cygwin\bin to the %PATH% environment variable. Open a VC++ console, move to your preferred directory and run:
git clone "https://github.com/hexgolems/pint"
cd pint
ruby make.rb setup
The setup target will download PIN and Lua, apply patches where necessary and build Pint. The final pintool is called runner.dll.
You can then run your lua script with:
pin.exe -t runner.dll -s path/to/script -- binary_to_instrument.exe args for exe
Under Ubuntu
Make sure you use ruby1.9 and have git installed, then run:
git clone "https://github.com/hexgolems/pint"
cd pint
ruby make.rb setup
The setup target will download PIN and Lua, apply patches where necessary and build Pint. The final pintool is called runner.so.
You can then run your lua script with:
./pin/pin -injection child -t runner.so -s path/to/script -- /path/to/binary_to_instrument args for prog
Usage
Most of the original PIN functions are wrapped for Lua. Names are changed slightly: INS_IsSomeThing(INS a) becomes Ins.is_some_thing(a). Alternatively, given a variable a containing an INS, one can write a:is_some_thing().
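As a rough illustration of this naming rule, the following plain-Lua sketch converts a PIN-style name into the Pint-style spelling. The helper to_pint_name is hypothetical and not part of Pint. It only mirrors the pattern described above and may not match every wrapped function.

```lua
-- Hypothetical helper, not part of Pint: maps a PIN C++ name such as
-- "INS_IsSomeThing" to the Lua-side spelling "Ins.is_some_thing".
function to_pint_name(pin_name)
  -- Split "INS_IsSomeThing" into the type prefix and the CamelCase rest.
  local prefix, rest = pin_name:match("^(%u+)_(.+)$")
  if not prefix then
    return nil
  end
  -- "INS" -> "Ins": keep the first letter, lowercase the remainder.
  local ns = prefix:sub(1, 1) .. prefix:sub(2):lower()
  -- Insert "_" at every lowercase/uppercase boundary, then lowercase it all.
  local snake = rest:gsub("(%l)(%u)", "%1_%2"):lower()
  return ns .. "." .. snake
end

print(to_pint_name("INS_IsSomeThing"))
print(to_pint_name("INS_IsIndirectBranchOrCall"))
```

This can be handy when looking up the Lua name for a function found in the PIN documentation.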
Example: Extract Call Graphs
Use case: You have a C++ program with a lot of virtual function calls and you would like to extract the targets of every dynamic jump (maybe to annotate your code in IDA or SchemDBG). To do so, you register a callback for every newly assembled instruction. This callback tests whether the new instruction is a call/jmp with an unknown target. If so, it adds another callback that logs the target of the call/jump.
To begin with, we include a small lib with helper functions:
require("src.lib")
Then we need a table that stores all (from, to) pairs. We create a new global variable for that:
jmps = {}
The next step is to create a function that is called once for every instruction that we recompile:
function callback_ins(ins)
  -- do something with the instruction
end
CB.instruction(callback_ins)
Now we need to create a callback function that is called whenever the given instruction is executed. It takes a from address, a target and a boolean indicating whether the jump was taken. We add the (from, to) pair to our mapping.
function callback_jmp(from, to, was_taken)
  if was_taken then
    jmps[from] = jmps[from] or {}
    jmps[from][to] = true
  end
end
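To see what the resulting table looks like, the callback can be exercised by hand in plain Lua, without PIN. The addresses below are made up for illustration, and count_targets is a hypothetical helper used only to inspect the table.

```lua
jmps = {}

function callback_jmp(from, to, was_taken)
  if was_taken then
    jmps[from] = jmps[from] or {}
    jmps[from][to] = true
  end
end

-- Made-up addresses: one call site reaching two targets, one repeated hit,
-- and one branch that was not taken.
callback_jmp(0x401000, 0x402000, true)
callback_jmp(0x401000, 0x403000, true)
callback_jmp(0x401000, 0x402000, true)   -- duplicate, stored only once
callback_jmp(0x401010, 0x404000, false)  -- not taken, ignored

-- Hypothetical helper: counts the distinct targets recorded for one source.
function count_targets(from)
  local n = 0
  for _ in pairs(jmps[from] or {}) do
    n = n + 1
  end
  return n
end

print(count_targets(0x401000))
print(count_targets(0x401010))
```

Using the target address as a table key gives deduplication for free, which is why repeated executions of the same jump do not grow the table.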
Now we need to add the newly created callback to all instructions that interest us. To do so, we test whether the given instruction is an indirect call or jump (i.e. the target is determined at runtime) and, if so, add our callback to the instruction. For that we first need to create a so-called TypedCallback. A TypedCallback is an object that knows which values PIN needs to pass to the callback.
jcall = TypedCallback.new("IARG_INST_PTR", "IARG_BRANCH_TARGET_ADDR", "IARG_BRANCH_TAKEN", callback_jmp)
function callback_ins(ins)
  if ins:is_indirect_branch_or_call() then
    ins:add_callback(jcall, "BEFORE")
  end
end
To wrap things up, we add a function that is called when the debuggee exits. It prints the list of pairs to stdout. Note that this only works properly under Linux. Under Windows the process closes STDOUT/STDIN when exiting, so the output is written to a log file instead: Pint only reopens some log files as STDOUT/STDERR. Under Linux this was fixed by duplicating the file descriptors.
function at_exit(status)
  for addr, targets in pairs(jmps) do
    for target, _ in pairs(targets) do
      print(hex(addr), "-->", hex(target))
    end
  end
end
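Because pairs iterates in an unspecified order, the output of at_exit can differ between runs. If a stable listing is wanted, one option is to collect the edges and sort them first. The sketch below is plain Lua: sorted_edges is a hypothetical helper, the sample data is made up, and string.format stands in for the hex function from src.lib, whose exact output format is not shown here.

```lua
-- Hypothetical helper: flattens a {from = {to = true}} table into a list of
-- {from, to} pairs sorted by source address, then by target address.
function sorted_edges(jmps)
  local edges = {}
  for from, targets in pairs(jmps) do
    for to in pairs(targets) do
      edges[#edges + 1] = { from, to }
    end
  end
  table.sort(edges, function(a, b)
    if a[1] ~= b[1] then
      return a[1] < b[1]
    end
    return a[2] < b[2]
  end)
  return edges
end

-- Made-up sample data in the same shape as the jmps table.
local sample = {
  [0x401010] = { [0x404000] = true },
  [0x401000] = { [0x403000] = true, [0x402000] = true },
}

for _, edge in ipairs(sorted_edges(sample)) do
  print(string.format("0x%x --> 0x%x", edge[1], edge[2]))
end
```

Sorted output makes it easier to diff the call graphs of two runs.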
Thus the entire code for extracting a list of all call/jmp targets is just 28 lines of Lua, shown in full below. Additional samples can be found in src/tools.
require("src.lib")
jmps = {}
function callback_jmp(from, to, was_taken)
  if was_taken then
    jmps[from] = jmps[from] or {}
    jmps[from][to] = true
  end
end
jcall = TypedCallback.new("IARG_INST_PTR", "IARG_BRANCH_TARGET_ADDR", "IARG_BRANCH_TAKEN", callback_jmp)
function callback_ins(ins)
  if ins:is_indirect_branch_or_call() then
    ins:add_callback(jcall, "BEFORE")
  end
end
CB.instruction(callback_ins)
function at_exit(status)
  for addr, targets in pairs(jmps) do
    for target, _ in pairs(targets) do
      print(hex(addr), "-->", hex(target))
    end
  end
end
Example: Extract Used Strings
You have some binary and wonder "what strings does it use". Unfortunately, the strings tool doesn't show any. Maybe the binary contains encrypted/zipped strings? This example shows a tool that monitors byte-sized memory reads and logs any string found. The setup is much the same as in the previous example: a callback gets called once for every new instruction. It checks whether the instruction reads one byte from memory; if so, a callback is added to this instruction. That callback peeks at the memory address read from, determines whether there is a printable string at that address and, if so, logs the string and the current IP. During runtime, all newly encountered strings are printed to STDOUT, and a precise listing of (string, addresses) pairs is printed in at_exit.
require("src.lib")
print("This tool will take some time")
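-- This function checks if all characters in the given string are printable.
-- It returns true if so, false otherwise. A missing or empty string counts as
-- not printable, so a failed memory read cannot keep a scan loop running forever.
function is_printable(str)
  if not str or str == "" then
    return false
  end
  for c in str:gmatch(".") do
    local b = c:byte(1)
    if b < 8 or b > 127 then
      return false
    end
  end
  return true
end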
-- table containing string => address that this string was used from mappings
strings = {}
-- This function will use Helper.read_mem(addr, length) to read a string from memory.
-- It will look both forward and backward until it finds a non-printable ASCII character.
-- It returns nil if fewer than 6 bytes can be read at addr or if no printable string is found.
-- Otherwise it returns a string of at most 130 characters (up to 65 on each side of addr).
-- This string may start before the given address (i.e. addr does not necessarily point to the beginning of the string).
function read_string_from(addr)
  local res = ""
  local str = Helper.read_mem(addr, 6)
  if str:len() < 6 then return nil end
  -- scan forward, one byte at a time after the initial 6 bytes
  while true do
    if not is_printable(str) or res:len() > 64 then break end
    res = res .. str
    str = Helper.read_mem(addr + res:len(), 1)
  end
  -- scan backward from the byte just before addr
  local prev = ""
  str = Helper.read_mem(addr - 1, 1)
  while true do
    if not is_printable(str) or prev:len() > 64 then break end
    prev = str .. prev
    str = Helper.read_mem(addr - prev:len() - 1, 1)
  end
  res = prev .. res
  if not (res == "") then
    return res
  else
    return nil
  end
end
-- This function is called every time a one-byte read is performed.
-- It tries to read a string from the accessed memory and adds the (string, IP) pair to the global table of encountered strings.
-- It prints the IP and the string if this string is encountered for the first time.
function rip_string(ip, read_offset)
  local str = read_string_from(read_offset)
  if str then
    if not strings[str] then
      print("-- at ", hex(ip), "used", str)
    end
    strings[str] = strings[str] or {}
    strings[str][ip] = true
  end
end
-- This is our TypedCallback. We need the IP (IARG_INST_PTR) and the address where memory is read (IARG_MEMORYREAD_EA).
ins_call = TypedCallback.new("IARG_INST_PTR","IARG_MEMORYREAD_EA",rip_string)
-- This function is called the first time the JIT compiler encounters a new instruction.
function callback_ins(ins)
  if ins:is_memory_read() and ins:memory_read_size() == 1 then
    local addr = ins:address()
    local img = Img.find_by_address(addr)
    -- check that ins belongs to the main image
    if img and img:is_main_executable() then
      -- only add the TypedCallback to rip_string if
      -- a) ins reads one byte from memory AND
      -- b) ins is an instruction from the main image
      ins:add_callback(ins_call, "BEFORE")
    end
  end
end
CB.instruction(callback_ins)
-- Print (string, addresses) pairs at exit.
-- The loop variable is named str so that it does not shadow Lua's string library.
function at_exit(status)
  for str, addrs in pairs(strings) do
    print(str, ":")
    for addr, _ in pairs(addrs) do
      print(" from:", "0x" .. hex(addr))
    end
  end
end
print("end of lua script\n")
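The scanning logic can be tried outside PIN by replacing Helper.read_mem with a stand-in. The sketch below is plain Lua: the fake memory, its base address and the stubbed Helper table are all invented for illustration, and the scan is a condensed version of read_string_from above rather than the exact code Pint ships.

```lua
-- Invented test fixture: 17 bytes of fake memory starting at address 0x1000.
local BASE = 0x1000
local memory = "\0\0hello, world\0\1\2"

-- Stand-in for Pint's Helper.read_mem: returns up to `size` bytes, or ""
-- when the address lies outside the fake memory.
local Helper = {}
function Helper.read_mem(addr, size)
  local first = addr - BASE + 1 -- 1-based index into the fake memory
  if first < 1 or first > #memory then
    return ""
  end
  return memory:sub(first, first + size - 1)
end

-- Same byte range as in the example above; empty strings are not printable.
function is_printable(str)
  if str == nil or #str == 0 then
    return false
  end
  for i = 1, #str do
    local b = str:byte(i)
    if b < 8 or b > 127 then
      return false
    end
  end
  return true
end

-- Condensed scan: 6 bytes at addr, then byte by byte forward and backward.
function read_string_from(addr)
  local res = ""
  local str = Helper.read_mem(addr, 6)
  if #str < 6 then
    return nil
  end
  while is_printable(str) and #res <= 64 do
    res = res .. str
    str = Helper.read_mem(addr + #res, 1)
  end
  local prev = ""
  str = Helper.read_mem(addr - 1, 1)
  while is_printable(str) and #prev <= 64 do
    prev = str .. prev
    str = Helper.read_mem(addr - #prev - 1, 1)
  end
  res = prev .. res
  if res ~= "" then
    return res
  end
  return nil
end

-- A read in the middle of the string still recovers the whole string.
print(read_string_from(BASE + 5))
```

The read address points into the middle of the string, which is the typical case when a program walks a buffer byte by byte. The backward scan is what recovers the beginning.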
Additions To The API
Read Memory
Helper.read_mem(addr,size)
It will try to read size bytes beginning from addr. It will return a string of length less than or equal to size. The string will be shorter if some parts could not be read because no memory was mapped or a similar error occurred. It will return an empty string if no bytes were read.
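Since a short result signals a partially unreadable range, callers that need exactly size bytes may want to check the length explicitly. A minimal sketch, assuming only the behaviour described above: read_exact is a hypothetical wrapper, and fake_read_mem is an invented stand-in so that the snippet runs without PIN. Inside Pint one would pass Helper.read_mem instead.

```lua
-- Hypothetical wrapper: returns the data only if all `size` bytes were read,
-- otherwise nil plus the number of bytes that were actually available.
function read_exact(read_mem, addr, size)
  local data = read_mem(addr, size)
  if #data < size then
    return nil, #data
  end
  return data
end

-- Invented stand-in for Helper.read_mem: 4 readable bytes at address 0x2000.
function fake_read_mem(addr, size)
  local bytes = "ABCD"
  local first = addr - 0x2000 + 1
  if first < 1 or first > #bytes then
    return ""
  end
  return bytes:sub(first, first + size - 1)
end

print(read_exact(fake_read_mem, 0x2000, 4))
print(read_exact(fake_read_mem, 0x2002, 4))
```

Passing the reader as an argument keeps the wrapper testable outside of an instrumented process.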
Instrumentation Callbacks
- CB.image_load(func) registers func as a callback for every newly loaded image. func will be called with the newly loaded image as first argument.
- CB.image_unload(func) registers func as a callback for every unloaded image. func will be called with the unloaded image as first argument.
- CB.instruction(func) registers func as a callback for every newly JITed instruction. func will be called with the instruction as first argument.
- CB.routine(func) registers func as a callback for every newly JITed routine. func will be called with the routine as first argument.
- CB.trace(func) registers func as a callback for every newly JITed trace (i.e. a single-entry, multiple-exit sequence of instructions). func will be called with the trace as first argument.
In lib.lua
- hex(int) returns the hex string representing the given integer.
- hex_str(str, spacer) returns a string containing a hex dump of str with bytes separated by spacer. If no spacer is given, the bytes are concatenated without any separation.
- get_addr_repr(addr) returns a string describing the function containing addr as filename.section.function or filename.section.hex(addr).
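For readers curious how a helper like hex_str might work, here is one possible plain-Lua implementation. It is a sketch written for illustration, not the code from lib.lua. The function name hex_dump and the choice of lowercase two-digit bytes are assumptions.

```lua
-- Hypothetical re-implementation of a hex dump helper (not lib.lua's code).
-- Each byte becomes two lowercase hex digits, joined by the optional spacer.
function hex_dump(str, spacer)
  spacer = spacer or ""
  local parts = {}
  for i = 1, #str do
    parts[i] = string.format("%02x", str:byte(i))
  end
  return table.concat(parts, spacer)
end

print(hex_dump("Pin", " "))
print(hex_dump("Pin"))
```

Collecting the pieces in a table and joining them once with table.concat avoids building many intermediate strings for long inputs.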
Renamed Functions
- PIN_set_syntax_XED is called Pin.set_syntax_xed
- PIN_set_syntax_ATT is called Pin.set_syntax_att
Contributing
- Fork it
- Create your feature branch (git checkout -b my-new-feature)
- Commit your changes (git commit -am 'Added some feature')
- Push to the branch (git push origin my-new-feature)
- Create new Pull Request