hal-fuzz: An HLE-based fuzzer for blob firmware

hal-fuzz is the sleeker, faster, fuzzing-oriented version of HALucinator. It was developed as part of our paper "HALucinator: Firmware Re-Hosting through Abstraction Layer Emulation" at USENIX Security 2020.

It was also used by the Shellphish hacking team to win the 2019 CSAW Embedded Security Challenge (https://github.com/TrustworthyComputing/csaw_esc_2019), thanks to its rehosting, fuzzing, and debugging capabilities. Check out a video of hal-fuzz grilling up a challenge automatically here: https://drive.google.com/file/d/1m4VzTQUBMb1xOZN9GmWQZS-Qij3v0koF/view

If you're interested in re-hosting entire multi-node systems, or re-hosting firmware needing complex interactions with the outside world, you might want to try the original, found here: https://github.com/embedded-sec/halucinator

Cite us!

Using hal-fuzz for research? Please cite our USENIX paper. More details at: http://subwire.net/publication/halucinator/

What is this crazy thing?

hal-fuzz is a generic emulator based on the principle of High Level Emulation (HLE), where we replace hardware-related library functions in the binary with high-level Python replacements. While these replacements are created manually, we show (and you can experience yourself) that these high-level replacements are shockingly easy to write. Most of them simply take arguments from the program, and do almost nothing of consequence. In fact, many do nothing at all, and are simply nop-outs.
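To make the idea concrete, here is a toy sketch of HLE-style interception. It is not hal-fuzz's implementation: ToyCpu, the addresses, and the handler names are all made up for illustration. It only shows the principle that a call to a hooked function entry runs a Python replacement (or nothing at all) and then returns straight to the caller.

```python
# Toy model of High Level Emulation: intercept calls by address.
# ToyCpu, the addresses, and the handler names are hypothetical.

class ToyCpu:
    """Minimal stand-in for an emulated ARM core: just three registers."""
    def __init__(self):
        self.r0 = 0   # first argument / return value
        self.lr = 0   # return address set by the caller
        self.pc = 0   # address about to execute


def hal_delay(cpu):
    """Replacement for a busy-wait delay: nothing to do under emulation."""


def hal_get_tick(cpu):
    """Replacement for a timer read: hand back a fixed tick count in r0."""
    cpu.r0 = 1234


# Function entry address -> Python replacement (None means nop-out).
HOOKS = {
    0x08001000: hal_delay,
    0x08001040: hal_get_tick,
    0x08001080: None,
}


def step(cpu):
    """Intercept a hooked function entry; return True if we did."""
    if cpu.pc not in HOOKS:
        return False      # a real emulator would execute the instruction here
    handler = HOOKS[cpu.pc]
    if handler is not None:
        handler(cpu)
    cpu.pc = cpu.lr       # skip the original function body entirely
    return True


cpu = ToyCpu()
cpu.pc, cpu.lr = 0x08001040, 0x08000200
intercepted = step(cpu)
print(intercepted, cpu.r0, hex(cpu.pc))
```

The None entry mirrors the "do nothing at all" case: the call is swallowed and execution resumes at the caller.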

In this fuzzing-oriented version, we replace the combination of full-system QEMU and Avatar, previously used to create an instrumentable environment, with AFL-Unicorn (https://github.com/Battelle/afl-unicorn). This removes much of the bulk of the previous system, and adds AFL's fork server and block coverage information. While we tried to engineer the system such that handlers written for one work with the other, we noticed that handlers used purely for fuzzing could be much shorter (and therefore much faster), and opted to split up the system. This version also forgoes concepts such as the Peripheral Server from the original HALucinator for the same reasons.

In order to make hal-fuzz work at a reasonable speed, we made a number of notable optimizations:

More details can be found in the paper.

How do I use it?

Initial setup

Shortcut: Docker

Do you not hate Docker? Skip all the stuff below and try our Dockerfile (note that we don't suggest this for real-world fuzzing of anything, but it's a great way to play around!)

Simply:

docker build .

...and eventually...

docker run -it <image_hash> /bin/bash

and you'll be dropped into a shell with everything set up!

If you want to fuzz, make sure you run the following on the host as root, or AFL will get mad:

echo core >/proc/sys/kernel/core_pattern
cd /sys/devices/system/cpu
echo performance | tee cpu*/cpufreq/scaling_governor
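If you'd rather check the host before touching anything, the sketch below reads the same two settings and reports what looks off. It assumes AFL's usual startup complaints (a core_pattern that pipes to a helper program, and CPU governors other than performance); the function names are ours, not part of hal-fuzz or AFL.

```python
from pathlib import Path

CORE_PATTERN = Path("/proc/sys/kernel/core_pattern")
CPU_ROOT = Path("/sys/devices/system/cpu")
GOVERNOR_GLOB = "cpu[0-9]*/cpufreq/scaling_governor"


def core_pattern_ok(text):
    """A leading '|' means core dumps are piped to a helper program."""
    return not text.strip().startswith("|")


def governor_ok(text):
    """True when a scaling_governor file says 'performance'."""
    return text.strip() == "performance"


def host_problems(core_pattern_text, governor_texts):
    """Return human-readable problems for the given file contents."""
    problems = []
    if not core_pattern_ok(core_pattern_text):
        problems.append("core_pattern pipes to a program")
    slow = sorted({g.strip() for g in governor_texts if not governor_ok(g)})
    if slow:
        problems.append("non-performance governors: " + ", ".join(slow))
    return problems


if __name__ == "__main__":
    core = CORE_PATTERN.read_text() if CORE_PATTERN.exists() else ""
    governors = [p.read_text() for p in CPU_ROOT.glob(GOVERNOR_GLOB)]
    for problem in host_problems(core, governors):
        print("warning:", problem)
```

Reading these files needs no special privileges; only changing them requires root.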

The old-fashioned way

If you're on Ubuntu 18.04, first make yourself a Python virtual environment:

mkvirtualenv -p /usr/bin/python3 halfuzz

...and then run:

./setup.sh

You'll need to be a user with sudo permissions.
Cross your fingers.

Now what?

If all goes well, you can now run the tool through the ./hal-fuzz script.

Other useful points of interest include the test_*.sh scripts found in the root directory. The "fuzz" scripts will start single-threaded AFL for that sample, the "parallel" scripts will start a huge parallel AFL session (be careful!), and the plain ones will simply run the binary once (useful for triage and debugging).

The csaw_tester_*.py scripts relate to our use of hal-fuzz in the 2019 CSAW Embedded Security Challenge. These are for debugging, fuzzing, or validating challenge solutions. More info on what these do and why can be found on the CSAW ESC website.

Getting symbols

As we mention in the paper, you need a few things in order to use this tool, the first of which is the location of the libraries in the binary-under-test. There are two ways to get this information, which are provided to the system as a yml file:

Writing Handlers and Models

hal-fuzz doesn't do much without Handlers and Models. If your firmware uses a library we already have Handlers for, great! (see ./configs/hal/ for what's already supported) You can skip this part.

If not, you need to make some. This isn't hard (as we evaluate in the paper) and shouldn't take too long at all.

Handlers are the high-level replacement functions that make hardware-dependent behaviors disappear. Models are data abstractions to allow for handlers to share a common object, such as a virtual serial port or I2C bus. Models also facilitate a common interface with the outside host.

Your goal in writing handlers is to create two things: the Python handler code itself, and a YAML file mapping the actual symbol names to your handlers.

Handler code: A handler is a Python function that takes the emulator's state (uc) as an argument, and transforms this state to appear as it would if the original function had run. The three primary steps in most handlers are: 1) collecting arguments, 2) performing the actual behavior, and 3) returning a value. For example, consider a function that adds 2 to the argument and returns it; it would look like this:

def add_two(uc):
    number = uc.regs.r0 # Get the argument
    number += 2         # Add two
    uc.regs.r0 = number # Return the result

Now this isn't the kind of function you'd normally want to intercept. What about something with hardware in it? Let's say we have a function that takes n bytes from a serial port, and writes them into a buffer. We use the SerialModel for this. For example:

def serial_read(uc):
    serial_id = uc.regs.r0  # arg0: which serial port
    buff_ptr = uc.regs.r1   # arg1: where the data goes
    length = uc.regs.r2     # arg2: how much data to read (avoids shadowing len())
    buff_len = uc.regs.r3   # arg3: how long the buffer is
    assert buff_ptr != 0        # crash if we get a null pointer!
    assert length <= buff_len   # crash if somebody's being bad!
    the_data = SerialModel.rx(serial_id, length)  # get the actual data
    uc.mem[buff_ptr] = the_data  # write it out to memory
    uc.regs.r0 = 0               # indicate success

In the above, we see a few new concepts. The function's arguments and their types should be looked up in the documentation for the HAL, library, or SDK. Based on this info, we can help out our fuzzer by adding some preconditions that, if violated, tell us something has gone horribly wrong.
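One way to convince yourself a handler behaves before pointing a fuzzer at it is to run it against a stand-in for uc. The sketch below does that for a handler shaped like serial_read. FakeMem, FakeSerial, and make_uc are hypothetical test doubles that only imitate the uc.regs and uc.mem usage shown above; they are not the real hal-fuzz objects, and the model is passed in as a parameter purely to keep the example self-contained.

```python
from types import SimpleNamespace


class FakeMem:
    """Stand-in for uc.mem: a flat byte array written via mem[addr] = data."""
    def __init__(self, size):
        self.data = bytearray(size)

    def __setitem__(self, addr, payload):
        self.data[addr:addr + len(payload)] = payload


class FakeSerial:
    """Stand-in for SerialModel: returns queued bytes, ignoring the port id."""
    def __init__(self, queued):
        self.queued = queued

    def rx(self, serial_id, length):
        out, self.queued = self.queued[:length], self.queued[length:]
        return out


def make_uc(r0=0, r1=0, r2=0, r3=0, mem_size=64):
    """Build a fake emulator state exposing .regs and .mem."""
    regs = SimpleNamespace(r0=r0, r1=r1, r2=r2, r3=r3)
    return SimpleNamespace(regs=regs, mem=FakeMem(mem_size))


def serial_read(uc, serial):
    """Same steps as the handler above, with the model injected for testing."""
    buff_ptr, length, buff_len = uc.regs.r1, uc.regs.r2, uc.regs.r3
    assert buff_ptr != 0        # null destination pointer
    assert length <= buff_len   # read would overflow the buffer
    uc.mem[buff_ptr] = serial.rx(uc.regs.r0, length)
    uc.regs.r0 = 0              # indicate success


uc = make_uc(r0=1, r1=8, r2=4, r3=16)
serial_read(uc, FakeSerial(b"ABCDEF"))
print(bytes(uc.mem.data[8:12]), uc.regs.r0)

rejected = False
try:
    serial_read(make_uc(r1=8, r2=32, r3=16), FakeSerial(b""))
except AssertionError:
    rejected = True   # the precondition fired, which a fuzzer sees as a crash
print("oversized read rejected:", rejected)
```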

See the numerous included examples (in hal_fuzz.handlers) for ideas and inspiration on writing your own handlers.

YAML file: You can create a YAML file for the HAL or library you're handling so that it can be quickly re-used for any new firmware image. You may map multiple functions to the same handler, reuse other handlers, or leave the handler function name blank to just nop out the function. Use the nop-out whenever you can: for performance reasons, we dynamically re-write the binary to add nop-outs, which avoids calling Python code at all!

Following on from the above examples, we might do something like:

handlers:
    HAL_Serial_Read:
        handler: hal_fuzz.handlers.my_hal.serial_read
    HAL_Serial_Init:
        handler: 
    HAL_add_two:
        handler: hal_fuzz.handlers.my_hal.add_two

Once you have this YAML file for your HAL or library, just include it in your firmware's configuration (see below).
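If you're wondering how a dotted name in such a mapping can end up as a callable, the sketch below shows one plausible way using importlib. It is not hal-fuzz's actual loader. The dict is what a YAML parser would typically return for a file shaped like the one above, and stdlib functions stand in for real handler modules so the example runs on its own.

```python
import importlib


def resolve_handler(dotted_name):
    """Turn 'package.module.function' into a callable; blank means nop-out."""
    if not dotted_name:
        return None
    module_name, _, func_name = dotted_name.rpartition(".")
    return getattr(importlib.import_module(module_name), func_name)


def resolve_all(config):
    """Map each symbol in a parsed 'handlers:' section to a callable or None."""
    resolved = {}
    for symbol, entry in config["handlers"].items():
        entry = entry or {}
        resolved[symbol] = resolve_handler(entry.get("handler"))
    return resolved


# Hypothetical parsed config; a blank 'handler:' value parses to None.
parsed = {
    "handlers": {
        "HAL_Serial_Read": {"handler": "os.path.basename"},  # stand-in handler
        "HAL_Serial_Init": {"handler": None},                # nop-out
        "HAL_add_two": {"handler": "math.sqrt"},             # stand-in handler
    }
}

table = resolve_all(parsed)
nops = sorted(name for name, fn in table.items() if fn is None)
print("nop-outs:", nops)
```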

Configuring hal-fuzz

We use a YAML configuration file per binary to set up emulation. In this file, you need to specify the memory map, which libraries are in use, and any ancillary options that affect emulation, such as the configuration of peripherals you'd like.

Numerous examples exist in ./tests of how to do this, but the basic layout for a typical firmware sample is:

Command-line options

With the emulation environment configured, now it's time to run the tool. hal-fuzz accepts many command-line options (queried with ./hal-fuzz --help) which are described in detail here.

Using hal-fuzz with AFL

Ready to fuzz some drones? Great! If you've followed the steps above, hal-fuzz is already set up to use with AFL. See the included scripts (e.g., ./test_st_plc_parallel.sh) for examples. The basic usage is:

./afl-fuzz -U -m none -i ./path/to/inputs -o ./path/to/outputs -- ./hal-fuzz -c ./path/to/firmware.yml @@

Note that the first time you start, AFL will probably warn you about your system settings. Just follow its instructions and you'll be all set.

If you're using Docker, apply these settings on the host, not in the container!
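If you launch fuzzing runs from your own scripts, building the argument list in Python avoids shell-quoting surprises. This sketch only reproduces the basic usage line shown earlier in this section; afl_command is a hypothetical helper name, and the paths are the same placeholders.

```python
import shlex


def afl_command(inputs, outputs, config,
                afl_fuzz="./afl-fuzz", hal_fuzz="./hal-fuzz"):
    """Build the argv for the basic AFL + hal-fuzz invocation."""
    return [afl_fuzz, "-U", "-m", "none", "-i", inputs, "-o", outputs,
            "--", hal_fuzz, "-c", config, "@@"]


argv = afl_command("./path/to/inputs", "./path/to/outputs",
                   "./path/to/firmware.yml")
command_line = " ".join(shlex.quote(arg) for arg in argv)
print(command_line)
# To launch it for real: subprocess.run(argv, check=True)
```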

Debugging with hal-fuzz

Found a crash? Great! You can track it down with the provided tools.

As a first step, the -d, -t, and -M options above might be all you need: they'll quickly tell you where in the binary the crash occurred, and give you a quick trace of how the culprit data got there.
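When AFL has collected more than a handful of crashes, it helps to generate the replay commands instead of typing them. The sketch below assumes AFL's usual output layout (crash inputs named id... under crashes/, or under <fuzzer name>/crashes/ for parallel runs) and replays each input with the same -c option used for fuzzing. The helper names are ours, and you would append whichever debugging options you need.

```python
from pathlib import Path
import tempfile


def crash_files(output_dir):
    """Return AFL crash inputs for single and parallel output layouts."""
    root = Path(output_dir)
    found = list(root.glob("crashes/id*")) + list(root.glob("*/crashes/id*"))
    return sorted(found)


def replay_commands(output_dir, config, hal_fuzz="./hal-fuzz"):
    """Build one hal-fuzz argv per crash input."""
    return [[hal_fuzz, "-c", config, str(path)]
            for path in crash_files(output_dir)]


# Demo on a throwaway directory that mimics AFL's layout.
with tempfile.TemporaryDirectory() as tmp:
    crashes = Path(tmp) / "crashes"
    crashes.mkdir()
    (crashes / "id:000000,sig:11,src:000000,op:havoc,rep:2").write_bytes(b"AAAA")
    (crashes / "README.txt").write_text("not a crash input")
    commands = replay_commands(tmp, "./path/to/firmware.yml")

print(len(commands), commands[0][:3])
```

Each resulting list can be passed to subprocess.run to re-run the binary once on that input.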

If you need anything beyond that, such as single-stepping through the binary, you can do this via the new SparklyUnicorn(tm) debugger. Just pass -b followed by an address, and you'll be dropped into an ipdb shell, with the binary halted. (NOTE: at this time, Unicorn only lets us break on the first instruction in a basic block! This is the price we pay for performance.)

Once your breakpoint is hit, you're left with the Unicorn object itself (uc), which has been instrumented with many new features (it's sparkly!).

TODOs / Limitations / Help Wanted / Roadmap

hal-fuzz is a research prototype. While we feel it's extremely useful as-is, there are many rough edges to be worked on, which we look forward to addressing.

Here's an (incomplete) list: