Awesome

Obfuscation Detection (v2.1)

Author: Tim Blazytko

Automatically detect obfuscated code and other interesting code constructs

Description:

Obfuscation Detection is a Binary Ninja plugin to detect obfuscated code and interesting code constructs (e.g., state machines) in binaries. Given a binary, the plugin eases analysis by identifying code locations which might be worth a closer look during reverse engineering.

Based on various heuristics, the plugin pinpoints functions that contain complex or uncommon code constructs. Such code constructs may implement

obfuscated code
state machines and protocols
C&C server communication
string decryption routines
cryptographic algorithms

The following blog posts provide more information about the underlying heuristics and demonstrate their use cases:

Some example use cases can be found in examples. Furthermore, the REcon talk "Unveiling Secrets in Binaries using Code Detection Strategies" demonstrates some use cases. The slides can be found here; the recording can be found here.

Core Features

identifies interesting code constructs in large binaries
highlights disaligned instructions in Binary Ninja's graph view
efficient and architecture-agnostic implementation
runs as a background task
can be used in UI and headless mode

Installation

The tool can be installed using Binary Ninja's plugin manager.

For the headless version, follow these steps:

git clone https://github.com/mrphrazer/obfuscation_detection.git
cd obfuscation_detection

# install obfuscation_detection
pip install .

Usage

The plugin can be used in the user interface and in headless mode.

User Interface

Choose the index tab Plugins -> Obfuscation Detection to run one or more detection heuristics in Binary Ninja's user interface:

The results are displayed in the Log window:

By clicking on the identified function addresses, Binary Ninja navigates to the selected function.

Headless

To use the plugin in headless mode, run scripts/detect_obfuscation.py:

$ python3 scripts/detect_obfuscation.py <binary>

Detection Heuristics

The plugin implements various detection heuristics to detect different code constructs. In the following, we briefly describe the individual heuristics and explain their usage.

Large Basic Blocks

The large basic block heuristic identifies the top 10% of functions with the largest average number of instructions per basic block. It allows to detect

unrolled code
cryptographic implementations
initialization routines
arithmetic obfuscation / Mixed Boolean-Arithmetic

Complex Functions

To complex functions heuristic identifies the top 10% of functions with the most complex control-flow graphs (based on cyclomatic complexity). It allows to identify

complex dispatching routines and protocols
state machines
functions obfuscated with opaque predicates

Flattened Functions

The flattened function heuristic uses some graph-theoretic properties to identify functions implementing state machines. Usually, such state machines can be represented as switch statements that are dispatched in a loop. The heuristic allows to identify

network protocol dispatching
file parsing logic
C&C server communication / command dispatching
control-flow flattening

Uncommon Instruction Sequences

The uncommon instruction sequences heuristic performs a statistical analysis to identify the top 10% of functions whose code patterns deviate from a pre-computed ground truth. This way, the heuristic allows to identify

cryptographic implementations
intense usage of floating point arithmetic
arithmetic obfuscation / Mixed Boolean-Arithmetic
generic obfuscation patterns

Instruction Overlapping

The instruction overlapping heuristic identifies functions with disaligned instructions (instruction bytes are shared by two different instructions). The heuristic identifies

broken disassembly (e.g., data which is marked as code)
opaque predicates which jump into other instructions

If the heuristic is used in Binary Ninja's user interface, overlapping instructions are also highlighted in the graph view.

Most Called Functions

The heuristic for most called functions identifies the top 10% of functions with the largest number of calls from different functions. This way, the heuristic can identify

string decryption routines
library functions in statically linked binaries

High Loop Frequency

The heuristic identifies functions with a high number of loops. These kind of functions might implement

a complex parsing logic for structured data formats
intensive algorithmic computations

The heuristic also helps pinpointing potential performance bottlenecks.

Irreducible Loops

The heuristic identifies functions with rare and complex loop structures that typically suggest

aggressive compiler optimizations
hand-written assembly
usage of goto statements
obfuscated code

XOR Decryption Loops

The heuristic identifies functions which perform an XOR operation with a constant inside of a loop. This way, the heuristic can identify

string decryption routines
code decryption stubs
cryptographic implementations

Complex Arithmetic Expressions

The heuristic identifies functions in which the expressions have more than one arithmetic operation and one boolean operation simultaneously. This way, the heuristic can identify

mixed-boolean arithmetic obfuscation
initialization routine
cryptographic implementations

Utils

Contrary to the detection heuristics which target a wider scope of code, the plugin also implements various helpers which aim to identify functions with a narrower scope. In the following, we describe these helpers and explain their characteristics.

Entry Functions

This helper identifies functions without known callers. These functions might be

entry points in the binary
indirect jumps targets where the call hierarchy could not be recovered by the disassembler

Leaf Functions

This helper identifies functions that do not call other functions. These kinds of functions may, for example, be functions that

are outlined by the compiler to implement functionalities utilized across various code locations
are trampolines to other functions
are part of code obfuscation schemes (e.g., outlined computations for control-flow obfuscation)

Section Entropy

This helper evaluates the entropy of each section. Entropy is a statistical measure of randomness with values ranging between 0 and 8. Sections with an entropy close to 8 indicate a high degree of randomness and can hint at:

packed or compressed code
encrypted data or code
random data

RC4

This helper detects potential RC4 algorithm implementations by employing heuristic markers typically associated with RC4's Key Scheduling Algorithm (KSA) and Pseudo-Random Generation Algorithm (PRGA). RC4 is widely used in malware for purposes such as:

decrypting strings and other data payloads
obfuscating command and control (C&C) communications

Contact

For more information, contact @mr_phrazer.