msynth

Authors: Tim Blazytko and Moritz Schloegel

msynth is a code deobfuscation framework to simplify Mixed Boolean-Arithmetic (MBA) expressions. Given a pre-computed simplification oracle, it walks over a complex expression represented as an abstract syntax tree (AST) and tries to simplify subtrees based on oracle lookups. Alternatively, it tries to simplify expressions via stochastic program synthesis.

msynth is built on top of Miasm and inspired by the papers

It can be used in combination with Miasm's symbolic execution engine to simplify complex expressions in obfuscated code or as a standalone tool to play around with MBA simplification.

original: {((((((((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) ^ 0xFFFFFFFF) & RDX[0:32]) + ((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) & (RDX[0:32] ^ 0xFFFFFFFF)) + -(((((((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) ^ 0xFFFFFFFF) & RDX[0:32]) + ((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) | (((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32])) + ({RDI[0:32] & ({RDI[0:32] & RSI[0:32] 0 32, 0x0 32 64} * 0x2 + {RDI[0:32] ^ RSI[0:32] 0 32, 0x0 32 64})[0:32] 0 32, 0x0 32 64} * 0x2 + {((((RSI[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + RSI[0:32]) & RSI[0:32]) + (((RDI + {(RDI[0:32] ^ 0xFFFFFFFF) | RDX[0:32] 0 32, 0x0 32 64} + 0x1)[0:32] ^ 0xFFFFFFFF) & RDX[0:32]) + (RDI[0:32] ^ ({RDI[0:32] & RSI[0:32] 0 32, 0x0 32 64} * 0x2 + {RDI[0:32] ^ RSI[0:32] 0 32, 0x0 32 64})[0:32]) + ((RDI[0:32] ^ 0xFFFFFFFF) | RDX[0:32]) + (RDI + RDX + 0x1)[0:32] 0 32, 0x0 32 64})[0:32]) * 0x2 0 32, 0x0 32 64}

simplified: {(-RDX[0:32] + ((RDI[0:32] + RDX[0:32] + RSI[0:32]) << 0x1)) * 0x2 0 32, 0x0 32 64}
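To make the oracle-lookup idea more concrete, here is a minimal, self-contained Python sketch. It is not msynth's implementation and does not use its API: expressions are plain nested tuples, arithmetic is 8-bit, the "oracle" is a small dictionary built from a handful of candidate expressions, and all names (`evaluate`, `signature`, `build_oracle`, `simplify`) are hypothetical. The walk tries a lookup at each node, top-down, and descends into the children only when the node itself has no shorter replacement.

```python
from itertools import product

MASK = 0xFF  # toy 8-bit arithmetic (assumption for this sketch)

OPS = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}

# Fixed (x, y) inputs; an expression's outputs on them form its signature.
SAMPLES = [(0, 0), (1, 1), (1, 2), (3, 5),
           (0xFF, 1), (0x80, 0x80), (0x55, 0xAA), (0x0F, 0x3C)]


def evaluate(expr, x, y):
    """Evaluate a tuple AST: a leaf is "x" or "y", a node is (op, lhs, rhs)."""
    if expr == "x":
        return x
    if expr == "y":
        return y
    op, lhs, rhs = expr
    return OPS[op](evaluate(lhs, x, y), evaluate(rhs, x, y))


def size(expr):
    """Number of nodes in the AST."""
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1


def signature(expr):
    """Input-output behavior of an expression on the fixed samples."""
    return tuple(evaluate(expr, x, y) for x, y in SAMPLES)


def build_oracle():
    """Map each signature to the smallest known expression producing it."""
    leaves = ["x", "y"]
    candidates = leaves + [(op, a, b) for op, a, b in product(OPS, leaves, leaves)]
    oracle = {}
    for cand in candidates:
        sig = signature(cand)
        if sig not in oracle or size(cand) < size(oracle[sig]):
            oracle[sig] = cand
    return oracle


def simplify(expr, oracle):
    """Try a lookup at this node; if there is no shorter hit, recurse into children."""
    hit = oracle.get(signature(expr))
    if hit is not None and size(hit) < size(expr):
        return hit
    if isinstance(expr, tuple):
        op, lhs, rhs = expr
        return (op, simplify(lhs, oracle), simplify(rhs, oracle))
    return expr


oracle = build_oracle()
# (x ^ y) + ((x & y) + (x & y)) is an MBA encoding of x + y
mba = ("+", ("^", "x", "y"), ("+", ("&", "x", "y"), ("&", "x", "y")))
print(simplify(mba, oracle))                            # ('+', 'x', 'y')
print(simplify(("^", mba, ("&", "x", "x")), oracle))    # ('^', ('+', 'x', 'y'), 'x')
```

Matching signatures on a few samples does not prove equivalence, so a lookup hit in a sketch like this is only a candidate; a real tool would need an additional check (for example with an SMT solver) before trusting it.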

Core Features

Installation

To install msynth, follow these steps:

git clone https://github.com/mrphrazer/msynth.git
cd msynth

# optionally: use a virtual environment
python -m venv msynth-env
source msynth-env/bin/activate

# install miasm
git clone https://github.com/cea-sec/miasm.git
cd miasm
git checkout 90dc1671b59077ee27c3d44d9d536d6659eb3bbe
pip install -r requirements.txt
pip install .

# go back into msynth directory
cd ..

# install dependencies
pip install -r requirements.txt

# install msynth
pip install .

# unzip database
unzip -d database -q database/3_variables_constants_7_nodes.txt.zip

Pre-computed Simplification Lookup Tables

To generate an oracle, we need a simplification lookup table (or database) containing a large number of expressions. We used an enumerative search to pre-compute expressions for bit sizes of 8, 16, 32 and 64 according to the following specifications:

The example database included in the database directory contains all 1,293,020 combinations of three variables and the constants 0x0, 0x1 and 0x2 for up to 7 nodes (e.g., ((p0 + p1) * (p2 ^ 0x2)) or ((p0 - p2) << (p1 + p2))). Larger pre-computed databases can be found here (~31GB unzipped). Note that the code for pre-computing expressions is not part of this repository; we plan to release it at some point in the future.

Stochastic Program Synthesis

As an alternative to pre-computed lookup tables, msynth supports expression simplification via stochastic program synthesis. For a given complex arithmetic expression, msynth can learn a shorter expression that shares the same input-output behavior. For now, it is implemented as a standalone component. However, we plan to combine both simplification approaches in the future.
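The general idea of learning an expression from input-output behavior can be illustrated with a small, self-contained Python sketch. It is a plain randomized hill climb over tuple-based expression trees, not the algorithm msynth implements, and every name in it (`random_expr`, `mutate`, `cost`, `synthesize`) is hypothetical. The obfuscated expression is treated as a black box: it is only sampled on random inputs, and candidates are scored by how many output bits they get wrong.

```python
import random

MASK = 0xFFFFFFFF  # 32-bit arithmetic
VARS = ("x", "y")
OPS = {
    "+": lambda a, b: (a + b) & MASK,
    "-": lambda a, b: (a - b) & MASK,
    "&": lambda a, b: a & b,
    "|": lambda a, b: a | b,
    "^": lambda a, b: a ^ b,
}


def evaluate(expr, env):
    """Evaluate a tuple AST: a leaf is a variable name, a node is (op, lhs, rhs)."""
    if isinstance(expr, str):
        return env[expr]
    op, lhs, rhs = expr
    return OPS[op](evaluate(lhs, env), evaluate(rhs, env))


def size(expr):
    if isinstance(expr, tuple):
        return 1 + size(expr[1]) + size(expr[2])
    return 1


def random_expr(rng, depth):
    """Draw a random expression of at most the given depth."""
    if depth <= 0 or rng.random() < 0.3:
        return rng.choice(VARS)
    op = rng.choice(list(OPS))
    return (op, random_expr(rng, depth - 1), random_expr(rng, depth - 1))


def mutate(expr, rng):
    """Replace one randomly chosen subtree with a fresh random expression."""
    if isinstance(expr, str) or rng.random() < 0.3:
        return random_expr(rng, 2)
    op, lhs, rhs = expr
    if rng.random() < 0.5:
        return (op, mutate(lhs, rng), rhs)
    return (op, lhs, mutate(rhs, rng))


def cost(expr, samples, outputs):
    """Total number of output bits that differ from the black box."""
    return sum(bin(evaluate(expr, env) ^ out).count("1")
               for env, out in zip(samples, outputs))


def synthesize(black_box, rng, num_samples=10, iterations=20000):
    """Search for an expression matching black_box on random samples."""
    samples = [{v: rng.getrandbits(32) for v in VARS} for _ in range(num_samples)]
    outputs = [black_box(**env) for env in samples]
    best = random_expr(rng, 2)
    best_key = (cost(best, samples, outputs), size(best))
    for _ in range(iterations):
        if best_key[0] == 0:  # matches every sample: stop
            break
        cand = mutate(best, rng)
        key = (cost(cand, samples, outputs), size(cand))
        if key <= best_key:  # accept equal or better (cost first, then size)
            best, best_key = cand, key
    return best, best_key[0]


def obfuscated(x, y):
    """The MBA (x ^ y) + 2 * (x & y), used here only as a black box."""
    return ((x ^ y) + 2 * (x & y)) & MASK


best, best_cost = synthesize(obfuscated, random.Random(1))
print(best, best_cost)
```

The search stops at the first candidate that matches all samples, so the result is not guaranteed to be the shortest equivalent expression, and agreement on samples is evidence of equivalence rather than a proof.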

Example Usage

First, let's generate a simplification oracle that uses a pre-computed simplification database as input and clusters the contained expressions into equivalence classes.

$ python scripts/gen_oracle.py database/3_variables_constants_7_nodes.txt oracle.pickle
msynth - INFO: Computing oracle for 30 variables and 50 samples. 
               Using library at 'database/3_variables_constants_7_nodes.txt'
msynth - INFO: Writing oracle to oracle.pickle
msynth - INFO: Done in 632.84 seconds

Depending on the size of the pre-computed simplification database and on your machine, this may take anywhere from a few minutes to several hours. Alternatively, you can use the pre-computed oracle.pickle.
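As a rough picture of what clustering expressions into equivalence classes means, the following Python sketch groups a few hand-written expressions by their outputs on 50 input samples. It is an illustration under simplifying assumptions, not the code in scripts/gen_oracle.py: the `DATABASE` list, the 8-bit width, the corner-case input and the choice of the shortest member as class representative are all made up for this example.

```python
import random
from collections import defaultdict

MASK = 0xFF  # toy 8-bit arithmetic (assumption for this sketch)

# Hypothetical stand-in for database entries: (text, function of p0 and p1)
DATABASE = [
    ("p0 + p1", lambda p0, p1: (p0 + p1) & MASK),
    ("(p0 | p1) + (p0 & p1)", lambda p0, p1: ((p0 | p1) + (p0 & p1)) & MASK),
    ("(p0 ^ p1) + ((p0 & p1) << 0x1)",
     lambda p0, p1: ((p0 ^ p1) + ((p0 & p1) << 1)) & MASK),
    ("p0 ^ p1", lambda p0, p1: p0 ^ p1),
    ("(p0 | p1) - (p0 & p1)", lambda p0, p1: ((p0 | p1) - (p0 & p1)) & MASK),
    ("p0 & p1", lambda p0, p1: p0 & p1),
]


def make_samples(count=50, seed=0):
    """One corner-case input plus random inputs, `count` samples in total."""
    rng = random.Random(seed)
    corner = [(0xFF, 0xFF)]
    return corner + [(rng.getrandbits(8), rng.getrandbits(8))
                     for _ in range(count - 1)]


def cluster(database, samples):
    """Group expressions whose outputs agree on every sample."""
    classes = defaultdict(list)
    for text, func in database:
        sig = tuple(func(a, b) for a, b in samples)
        classes[sig].append(text)
    # Use the shortest member (by text length) as the class representative.
    return {min(members, key=len): members for members in classes.values()}


oracle = cluster(DATABASE, make_samples())
for representative, members in oracle.items():
    print(representative, "<-", members)
```

With such a mapping, any expression whose sampled outputs fall into a known class can be replaced by that class's representative, which is the kind of lookup the oracle is used for during simplification.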

Afterward, the serialized oracle can be used to simplify complex expressions:

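from pathlib import Path
from miasm.expression.expression import ExprId, ExprInt
from msynth import Simplifier

# assumed inputs: the oracle file generated above and a small 32-bit MBA
oracle_path = Path("oracle.pickle")
x, y = ExprId("x", 32), ExprId("y", 32)
expression = (x ^ y) + ExprInt(2, 32) * (x & y)
# initialize simplifier
simplifier = Simplifier(oracle_path)
# simplify expression
simplified = simplifier.simplify(expression)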

Alternatively, we can simplify complex expressions via program synthesis and learn expressions with the same input-output behavior:

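from miasm.expression.expression import ExprId, ExprInt
from msynth import Synthesizer

# assumed example input: a small 32-bit MBA over two variables
x, y = ExprId("x", 32), ExprId("y", 32)
expression = (x ^ y) + ExprInt(2, 32) * (x & y)

# initialize synthesizer
synthesizer = Synthesizer()
# simplify via program synthesis
simplified = synthesizer.simplify(expression)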

It is also possible to combine expression simplification with Miasm's symbolic execution engine:

$ python scripts/symbolic_simplification.py samples/mba_challenge 0x1290 oracle.pickle
[snip]
before: {({RDI[0:32] & RSI[0:32] 0 32, 0x0 32 64} * 0x2 + {RDI[0:32] ^ RSI[0:32] 0 32, 0x0 32 64})[0:32] 0 32, 0x0 32 64}

msynth.simplifier - INFO: initial ast: {({RDI[0:32] & RSI[0:32] 0 32, 0x0 32 64} * 0x2 + {RDI[0:32] ^ RSI[0:32] 0 32, 0x0 32 64})[0:32] 0 32, 0x0 32 64}

msynth.simplifier - INFO: simplified subtree: ({RDI[0:32] & RSI[0:32] 0 32, 0x0 32 64} * 0x2 + {RDI[0:32] ^ RSI[0:32] 0 32, 0x0 32 64})[0:32] -> RDI[0:32] + RSI[0:32]

simplified: {RDI[0:32] + RSI[0:32] 0 32, 0x0 32 64}
[snip]
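The rewrite in this output relies on the identity (a & b) * 2 + (a ^ b) = a + b for fixed-width integers. As a quick sanity check that is independent of msynth and Miasm, the following Python snippet compares both sides on random 32-bit inputs; the function names are illustrative only.

```python
import random

MASK32 = 0xFFFFFFFF  # models 32-bit slices such as RDI[0:32]


def obfuscated(a: int, b: int) -> int:
    """(a & b) * 2 + (a ^ b), truncated to 32 bits."""
    return ((a & b) * 2 + (a ^ b)) & MASK32


def simplified(a: int, b: int) -> int:
    """a + b, truncated to 32 bits."""
    return (a + b) & MASK32


def agree_on_random_inputs(trials: int = 10_000, seed: int = 0) -> bool:
    """Return True if both functions agree on all sampled inputs."""
    rng = random.Random(seed)
    for _ in range(trials):
        a, b = rng.getrandbits(32), rng.getrandbits(32)
        if obfuscated(a, b) != simplified(a, b):
            return False
    return True


print(agree_on_random_inputs())  # True
```

The identity holds because a ^ b adds the bits without carries, while (a & b) * 2 contributes exactly the carries.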

Further example usages can be found in the scripts directory.

Limitations and Future Work

Contact

For more information, contact Tim Blazytko (@mr_phrazer) or Moritz Schloegel (@m_u00d8).