Test case minimizer

This is an implementation of a variation of the ddmin algorithm originally designed to generate 1-minimal failing test cases.

It is described in: Simplifying Failure-Inducing Input, Ralf Hildebrandt and Andreas Zeller, 2000.

The program itself is designed to be used analogously to afl-tmin, with a few extra features and tunables that are specific to the algorithm or potentially useful in general.

Requirements

To run afl-ddmin-mod: a Python interpreter (the tool is a single Python script).

To actually use the tool: a working AFL installation that provides afl-showmap, plus the target binary you want to minimize against.

Usage

$ ./afl-ddmin-mod.py --help
usage: afl-ddmin-mod.py [ options ] -- /path/to/target_app [ ... ]

Required parameters:
  -i file               input test case to be shrunk by the tool
  -o file               final output location for the minimized data

Execution control settings:
  -t msec               timeout for each run (none)
  -m megs               memory limit for child process (50 MB)
  -Q                    use binary-only instrumentation (QEMU mode)

Minimization settings:
  -e                    solve for edge coverage only, ignore hit counts
  -d int, --max-depth int
                        limit the maximum recursion depth (none)
  -j int, --jitter int  test splitting at additional offsets (0)
  -r, --restart-recursion
                        restart the recursion after finding a smaller input
                        file
  --threads int         number of worker threads [0 = number of cores] (0)

Optional arguments and parameters:
  -a dir, --all-tests-dir dir
                        output directory for additional test cases that were
                        discovered while minimizing
  -c dir, --crash-dir dir
                        output directory for crashes that occurred while
                        minimizing
  -h, --help            show this help message and exit
  -V, --version         show program's version number and exit

For additional tips, please consult the README.

The main goal is to produce a 1-minimal test case, meaning removing any single byte of the test case would alter its behaviour according to afl-showmap. Please note that this does NOT mean that the resulting test case is a global minimum (the smallest test case that can be generated by changing the input file), which is a much harder problem and might require domain knowledge.
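
To illustrate what 1-minimal means in practice, here is a minimal sketch (not the tool's actual code; the helper names and the exact afl-showmap invocation are assumptions) that checks whether dropping any single byte still produces the same coverage map:

import hashlib
import subprocess

def showmap_signature(data, target_cmd):
    # Hypothetical helper: feed `data` to the target via afl-showmap's stdin
    # and return a hash of the coverage map it writes to cur.map.
    subprocess.run(["afl-showmap", "-o", "cur.map", "--"] + target_cmd, input=data)
    with open("cur.map", "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

def is_one_minimal(data, target_cmd):
    # 1-minimal: removing any single byte changes the map afl-showmap records.
    reference = showmap_signature(data, target_cmd)
    for i in range(len(data)):
        if showmap_signature(data[:i] + data[i + 1:], target_cmd) == reference:
            return False  # byte i is still removable
    return True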

-i, -o, -t, -m, -Q, -e: These settings are passed directly to afl-showmap; please consult the (excellent) documentation there.

-d int, --max-depth int: Ddmin will split files into smaller and smaller chunks until they are all 1 byte in size, then return the smallest test case it has found so far. Since this means that at the last stage there will be 2 runs of afl-showmap for every byte that's left, this can result in very long runtimes for large inputs. With this setting it is possible to limit ddmin-mod's recursion depth to whatever you like. If no chunks have been removed, the current chunk count is always 2 to the power of the depth (depth 6 means splitting into 64 chunks, for example). By default, this setting is "none", meaning there is no restriction on recursion depth.
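
For intuition, a minimal sketch (illustrative only; the function name is made up) of how a depth level maps to the chunk count when nothing has been removed yet:

def chunks_at_depth(data, depth):
    # With no chunks removed yet, depth d corresponds to 2**d chunks.
    n = 2 ** depth
    bounds = [len(data) * i // n for i in range(n + 1)]
    return [data[bounds[i]:bounds[i + 1]] for i in range(n)]

print(len(chunks_at_depth(b"A" * 1024, 6)))  # 64 chunks, 16 bytes each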

-j int, --jitter int: In its default implementation, ddmin does a strict binary search by splitting the input file into 2 equal chunks. The problem with this approach is that not all file formats operate on single-byte boundaries. Imagine a text-heavy file format where words have varying lengths that don't always align with the cuts that ddmin makes. To overcome this, the input either needs to be tokenized (not implemented; maybe afl-analyze will come in handy in the future) so that ddmin operates on something potentially larger than single bytes, or more variations need to be tested.

The "jitter" setting will split chunks in additional locations producing a list of several possible splits. E.g. [012345] gets traditionally split like this: [012] [345] With jitter 2 it will get split in the canonical location as well as with offests -1, +1, -2 and +2, resulting in the following splits: [012] [345], [01] [2345], [0123] [45], [0] [12345] and [01234] [5]. In case an offset is too large, no split will be applied (offset +10 on the [012345] file will just return [012345]). This is not a problem, since all smaller offsets are already considered anyways. Jitter will increase the runtime of the algorithm significantly (as 2 more splits per jitter level will be checked). To make sure the search tree doesn't completely explode, only the canonical split will be used to create the next level of chunks after each level of granularity has been checked.

-r, --restart-recursion: If a chunk is removed, ddmin will not start over at level 0 and split the newly found smaller test case in two, but rather keep working at its current granularity level. If this switch is enabled, as soon as a smaller input is found, the recursion is reset and the file is split from the largest possible set of chunks that describe it. E.g. if testing [01] [23] [45] [67] [89] reveals that [23] is not relevant, ddmin would continue with [01] [45] [67] [89]. With restart-recursion it will first squash these tuples down to [01] [456789] and then start splitting them again (e.g. into [0] [1] [456] [789]). This also significantly increases the runtime of the algorithm (as each removal leads to a reset and new chunks being checked).
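
A sketch of the squashing step (the chunk representation and function name are assumptions; the real tool may track chunks differently):

def squash(chunks):
    # Merge neighbouring chunks that are contiguous in the original file into
    # the largest possible pieces, e.g. [01] [45] [67] [89] -> [01] [456789].
    # Each chunk is an (offset, data) pair, with the offset into the original file.
    merged = []
    for off, data in sorted(chunks):
        if merged and merged[-1][0] + len(merged[-1][1]) == off:
            prev_off, prev_data = merged[-1]
            merged[-1] = (prev_off, prev_data + data)
        else:
            merged.append((off, data))
    return merged

print(squash([(0, b"01"), (4, b"45"), (6, b"67"), (8, b"89")]))
# -> [(0, b'01'), (4, b'456789')]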

--threads int: While ddmin is not perfectly parallelizable (tests have to be run in a certain order, and once a result has been found, the input changes), it is at least possible to run its tests side by side without wasting much work. Ddmin-mod queues up all relevant tests, spins up one thread per core that the current machine reports (or more/fewer, depending on this setting) and checks after each test whether it was successful. If so, the queue is emptied, all workers die after their current test has run, and the main algorithm takes over again to analyse the result that was found. This can lead to race conditions if several threads find solutions at the same time. To stay 100% compatible with ddmin, there might be a way to re-run the canonical order of tests, but for now the simpler route of just taking the smallest or earliest result was chosen. This can lead to a small variance in output and is turned on by default. If you need a fully deterministic way of reducing your test cases, make sure to set threads to 1.
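
The scheduling idea roughly corresponds to the following sketch (using concurrent.futures; run_test is a placeholder for "run afl-showmap on this candidate and report whether it kept the behaviour", and the real implementation differs in its details):

import os
from concurrent.futures import ThreadPoolExecutor, as_completed

def run_candidates(candidates, run_test, threads=0):
    # Queue all candidate tests, use one worker per core (or `threads` workers)
    # and stop handing out new work as soon as one candidate succeeds.
    workers = threads or os.cpu_count()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(run_test, c): c for c in candidates}
        for fut in as_completed(futures):
            if fut.result():
                for other in futures:
                    other.cancel()  # queued tests are dropped; running ones finish
                return futures[fut]  # hand the successful candidate back
    return None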

-a dir, --all-tests-dir dir: Afl-tmin usually runs a program a few thousand times with different inputs. Often these inputs constitute a different test case, but they are not stored or reported because the main focus is to end up with a smaller test case that behaves identically to the initial one. Afl-ddmin-mod caches the hash of the resulting map of every test case it has run in a session; each of these hashes represents a different test file. This data is deduplicated, and once a smaller test file is found for a hash it has already seen, the smaller test is recorded instead. The only difference between these "accidental" test cases and the "input" test case is that afl-ddmin-mod won't actively look for smaller versions of these tests. Supplying a directory name via the -a setting will create this directory and store all test cases (named after the SHA256 hash of the map they created) in there. It is highly recommended to run afl-cmin or afl-pcmin on these test cases, as it is very likely that some show very similar behaviour or are a subset in functionality of other tests. While it might be possible to extend afl-ddmin-mod in this direction as well (parsing map files instead of hashing them and only returning relevant test cases), afl-(p)cmin already works well enough for that purpose.
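
Conceptually, the bookkeeping looks roughly like this sketch (function and variable names are made up; `seen` stands in for the session-wide cache):

import hashlib
import os

def record_test_case(all_tests_dir, map_bytes, test_bytes, seen):
    # Name each discovered test after the SHA256 of the map it produced and
    # keep only the smallest input seen so far for that hash.
    digest = hashlib.sha256(map_bytes).hexdigest()
    if digest in seen and len(seen[digest]) <= len(test_bytes):
        return  # an equally small or smaller test for this map already exists
    seen[digest] = test_bytes
    os.makedirs(all_tests_dir, exist_ok=True)
    with open(os.path.join(all_tests_dir, digest), "wb") as f:
        f.write(test_bytes)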

-c dir, --crash-dir dir: Similar to -a, afl-ddmin-mod also stores information about crashes/timeouts that happen while testing. If a directory is supplied here, all crashes and timeouts (defined as any run where afl-showmap returned a code larger than 0) are written to this directory. Again, it is recommended to use a corpus minimization tool on it afterwards.
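
The crash/timeout detection described above boils down to checking the exit code, roughly as in this sketch (the invocation is simplified and the function name is made up):

import subprocess

def is_crash_or_timeout(showmap_cmd, data):
    # Any afl-showmap exit code larger than 0 is treated as a crash or timeout.
    result = subprocess.run(showmap_cmd, input=data)
    return result.returncode > 0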

Tips

In my tests this algorithm works a little better than afl-tmin (it creates smaller files) but also takes more time (and a LOT more time if you play with -j and -r). For that reason multithreading and the ability to dump crashes and newly discovered tests were added; make sure to use them! This allows for an interesting "rinse and repeat" work cycle where you break down several large initial test cases into more and more smaller ones by applying afl-ddmin-mod to a randomly selected test in a directory and then minimizing the resulting test cases together with the remaining original ones. Afl-fuzz, however, prefers a small initial corpus as input, so keep this in mind before creating a few thousand different tiny sample files from a huge image library.

If you absolutely need a file that is as small and as simple as possible, running afl-tmin and afl-ddmin-mod on each other's output several times (maybe with wider jitter or -r) can lead to new, smaller results. In general, the files after even a single canonical run are already quite small unless there are very unfavourable splits.

Afl-tmin does more modifications than just removing data; for example, it also changes bytes to reduce entropy. It is probably a good idea to run it on your minimized test cases as well, either before or after afl-ddmin-mod, to get those benefits too.

Caveats

Afl-ddmin-mod uses a global dictionary as a cache. While convenient, this can lead to high memory usage if a lot of different maps are generated by subsets of your input file. Also, if your input file is huge or you run a large number of tests, you might eventually run into memory exhaustion errors. While there are certain points in the algorithm (for example after increasing granularity) where the cache could be reset, if you hit this you are probably trying an input file that's too large for AFL to handle anyway.

By default, all available cores on your machine are utilized. Make sure your hardware is cooled properly.

Error handling and the safety of your files were not the main focus while writing this: folders simply get created and files get written into them, whether they already exist or not. While the tool is functionally complete, I would recommend trying it out in a VM or container first.

License

The program is under the ISC license, which is a very permissive, MIT-style free software license. Feel free to use it; however, I personally would prefer that it NOT be used on critical infrastructure or in any military context.