Domato 🍅

A DOM fuzzer

Written and maintained by Ivan Fratric, ifratric@google.com

Copyright 2017 Google Inc. All Rights Reserved.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.

Usage

To see the usage information run the following command:

python3 generator.py --help

To generate a single .html sample, run:

python3 generator.py --file <output file>

To generate a single .html sample from a template you wrote, run:

python3 generator.py --file <output file> --template <your custom template file>

To generate multiple samples with a single call, run:

python3 generator.py --output_dir <output directory> --no_of_files <number of output files>

The generated samples will be placed in the specified directory and will be named as fuzz-<number>.html, e.g. fuzz-00001.html, fuzz-00002.html etc. Generating multiple samples is faster because the input grammar files need to be loaded and parsed only once.
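
When the generator is driven from a fuzzing harness, the batch invocation above can be wrapped in a small helper. The following is a minimal sketch, not part of Domato: the function names and the `domato_dir` parameter are hypothetical, and it assumes generator.py sits directly inside that directory. Only the flags shown above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_generator_command(domato_dir, output_dir, count):
    """Build the argument list for one batch run of generator.py."""
    script = Path(domato_dir) / "generator.py"  # assumed location of the script
    return [
        sys.executable, str(script),
        "--output_dir", str(output_dir),
        "--no_of_files", str(count),
    ]


def generate_batch(domato_dir, output_dir, count):
    """Run the generator once and return the fuzz-*.html files found afterwards."""
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    subprocess.run(build_generator_command(domato_dir, output_dir, count), check=True)
    return sorted(Path(output_dir).glob("fuzz-*.html"))
```

Calling `generate_batch` once for many files, rather than once per file, keeps the benefit described above: the grammar files are loaded and parsed a single time.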

Code organization

generator.py contains the main script. It uses grammar.py as a library and contains additional helper code for DOM fuzzing.

grammar.py contains the generation engine that is mostly application-agnostic and can thus be used in other (i.e. non-DOM) generation-based fuzzers. As it can be used as a library, its usage is described in a separate section below.

.txt files contain grammar definitions. There are three main files, html.txt, css.txt and js.txt, which contain the HTML, CSS and JavaScript grammars, respectively. These root grammar files may include content from other files.

Using the generation engine and writing grammars

To use the generation engine with a custom grammar, you can use the following Python code:

from grammar import Grammar

my_grammar = Grammar()
my_grammar.parse_from_file('input_file.txt')
result_string = my_grammar.generate_symbol('symbol_name')
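
Since parsing is the expensive step, it usually pays to parse once and generate many times. A minimal sketch of that pattern follows; the helper name `generate_samples` is hypothetical, and it assumes Domato's grammar.py is on the import path. It uses only the three calls shown above.

```python
def generate_samples(grammar_path, symbol, count):
    """Parse a grammar file once, then generate `count` samples of `symbol`."""
    # Imported lazily so the helper can be defined even before Domato's
    # grammar.py has been added to the import path.
    from grammar import Grammar

    my_grammar = Grammar()
    my_grammar.parse_from_file(grammar_path)
    return [my_grammar.generate_symbol(symbol) for _ in range(count)]
```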

The following sections describe the syntax of the grammar files.

Basic syntax

Domato is based on an engine that, given a context-free grammar in a simple format specified below, generates samples from that grammar.

A grammar is described as a set of rules in the following basic format:

<symbol> = a mix of constants and <other_symbol>s

Each grammar rule contains a left side and a right side separated by an equals sign. The left side contains a symbol, while the right side describes how that symbol may be expanded. When expanding a symbol, all symbols on the right-hand side are expanded recursively, while everything that is not a symbol is simply copied to the output. Note that a single rule can't span multiple lines of the input file.

Consider the following simplified example of a part of the CSS grammar:

<cssrule> = <selector> { <declaration> }
<selector> = a
<selector> = b
<declaration> = width:100%

If we instruct the grammar engine to parse that grammar and generate 'cssrule', we may end up with either:

a { width:100% }

or

b { width:100% }

Note there are two rules for the 'selector' symbol. In such cases, when the generator is asked to generate a 'selector', it will select the rule to use at random. It is also possible to specify the probability of the rule using the 'p' attribute, for example:

<selector p=0.9> = a
<selector p=0.1> = b

In this case, the string 'a' would be output about 90% of the time and 'b' about 10% of the time.
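
The effect of the 'p' attribute can be illustrated with a few lines of Python. This is a sketch of weighted selection in general, not Domato's internal implementation; `pick_rule` and `selector_rules` are names invented for the example.

```python
import random


def pick_rule(rules, rng=random):
    """Pick one expansion from (probability, expansion) pairs by weight."""
    weights = [p for p, _ in rules]
    expansions = [text for _, text in rules]
    return rng.choices(expansions, weights=weights, k=1)[0]


# Mirrors <selector p=0.9> = a and <selector p=0.1> = b
selector_rules = [(0.9, "a"), (0.1, "b")]

rng = random.Random(1234)  # fixed seed so the run is repeatable
counts = {"a": 0, "b": 0}
for _ in range(10000):
    counts[pick_rule(selector_rules, rng)] += 1
print(counts)  # 'a' should dominate by roughly nine to one
```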

There are other attributes that can be applied to symbols in addition to the probability. Those are listed in a separate section.

Consider another example, this time for generating HTML samples:

<html> = <lt>html<gt><head><body><lt>/html<gt>
<head> = <lt>head<gt>...<lt>/head<gt>
<body> = <lt>body<gt>...<lt>/body<gt>

Note that '<' and '>' have a special meaning in the grammar syntax, so here we use <lt> and <gt> instead. These symbols are built in and don't need to be defined by the user. A list of all built-in symbols is provided in a separate section.
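
To make the expansion process concrete, here is a toy expander for the three rules above. It is a simplified sketch rather than Domato's engine: it handles only attribute-free symbols, and the `BUILTINS`, `RULES` and `expand` names are invented for the example.

```python
import random
import re

BUILTINS = {"lt": "<", "gt": ">"}  # only the two built-ins used above

RULES = {
    "html": ["<lt>html<gt><head><body><lt>/html<gt>"],
    "head": ["<lt>head<gt>...<lt>/head<gt>"],
    "body": ["<lt>body<gt>...<lt>/body<gt>"],
}

SYMBOL = re.compile(r"<([a-z]+)>")


def expand(symbol, rng=random):
    """Recursively expand `symbol`; text outside <...> is copied verbatim."""
    if symbol in BUILTINS:
        return BUILTINS[symbol]
    template = rng.choice(RULES[symbol])  # pick one of the symbol's rules
    return SYMBOL.sub(lambda m: expand(m.group(1), rng), template)


print(expand("html"))  # <html><head>...</head><body>...</body></html>
```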

Generating programming language code

To generate programming language code, a similar syntax can be used, with a couple of differences. First, each line of the programming language grammar corresponds to one line of the output, so the grammar syntax is more free-form to allow expressing constructs in various programming languages. Second, when a line is generated, in addition to outputting the line, one or more variables may be created, and those variables may be reused when generating other lines. Again, let's take a look at a simplified example:

!varformat fuzzvar%05d
!lineguard try { <line> } catch(e) {}

!begin lines
<new element> = document.getElementById("<string min=97 max=122>");
<element>.doSomething();
!end lines

If we instruct the engine to generate 5 lines, we may end up with something like:

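try { fuzzvar00001 = document.getElementById("hw"); } catch(e) {}
try { fuzzvar00001.doSomething(); } catch(e) {}
try { fuzzvar00002 = document.getElementById("feezcqbndf"); } catch(e) {}
try { fuzzvar00002.doSomething(); } catch(e) {}
try { fuzzvar00001.doSomething(); } catch(e) {}

Note that variable names follow the pattern given by !varformat, every generated line is wrapped in the template given by !lineguard (with <line> replaced by the line itself), and <new element> creates a new variable of type 'element' that later <element> references can reuse.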
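
The interplay of !varformat, !lineguard and variable reuse can be mimicked in a short script. This is a deliberately simplified sketch and not how Domato generates lines: the two line templates are hard-coded, the random string length of 1 to 10 is an assumption, and all names (`generate_lines`, `random_id`) are invented for the example.

```python
import random
import string

VARFORMAT = "fuzzvar%05d"                   # as in !varformat
LINEGUARD = "try { <line> } catch(e) {}"    # as in !lineguard


def random_id(rng):
    """Stand-in for <string min=97 max=122>: random lowercase letters."""
    length = rng.randint(1, 10)  # assumed length range
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def generate_lines(count, rng=None):
    rng = rng or random.Random()
    variables = []  # names of 'element' variables created so far
    lines = []
    for _ in range(count):
        # The second template needs an existing element, so a variable is
        # always created when none exists yet.
        if not variables or rng.random() < 0.5:
            name = VARFORMAT % (len(variables) + 1)
            variables.append(name)
            line = '%s = document.getElementById("%s");' % (name, random_id(rng))
        else:
            line = "%s.doSomething();" % rng.choice(variables)
        lines.append(LINEGUARD.replace("<line>", line))
    return lines


for line in generate_lines(5, random.Random(7)):
    print(line)
```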

Comments

Everything after the first '#' character on a line is considered a comment. For example:

#This is a comment

Preventing infinite recursions

The grammar syntax has a way of telling the fuzzer which rules are nonrecursive and are therefore safe to use even when the maximum recursion level has been reached. This is done with the ‘nonrecursive’ attribute. An example is given below.

!max_recursion 10
<test root=true> = <foobar>
<foobar> = foo<foobar>
<foobar nonrecursive> = bar

First, the optional ‘!max_recursion’ statement defines the maximum recursion depth (50 by default). Notice that the second production rule for ‘foobar’ is marked as nonrecursive. Whenever the maximum recursion depth is reached, the generator is forced to use the nonrecursive rule for the ‘foobar’ symbol, thus preventing infinite recursion.
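
The idea can be sketched in Python as follows. This toy models only the two ‘foobar’ rules above and is not Domato's implementation; `FOOBAR_RULES` and `expand_foobar` are names invented for the example.

```python
import random

MAX_RECURSION = 10  # as in !max_recursion 10

# (expansion, nonrecursive?) pairs for the 'foobar' symbol
FOOBAR_RULES = [("foo<foobar>", False), ("bar", True)]


def expand_foobar(depth=0, rng=random):
    """Expand 'foobar', falling back to nonrecursive rules at the depth limit."""
    rules = FOOBAR_RULES
    if depth >= MAX_RECURSION:
        # Depth limit reached: only rules marked nonrecursive may be used.
        rules = [rule for rule in rules if rule[1]]
    expansion, _ = rng.choice(rules)
    if "<foobar>" in expansion:
        return expansion.replace("<foobar>", expand_foobar(depth + 1, rng))
    return expansion


print(expand_foobar())  # some number of 'foo' (at most 10) followed by 'bar'
```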

Including and importing other grammar files

In Domato, including and importing grammars are two different concepts.

Including is simpler. You can use:

!include other.txt

to include rules from other.txt into the currently parsed grammar.

Importing works a bit differently:

!import other.txt

tells the parser to create a new Grammar() object that can then be referenced from the current grammar through the special <import> symbol. For example, assuming css.txt has been imported with !import css.txt:

<cssrule> = <import from=css.txt symbol=rule>

You can think about importing and including in terms of namespaces: !include puts the included rules into the same namespace as the current grammar, while !import creates a new namespace that can then be accessed using the <import> symbol, with the namespace specified via the 'from' attribute.
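
The namespace analogy can be modelled with plain dictionaries. The class below is a toy illustration only and is unrelated to Domato's actual Grammar class; `ToyGrammar` and its methods are invented names.

```python
class ToyGrammar:
    """Toy model of rule namespaces; not Domato's Grammar class."""

    def __init__(self, rules=None):
        self.rules = dict(rules or {})  # symbol -> expansion
        self.imports = {}               # file name -> separate ToyGrammar

    def include(self, other):
        # Like !include: merge the other grammar's rules into this namespace.
        self.rules.update(other.rules)

    def import_as(self, name, other):
        # Like !import: keep the other grammar apart, reachable only by name.
        self.imports[name] = other

    def lookup(self, symbol, from_=None):
        # from_ plays the role of the 'from' attribute of <import>.
        grammar = self.imports[from_] if from_ else self
        return grammar.rules[symbol]


css = ToyGrammar({"rule": "a { width:100% }"})
main = ToyGrammar()
main.import_as("css.txt", css)
print(main.lookup("rule", from_="css.txt"))  # a { width:100% }
```

After `import_as`, the 'rule' symbol is visible only through the 'css.txt' namespace; calling `main.include(css)` instead would make it a regular symbol of `main`.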

Including Python code

Sometimes you might want to call custom Python code in your grammar. For example, let’s say you want to use the engine to generate an HTTP response and you want the body length to match the 'Size' header. Since this is not possible with normal grammar rules, you can include custom Python code to accomplish it like this:

!begin function savesize
  context['size'] = ret_val
!end function

!begin function createbody
  n = int(context['size'])
  ret_val = 'a' * n
!end function

<foo root> = <header><cr><lf><body>
<header> = Size: <int min=1 max=20 beforeoutput=savesize>
<body> = <call function=createbody>

The Python functions are defined between the ‘!begin function <function_name>’ and ‘!end function’ commands. The functions can be called in two ways: using the ‘beforeoutput’ attribute and using the <call> symbol.

When the ‘beforeoutput’ attribute is specified on a symbol, the corresponding function is called when that symbol is expanded, just before the result of the expansion is output to the sample. The expansion result is passed to the function in the ret_val variable. The function is then free to modify ret_val, store it for later use or perform any other operations.

When the special <call> symbol is used, the function (specified in the ‘function’ attribute) is called when the symbol is encountered during language generation. Any value stored by the function in ret_val is considered the result of the expansion (ret_val gets included in the sample).
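
The data flow between the two functions in the example can be traced with ordinary Python. This is a hand-written simulation of that one grammar, not Domato's mechanism: here the functions take `context` and `ret_val` as parameters and return the new ret_val, which is a simplification made for the sketch, and `generate_response` is a hypothetical name.

```python
import random


def savesize(context, ret_val):
    """beforeoutput hook: remember the generated size, leave it unchanged."""
    context["size"] = ret_val
    return ret_val


def createbody(context, ret_val):
    """<call> target: build a body whose length matches the saved size."""
    n = int(context["size"])
    return "a" * n


def generate_response(rng=random):
    context = {}                        # shared by both functions (assumption)
    size = str(rng.randint(1, 20))      # <int min=1 max=20>
    size = savesize(context, size)      # beforeoutput=savesize
    header = "Size: " + size            # <header>
    body = createbody(context, "")      # <call function=createbody>
    return header + "\r\n" + body       # <header><cr><lf><body>


print(repr(generate_response()))
```

The key point is ordering: the header is expanded before the body, so the size is already stored in `context` when the body is created.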

Your python code has access to the following variables:

Built-in symbols

The following symbols have a special meaning and should not be redefined by users:

Symbol attributes

The following attributes are supported:

Bug Showcase

Some of the bugs that have been found with Domato:

Disclaimer

This is not an official Google product.