# BenchFX: a collection of benchmark programs for WasmFX
This repository contains a suite of curated benchmarks for performance testing the WasmFX implementation in wasmtime.
## Using the benchmark harness
The benchmark harness is invoked by running `harness.py`; its user-facing
configuration resides in `config.py`.
The harness has three subcommands, which determine its mode of operation.
The simplest way of running all the benchmarks is as follows:

```
./harness.py setup  # required only once
./harness.py run
```
Note that both the toplevel harness and each individual subcommand have
`--help` options:

```
./harness.py --help
./harness.py setup --help
./harness.py run --help
./harness.py compare-revs --help
```
All benchmarks are listed in `config.py`. Each benchmark is part of a
benchmark suite, which is uniquely identified by a subfolder inside the benchfx
repository and contains logically related benchmarks (e.g., different
implementations of the same program, to compare their performance).
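For illustration, a suite definition in `config.py` might conceptually look
like the sketch below. The class names and fields here are hypothetical (only
the suite path `c10m` and the benchmark name `c10m_wasmfx` appear in this
README); the actual schema is whatever `config.py` defines:

```python
# Hypothetical sketch of a suite definition; the real schema lives in config.py.
from dataclasses import dataclass, field

@dataclass
class Benchmark:
    name: str  # e.g. "c10m_wasmfx"

@dataclass
class BenchmarkSuite:
    path: str  # subfolder of the benchfx repository, e.g. "c10m"
    benchmarks: list[Benchmark] = field(default_factory=list)

# Logically related benchmarks (e.g. two implementations of the same
# program) grouped into one suite so they can be compared:
C10M = BenchmarkSuite(
    path="c10m",
    benchmarks=[
        Benchmark(name="c10m_wasmfx"),
        Benchmark(name="c10m_asyncify"),  # hypothetical sibling implementation
    ],
)
```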
The harness is designed such that all dependencies (including wasmtime) are
built from source at fixed commits, or downloaded at a specific version.
In other words, invoking `./harness.py` with the same options at the same
revision of the benchfx repository (and therefore the same definitions in
`config.py`) should uniquely determine the exact versions of all dependencies,
as well as the flags they are built and run with.
## `setup` subcommand
The benchmark harness uses and controls the following repositories as subfolders
in `tools/external`:

- `mimalloc`
- `binaryen`
- `spec`, containing the wasm reference interpreter
- Two versions of wasmtime, called `wasmtime1` and `wasmtime2`
Before each benchmark run, the harness checks out and builds the requested
revision of each tool (as determined in `config.py`, or overridden with a
command line flag).
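Conceptually, this update step behaves like the sketch below; the
`PINNED_REVISIONS` table and the function are illustrative only (values made
up), not the harness's actual API:

```python
# Illustrative sketch only; the harness's actual logic lives in harness.py.
import subprocess

# Revisions pinned in config.py (values here are made up):
PINNED_REVISIONS = {
    "tools/external/binaryen": "version_117",
    "tools/external/wasmtime1": "main",
}

def checkout_and_build(repo_dir: str, revision: str) -> None:
    # Check out the requested revision of the tool...
    subprocess.run(["git", "-C", repo_dir, "checkout", revision], check=True)
    # ...and build it from source (the real build command depends on the tool,
    # e.g. cargo for wasmtime, cmake for binaryen/mimalloc).
    subprocess.run(["cargo", "build", "--release"], cwd=repo_dir, check=True)

for repo_dir, revision in PINNED_REVISIONS.items():
    checkout_and_build(repo_dir, revision)
```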
When running `./harness.py setup`, the harness checks that these repositories
exist and, if not, clones them from GitHub appropriately.
The harness also uses the WASI SDK; running `./harness.py setup` checks that
the version configured in `config.py` exists in the `tools` subdirectory and
downloads it otherwise.
In addition, the harness requires a few standard tools (`hyperfine`, `cmake`,
`make`, `dune`, ...) and will report an error if these are not found in
`$PATH`. These must be installed manually by the user.
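This check is just a `$PATH` lookup; in Python it amounts to something like
the following self-contained sketch (the tool list is the one from above):

```python
# Minimal sketch of a $PATH availability check for required tools.
import shutil
import sys

REQUIRED_TOOLS = ["hyperfine", "cmake", "make", "dune"]

missing = [tool for tool in REQUIRED_TOOLS if shutil.which(tool) is None]
if missing:
    # The harness reports an error; installing the tools is up to the user.
    sys.exit(f"error: required tools not found in $PATH: {', '.join(missing)}")
```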
### Using your own development repository of wasmtime
Since the benchmarks are executed by checking out and building a specific commit of wasmtime, it can be handy not to use a GitHub repository, but to use commits only available in a local development repository.
The `setup` subcommand therefore allows that, instead of checking out wasmtime
from GitHub, it creates two git worktrees inside `tools/external/`, which
are connected to your development repository elsewhere.
This can be achieved as follows:

```
./harness.py setup \
  --wasmtime-create-worktree-from-development-repo ~/path/to/my-wasmtime-devel-repository
```
This effectively means that `tools/external/wasmtime1` and `tools/external/wasmtime2`
are not independent git repositories, but share the `.git` folder with the
development repository and can see all commits therein.
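Under the hood this corresponds to ordinary `git worktree add` usage, roughly
as in the sketch below; the harness's exact invocation may differ, and
`--detach` is used here only so the same commit can be checked out in both
worktrees:

```python
# Rough sketch of the worktree setup; the harness's exact invocation may differ.
import os
import subprocess

DEV_REPO = os.path.expanduser("~/path/to/my-wasmtime-devel-repository")
BENCHFX = os.path.abspath(".")  # the benchfx checkout

for name in ["wasmtime1", "wasmtime2"]:
    # Each worktree shares the .git state of the development repository;
    # --detach allows checking out the same commit in both worktrees.
    subprocess.run(
        ["git", "-C", DEV_REPO, "worktree", "add", "--detach",
         os.path.join(BENCHFX, "tools", "external", name), "main"],
        check=True,
    )
```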
## `run` subcommand
This is the main way to perform actual benchmarking.
In this mode, for each benchmark suite defined in `config.py`, the benchmarks
of that suite are compared against each other.
The subcommand has multiple options to override, for example, how wasmtime is
built and run; see `./harness.py run --help` for a full list of options.
### Filtering benchmarks
The `--filter` option can be used to run only a subset of the benchmarks.
Filters are ordinary glob patterns that are matched against a pseudo-path
identifying each benchmark, of the form `<suite's path>/<benchmark name>`.
For example, the suite with path `c10m` contains a benchmark called
`c10m_wasmfx`; it is selected if a glob filter matches the pseudo-path
`c10m/c10m_wasmfx`.
The `--filter` option can be used multiple times, and the harness will run a
benchmark if it matches any of the filters.
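The matching semantics can be pictured with Python's `fnmatch` module; this is
a sketch of the idea, not the harness's exact code:

```python
# Sketch of glob-based benchmark selection; not the harness's exact code.
from fnmatch import fnmatchcase

def selected(pseudo_path: str, filters: list[str]) -> bool:
    # A benchmark runs if its pseudo-path matches ANY of the given filters.
    return any(fnmatchcase(pseudo_path, pattern) for pattern in filters)

# "<suite's path>/<benchmark name>", as in the example above:
assert selected("c10m/c10m_wasmfx", ["*/*wasmfx*", "*/*asyncify*"])
assert not selected("c10m/c10m_wasmfx", ["*/*asyncify*"])
```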
## `compare-revs` subcommand
This subcommand can be used to compare two revisions of wasmtime, *rev1* and
*rev2*, against each other. Unlike the `run` subcommand, benchmarks are not
compared against the others in the same suite. Instead, for each suite *s* and
each benchmark *b* in *s*, we compare *b* executed by wasmtime *rev1* against
*b* executed by wasmtime *rev2*.
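Schematically, the pairing works as in the sketch below; all names are
illustrative, and the stub returns dummy timings:

```python
# Schematic sketch of compare-revs pairing; all names are illustrative.
from dataclasses import dataclass

@dataclass
class Benchmark:
    name: str

@dataclass
class Suite:
    path: str
    benchmarks: list[Benchmark]

def run_with_wasmtime(bench: Benchmark, revision: str) -> float:
    """Stub: build wasmtime at `revision`, run `bench`, return its runtime."""
    return 0.0

def compare_revs(suites: list[Suite], rev1: str, rev2: str) -> None:
    for suite in suites:
        for bench in suite.benchmarks:
            # Unlike `run`, a benchmark is not compared against its suite
            # siblings, but against itself under the other revision:
            t1 = run_with_wasmtime(bench, revision=rev1)
            t2 = run_with_wasmtime(bench, revision=rev2)
            print(f"{suite.path}/{bench.name}: {rev1}={t1:.3f}s {rev2}={t2:.3f}s")
```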
The two revisions are provided to the subcommand directly as positional arguments:

```
./harness.py compare-revs main my-feature-branch
```
Filtering works the same as for the `run` subcommand.
Most other options available for the `run` subcommand are also available, but
are now prefixed with `rev1-` and `rev2-`.
As a result, `compare-revs` can actually compare different configurations
rather than just revisions: by using the same argument for both revisions,
but varying the other options, we can determine their influence.

```
./harness.py compare-revs --filter="*/*wasmfx*" \
  --rev1-wasmtime-run-args="-W=exceptions,function-references,typed-continuations -Wwasmfx-stack-size=4096 -Wwasmfx-red-zone-size=0" \
  --rev2-wasmtime-run-args="-W=exceptions,function-references,typed-continuations -Wwasmfx-stack-size=8192 -Wwasmfx-red-zone-size=0" \
  my-branch my-branch
```
## Examples
Within each suite, only compare the `wasmfx` implementations against `asyncify`:

```
./harness.py run --filter="*/*wasmfx*" --filter="*/*asyncify*"
```
Running benchmarks using a particular wasmtime commit:

```
./harness.py run my-special-feature-branch
```
Enable verbose mode of the harness itself (note `-v` appearing before the
subcommand name) and disable `mimalloc` use:

```
./harness.py -v run --use-mimalloc=n
```
## Gotchas
- The default fiber stack size may be too small to run WASI programs. It is
  recommended to run with stacks of at least 1 MB.
- Fiber stacks are unpooled at the moment, which unfavorably skews the
  benchmark results.
- The current implementation does not reference-count Wasm `cont` objects, so
  to avoid generating garbage, benchmarks should be run with the compile-time
  feature `unsafe_disable_continuation_linearity_check`, which causes Wasm
  `cont` objects to point directly to the underlying fiber object. This is
  unsafe in general, as a continuation is supposed to provide a typed view of
  its underlying fiber stack at a particular point of suspension. This type
  may change at each suspension, whereas the type of the fiber does not.