mathbench

mathbench is a suite of unit tests and benchmarks comparing the output and performance of a number of different Rust linear algebra libraries for common game and graphics development tasks.

mathbench is written by the author of glam and has been used to compare the performance of glam with other similar 3D math libraries targeting games and graphics development, including cgmath, nalgebra, euclid, vek, pathfinder_geometry, static-math and ultraviolet.

The benchmarks

All benchmarks are performed using Criterion.rs. Benchmarks are logically divided into the categories described below.

Despite best attempts, take the results of micro benchmarks with a pinch of salt.

Operation benchmarks

Workload benchmarks

The benchmarks are currently focused on f32 types as that is all glam currently supports.

Crate differences

Different libraries have different features and different ways of achieving the same goal. For the purpose of performance comparison, mathbench sometimes compares similar rather than identical functionality. Below is a list of differences between libraries that are notable for performance comparisons.

Matrices versus transforms

The euclid library does not support generic square matrix types like the other libraries tested. Rather, it has 2D and 3D transform types which can transform 2D and 3D vector and point types. Each library has different types for supporting transforms, but euclid is unique amongst the libraries tested in that it doesn't have generic square matrix types.

euclid's Transform2D is stored as a 3x2 row-major matrix that can be used to transform 2D vectors and points.

Similarly, Transform3D is used for transforming 3D vectors and points. It is represented as a 4x4 matrix, so it is more directly comparable to the other libraries; however, it doesn't support some operations, such as transpose.

There is no equivalent to a 2x2 matrix type in euclid.
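
As an illustration of euclid's transform-centric API, here is a minimal sketch (assuming euclid 0.22 and its default unit aliases) showing that a transform applies its translation to points but not to vectors:

```rust
use euclid::default::{Point2D, Transform2D, Vector2D};

fn main() {
    // A 2D translation, stored internally as a 3x2 row-major matrix.
    let t = Transform2D::translation(1.0_f32, 2.0);

    // Points are affected by the translation component...
    let p = t.transform_point(Point2D::new(3.0, 4.0));
    assert_eq!(p, Point2D::new(4.0, 6.0));

    // ...while vectors are not.
    let v = t.transform_vector(Vector2D::new(3.0, 4.0));
    assert_eq!(v, Vector2D::new(3.0, 4.0));
}
```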

Matrix inverse

Note that cgmath and nalgebra matrix inverse methods return an Option whereas glam and euclid do not. If a non-invertible matrix is inverted by glam or euclid the result will be invalid (it will contain NaNs).
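
As a minimal sketch of this API difference (assuming recent glam and nalgebra versions; this is not the code used by the benchmarks):

```rust
use glam::Mat4;
use nalgebra::Matrix4;

fn main() {
    // nalgebra reports a failed inverse explicitly via an Option.
    let na_singular: Matrix4<f32> = Matrix4::zeros();
    assert!(na_singular.try_inverse().is_none());

    // glam always returns a matrix; inverting a singular matrix produces NaNs.
    let glam_inverse = Mat4::ZERO.inverse();
    assert!(glam_inverse.x_axis.x.is_nan());
}
```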

Quaternions versus rotors

Most libraries provide quaternions for performing rotations except for ultraviolet which provides rotors.

Wide benchmarks

All benchmarks are gated as either "wide" or "scalar". This division allows us to more fairly compare these different styles of libraries.

"scalar" benchmarks operate on standard scalar f32 values, doing calculations on one piece of data at a time (or in the case of a "horizontal" SIMD library like glam, one Vec3/Vec4 at a time).

"wide" benchmarks operate in a "vertical" AoSoA (Array-of-Struct-of-Arrays) fashion, which is a programming model that allows the potential to more fully use the advantages of SIMD operations. However, it has the cost of making algorithm design harder, as scalar algorithms cannot be directly used by "wide" architectures. Because of this difference in algorithms, we also can't really directly compare the performance of "scalar" vs "wide" types because they don't quite do the same thing (wide types operate on multiple pieces of data at the same time).

The "wide" benchmarks still include glam, a scalar-only library, as a comparison. Even though the comparison is somewhat apples-to-oranges, in each of these cases, when running "wide" benchmark variants, glam is configured to do the exact same amount of final work, producing the same outputs that the "wide" versions would. The purpose is to give an idea of the possible throughput benefits of "wide" types compared to writing the same algorithms with a scalar type, at the cost of extra care being needed to write the algorithm.

To learn more about AoSoA architecture, see this blog post by the author of nalgebra, which goes into more depth on how AoSoA works and its possible benefits. Also take a look at the "Examples" section of ultraviolet's README, which discusses how to port scalar algorithms to wide ones, using the Euler integration and ray-sphere intersection benchmarks from mathbench as examples.

Note that the nalgebra_f32x4 and nalgebra_f32x8 benchmarks require a Rust nightly compiler.

Additionally, the f32x8 benchmarks require the AVX2 instruction set; to enable it you will need to build with RUSTFLAGS='-C target-feature=+avx2'.

Build settings

The default profile.bench settings are used; these are documented in the cargo reference.

Some math libraries are optimized to use specific instruction sets and may benefit from being built with settings different from the defaults. Typically a game team will need to decide on a minimum specification that they will target. Deciding on a minimum specification dictates the potential audience size for a project. This is an important decision for any game and it will be different for every project. mathbench doesn't want to make assumptions about what build settings any particular project may want to use, which is why the default settings are used.

I would encourage users who want to use build settings different from the defaults to run the benchmarks themselves and consider publishing their results.
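
For example, to build the benchmarks for the instruction sets supported by the machine you are running on, you could use something like:

RUSTFLAGS='-C target-cpu=native' cargo bench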

Benchmark results

The following is a table of benchmarks produced by mathbench comparing glam performance to cgmath, nalgebra, euclid, vek, pathfinder_geometry, static-math and ultraviolet on f32 data.

These benchmarks were performed on an Intel i7-4710HQ CPU on Linux. They were compiled with the 1.56.1 (59eed8a2a 2021-11-01) Rust compiler. Lower (better) numbers are highlighted within a 2.5% range of the minimum for each row.

The versions of the libraries tested were:

See the full mathbench report for more detailed results.

Scalar benchmarks

Run with the command:

cargo bench --features scalar scalar

| benchmark | glam | cgmath | nalgebra | euclid | vek | pathfinder | static-math | ultraviolet |
| :-- | --: | --: | --: | --: | --: | --: | --: | --: |
| euler 2d x10000 | 16.23 us | 16.13 us | 9.954 us | 16.18 us | 16.2 us | 10.42 us | 9.97 us | 16.17 us |
| euler 3d x10000 | 15.95 us | 32.11 us | 32.13 us | 32.13 us | 32.13 us | 16.27 us | 32.16 us | 32.11 us |
| matrix2 determinant | 2.0386 ns | 2.0999 ns | 2.1018 ns | N/A | 2.0997 ns | 2.0987 ns | 2.0962 ns | 2.1080 ns |
| matrix2 inverse | 2.8226 ns | 8.4418 ns | 7.6303 ns | N/A | N/A | 3.3459 ns | 9.4636 ns | 5.8796 ns |
| matrix2 mul matrix2 | 2.6036 ns | 5.0007 ns | 4.8172 ns | N/A | 9.3814 ns | 2.5516 ns | 4.7274 ns | 4.9428 ns |
| matrix2 mul vector2 x1 | 2.4904 ns | 2.6144 ns | 2.8714 ns | N/A | 4.2139 ns | 2.0839 ns | 2.8873 ns | 2.6250 ns |
| matrix2 mul vector2 x100 | 227.5271 ns | 243.3579 ns | 265.1698 ns | N/A | 400.6940 ns | 219.7127 ns | 267.8780 ns | 243.9880 ns |
| matrix2 return self | 2.4235 ns | 2.8841 ns | 2.8756 ns | N/A | 2.8754 ns | 2.4147 ns | 2.8717 ns | 2.8697 ns |
| matrix2 transpose | 2.2887 ns | 3.0645 ns | 7.9154 ns | N/A | 2.9635 ns | N/A | 3.0637 ns | 3.0652 ns |
| matrix3 determinant | 3.9129 ns | 3.8107 ns | 3.8191 ns | N/A | 3.8180 ns | N/A | 3.8151 ns | 8.9368 ns |
| matrix3 inverse | 17.5373 ns | 18.6931 ns | 12.3183 ns | N/A | N/A | N/A | 12.8195 ns | 21.9098 ns |
| matrix3 mul matrix3 | 9.9578 ns | 13.3648 ns | 7.8154 ns | N/A | 35.5802 ns | N/A | 6.4938 ns | 10.0527 ns |
| matrix3 mul vector3 x1 | 4.8090 ns | 4.9339 ns | 4.5046 ns | N/A | 12.5518 ns | N/A | 4.8002 ns | 4.8118 ns |
| matrix3 mul vector3 x100 | 0.4836 us | 0.4808 us | 0.4755 us | N/A | 1.247 us | N/A | 0.4816 us | 0.4755 us |
| matrix3 return self | 5.4421 ns | 5.4469 ns | 5.4526 ns | N/A | 5.4656 ns | N/A | 5.4718 ns | 5.4043 ns |
| matrix3 transpose | 9.9567 ns | 10.0794 ns | 10.9704 ns | N/A | 9.9257 ns | N/A | 10.7350 ns | 10.5334 ns |
| matrix4 determinant | 6.2050 ns | 11.1041 ns | 69.2549 ns | 17.1809 ns | 18.5233 ns | N/A | 16.5331 ns | 8.2704 ns |
| matrix4 inverse | 16.4386 ns | 47.0674 ns | 71.8174 ns | 64.1356 ns | 284.3703 ns | N/A | 52.6993 ns | 41.1780 ns |
| matrix4 mul matrix4 | 7.7715 ns | 26.7308 ns | 8.6500 ns | 10.4414 ns | 86.1501 ns | N/A | 21.7985 ns | 26.8056 ns |
| matrix4 mul vector4 x1 | 3.0303 ns | 7.7400 ns | 3.4091 ns | N/A | 21.0968 ns | N/A | 6.2971 ns | 6.2537 ns |
| matrix4 mul vector4 x100 | 0.6136 us | 0.9676 us | 0.627 us | N/A | 2.167 us | N/A | 0.7893 us | 0.8013 us |
| matrix4 return self | 7.1741 ns | 6.8838 ns | 7.5030 ns | N/A | 7.0410 ns | N/A | 6.7768 ns | 6.9508 ns |
| matrix4 transpose | 6.6826 ns | 12.4966 ns | 15.3265 ns | N/A | 12.6386 ns | N/A | 15.2657 ns | 12.3396 ns |
| ray-sphere intersection x10000 | 56.2 us | 55.7 us | 15.32 us | 55.45 us | 56.02 us | N/A | N/A | 50.94 us |
| rotation3 inverse | 2.3113 ns | 3.1752 ns | 3.3292 ns | 3.3311 ns | 3.1808 ns | N/A | 8.7109 ns | 3.6535 ns |
| rotation3 mul rotation3 | 3.6584 ns | 7.5255 ns | 7.4808 ns | 8.1393 ns | 14.1636 ns | N/A | 6.8044 ns | 7.6386 ns |
| rotation3 mul vector3 x1 | 6.4950 ns | 7.6808 ns | 7.5784 ns | 7.5746 ns | 18.2547 ns | N/A | 7.2727 ns | 8.9732 ns |
| rotation3 mul vector3 x100 | 0.6465 us | 0.7844 us | 0.7573 us | 0.7533 us | 1.769 us | N/A | 0.7317 us | 0.9416 us |
| rotation3 return self | 2.4928 ns | 2.8740 ns | 2.8687 ns | N/A | 2.8724 ns | N/A | 4.7868 ns | 2.8722 ns |
| transform point2 x1 | 2.7854 ns | 2.8878 ns | 4.4207 ns | 2.8667 ns | 11.9427 ns | 2.3601 ns | N/A | 4.1770 ns |
| transform point2 x100 | 0.3316 us | 0.3574 us | 0.4445 us | 0.3008 us | 1.212 us | 0.3184 us | N/A | 0.4332 us |
| transform point3 x1 | 2.9619 ns | 10.6812 ns | 6.1037 ns | 7.7051 ns | 13.2607 ns | 3.0934 ns | N/A | 6.8419 ns |
| transform point3 x100 | 0.6095 us | 1.27 us | 0.8064 us | 0.7674 us | 1.446 us | 0.6189 us | N/A | 0.8899 us |
| transform vector2 x1 | 2.4944 ns | N/A | 3.7174 ns | 2.6273 ns | 11.9424 ns | N/A | N/A | 3.0458 ns |
| transform vector2 x100 | 0.3125 us | N/A | 0.3871 us | 0.2817 us | 1.213 us | N/A | N/A | 0.3649 us |
| transform vector3 x1 | 2.8091 ns | 7.7343 ns | 5.5064 ns | 4.4810 ns | 15.4097 ns | N/A | N/A | 4.8819 ns |
| transform vector3 x100 | 0.6035 us | 0.9439 us | 0.7573 us | 0.6327 us | 1.63 us | N/A | N/A | 0.6703 us |
| transform2 inverse | 9.0256 ns | N/A | 12.2614 ns | 9.4803 ns | N/A | 8.9047 ns | N/A | N/A |
| transform2 mul transform2 | 4.5111 ns | N/A | 8.1434 ns | 5.8677 ns | N/A | 3.8513 ns | N/A | N/A |
| transform2 return self | 4.1707 ns | N/A | 5.4356 ns | 4.2775 ns | N/A | 4.1117 ns | N/A | N/A |
| transform3 inverse | 10.9869 ns | N/A | 71.4437 ns | 56.0136 ns | N/A | 23.0392 ns | N/A | N/A |
| transform3 mul transform3d | 6.5903 ns | N/A | 8.5673 ns | 10.1802 ns | N/A | 7.6587 ns | N/A | N/A |
| transform3 return self | 7.1828 ns | N/A | 7.2619 ns | 7.2407 ns | N/A | 7.3214 ns | N/A | N/A |
| vector3 cross | 2.4257 ns | 3.6842 ns | 3.7945 ns | 3.6821 ns | 3.8323 ns | N/A | 3.8622 ns | 3.6927 ns |
| vector3 dot | 2.1055 ns | 2.3179 ns | 2.3174 ns | 2.3190 ns | 2.3195 ns | N/A | 2.3204 ns | 2.3160 ns |
| vector3 length | 2.5020 ns | 2.5002 ns | 2.5986 ns | 2.5013 ns | 2.5021 ns | N/A | 2.5036 ns | 2.5017 ns |
| vector3 normalize | 4.0454 ns | 5.8411 ns | 8.4069 ns | 8.0679 ns | 8.8137 ns | N/A | N/A | 5.8440 ns |
| vector3 return self | 2.4087 ns | 3.1021 ns | 3.1061 ns | N/A | 3.1052 ns | N/A | 3.1136 ns | 3.1071 ns |

Wide benchmarks

These benchmarks were performed on an Intel i7-4710HQ CPU on Linux. They were compiled with the 1.59.0-nightly (207c80f10 2021-11-30) Rust compiler. Lower (better) numbers are highlighted within a 2.5% range of the minimum for each row.

The versions of the libraries tested were:

Run with the command:

RUSTFLAGS='-C target-feature=+avx2' cargo +nightly bench --features wide wide

| benchmark | glam_f32x1 | ultraviolet_f32x4 | nalgebra_f32x4 | ultraviolet_f32x8 | nalgebra_f32x8 |
| :-- | --: | --: | --: | --: | --: |
| euler 2d x80000 | 142.7 us | 63.47 us | 63.94 us | 69.27 us | 69.25 us |
| euler 3d x80000 | 141.2 us | 97.18 us | 95.78 us | 103.7 us | 105.7 us |
| matrix2 determinant x16 | 18.6849 ns | 11.4259 ns | N/A | 9.9982 ns | N/A |
| matrix2 inverse x16 | 39.1219 ns | 29.8933 ns | N/A | 22.8757 ns | N/A |
| matrix2 mul matrix2 x16 | 42.7342 ns | 36.4879 ns | N/A | 33.4814 ns | N/A |
| matrix2 mul matrix2 x256 | 959.1663 ns | 935.4148 ns | N/A | 862.0910 ns | N/A |
| matrix2 mul vector2 x16 | 41.2464 ns | 18.2382 ns | N/A | 17.2550 ns | N/A |
| matrix2 mul vector2 x256 | 698.1177 ns | 544.5315 ns | N/A | 540.9743 ns | N/A |
| matrix2 return self x16 | 32.7553 ns | 29.5064 ns | N/A | 21.4492 ns | N/A |
| matrix2 transpose x16 | 32.3247 ns | 46.4836 ns | N/A | 20.0852 ns | N/A |
| matrix3 determinant x16 | 53.2366 ns | 25.0158 ns | N/A | 22.1503 ns | N/A |
| matrix3 inverse x16 | 275.9330 ns | 78.3532 ns | N/A | 69.2627 ns | N/A |
| matrix3 mul matrix3 x16 | 239.6124 ns | 115.2934 ns | N/A | 116.6237 ns | N/A |
| matrix3 mul matrix3 x256 | 3.26 us | 1.959 us | N/A | 1.963 us | N/A |
| matrix3 mul vector3 x16 | 78.4972 ns | 40.4734 ns | N/A | 47.0164 ns | N/A |
| matrix3 mul vector3 x256 | 1.293 us | 1.0 us | N/A | 1.007 us | N/A |
| matrix3 return self x16 | 112.4312 ns | 78.4870 ns | N/A | 67.3272 ns | N/A |
| matrix3 transpose x16 | 116.9654 ns | 100.1097 ns | N/A | 67.4544 ns | N/A |
| matrix4 determinant x16 | 98.8388 ns | 56.1177 ns | N/A | 55.7623 ns | N/A |
| matrix4 inverse x16 | 276.2637 ns | 191.7471 ns | N/A | 163.8408 ns | N/A |
| matrix4 mul matrix4 x16 | 230.9916 ns | 222.3948 ns | N/A | 221.8563 ns | N/A |
| matrix4 mul matrix4 x256 | 3.793 us | 3.545 us | N/A | 3.67 us | N/A |
| matrix4 mul vector4 x16 | 92.9485 ns | 87.7341 ns | N/A | 90.4404 ns | N/A |
| matrix4 mul vector4 x256 | 1.58 us | 1.542 us | N/A | 1.596 us | N/A |
| matrix4 return self x16 | 175.6153 ns | 158.7861 ns | N/A | 167.6639 ns | N/A |
| matrix4 transpose x16 | 184.0498 ns | 193.5497 ns | N/A | 147.1365 ns | N/A |
| ray-sphere intersection x80000 | 567.9 us | 154.8 us | N/A | 61.49 us | N/A |
| rotation3 inverse x16 | 32.7517 ns | 32.8107 ns | N/A | 22.3662 ns | N/A |
| rotation3 mul rotation3 x16 | 58.9408 ns | 38.6848 ns | N/A | 34.3223 ns | N/A |
| rotation3 mul vector3 x16 | 130.6707 ns | 36.7861 ns | N/A | 26.1154 ns | N/A |
| rotation3 return self x16 | 32.4345 ns | 32.5213 ns | N/A | 21.8325 ns | N/A |
| transform point2 x16 | 52.6534 ns | 31.4527 ns | N/A | 32.7317 ns | N/A |
| transform point2 x256 | 888.5654 ns | 831.9341 ns | N/A | 848.0397 ns | N/A |
| transform point3 x16 | 96.9017 ns | 81.6828 ns | N/A | 82.8904 ns | N/A |
| transform point3 x256 | 1.567 us | 1.398 us | N/A | 1.43 us | N/A |
| transform vector2 x16 | 43.7679 ns | 29.9349 ns | N/A | 31.8630 ns | N/A |
| transform vector2 x256 | 858.5660 ns | 825.0261 ns | N/A | 851.7501 ns | N/A |
| transform vector3 x16 | 96.5535 ns | 80.1612 ns | N/A | 85.0659 ns | N/A |
| transform vector3 x256 | 1.557 us | 1.394 us | N/A | 1.438 us | N/A |
| vector3 cross x16 | 42.1941 ns | 26.6677 ns | N/A | 22.0924 ns | N/A |
| vector3 dot x16 | 29.1805 ns | 12.7972 ns | N/A | 12.2872 ns | N/A |
| vector3 length x16 | 32.6014 ns | 9.7692 ns | N/A | 9.4271 ns | N/A |
| vector3 normalize x16 | 65.8815 ns | 24.1661 ns | N/A | 20.3579 ns | N/A |
| vector3 return self x16 | 32.0051 ns | 42.9462 ns | N/A | 16.7808 ns | N/A |

Running the benchmarks

The benchmarks use the criterion crate, which works on stable Rust; they can be run with:

cargo bench

For the best results close other applications on the machine you are using to benchmark!

When running "wide" benchmarks, be sure you compile with with the appropriate target-features enabled, e.g. +avx2, for best results.

There is a script in scripts/summary.py to summarize the results in a nice fashion. It requires Python 3 and the prettytable Python module and can be run to generate ASCII output.

Default and optional features

All libraries except for glam are optional for running benchmarks. The default features include cgmath, ultraviolet and nalgebra. These can be disabled with:

cargo bench --no-default-features

To selectively enable a specific default feature again use:

cargo bench --no-default-features --features nalgebra

Note that you can filter which benchmarks to run at runtime by using Criterion's filtering feature. For example, to only run scalar benchmarks and not wide ones, use:

cargo bench "scalar"

You can also get more granular. For example to only run wide matrix2 benchmarks, use:

cargo bench --features wide "wide matrix2"

or to only run the scalar "vec3 length" benchmark for glam, use:

cargo bench "scalar vec3 length/glam"

Crate features

There are a few extra features in addition to the direct features referring to each benchmarked library.

unstable feature

The unstable feature requires a nightly compiler, and it allows us to tell rustc not to inline certain functions within hot benchmark loops. This is used in the ray-sphere intersection benchmark in order to simulate situations where the autovectorizer would not be able to properly vectorize your code.

Running the tests

The tests can be run using:

cargo test

Publishing results

When publishing benchmark results it is important to document the details of how the benchmarks were run, including the CPU and operating system used, the Rust compiler version, the versions of the libraries tested and any non-default build settings.

Adding a new library

There are different steps involved for adding unit tests and benchmarks for a new library.

Benchmarks require an implementation of the mathbench::RandomVec trait for the types you want to benchmark. If the type implements the rand crate distribution::Distribution trait for Standard then you can simply use the impl_random_vec! macro in src/lib.rs. Otherwise you can provide a function that generates a new random value of your type and pass that to impl_random_vec!.
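
For example, a minimal sketch of implementing Distribution for Standard (assuming the rand 0.8 API and a hypothetical MyVec3 type from the library being added):

```rust
use rand::distributions::{Distribution, Standard};
use rand::Rng;

// Hypothetical vector type from the library being added.
pub struct MyVec3 {
    pub x: f32,
    pub y: f32,
    pub z: f32,
}

// With this impl in place, random MyVec3 values can be generated for benchmark inputs.
impl Distribution<MyVec3> for Standard {
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> MyVec3 {
        MyVec3 {
            x: rng.gen(),
            y: rng.gen(),
            z: rng.gen(),
        }
    }
}
```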

To add the new library type to a benchmark, add another bench_function call to the Criterion BenchmarkGroup.
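
For example, the general Criterion pattern looks like the following sketch (shown here with glam and an assumed group name mirroring the benchmark names above; mathbench's own benchmarks may wrap this in helper code):

```rust
use criterion::{criterion_group, criterion_main, Criterion};

fn bench_matrix4_inverse(c: &mut Criterion) {
    let mut group = c.benchmark_group("scalar matrix4 inverse");
    // Add one bench_function call per library being compared.
    group.bench_function("glam", |b| {
        let m = glam::Mat4::IDENTITY;
        b.iter(|| m.inverse());
    });
    group.finish();
}

criterion_group!(benches, bench_matrix4_inverse);
criterion_main!(benches);
```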

Increment the patch version number of mathbench in the Cargo.toml.

Update CHANGELOG.md.

Build times

mathbench also includes a tool for comparing full build times in tools/buildbench. Incremental build times are not measured as it would be non-trivial to create a meaningful test across different math crates.

The buildbench tool uses the -Z timings feature of the nightly build of cargo, so you need a nightly toolchain to run it.

buildbench generates a Cargo.toml and empty src/lib.rs in a temporary directory for each library, recording some build time information which is included in the summary table below. The temporary directory is created every time the tool is run so this is a full build from a clean state.

Each library is only built once so you may wish to run buildbench multiple times to ensure results are consistent.

By default crates are built using the release profile with default features enabled. There are options for building the dev profile or without default features, see buildbench --help for more information.

The columns output include the total build time, the self build time (the time it took to build the crate on its own, excluding dependencies), and the number of units, which is the number of dependencies (this will be 2 at minimum).

When comparing build times keep in mind that each library has different feature sets and that naturally larger libraries will take longer to build. For many of the crates tested the dependencies take longer to build than the math crate itself. Also keep in mind that if you are already building one of the dependencies in your project, you won't pay the build cost twice (unless it's a different version).

| crate | version | total (s) | self (s) | units |
| :-- | --: | --: | --: | --: |
| cgmath | 0.17.0 | 6.8 | 3.0 | 17 |
| euclid | 0.22.1 | 3.4 | 1.0 | 4 |
| glam | 0.9.4 | 1.1 | 0.6 | 2 |
| nalgebra | 0.22.0 | 24.2 | 18.0 | 24 |
| pathfinder_geometry | 0.5.1 | 3.0 | 0.3 | 8 |
| static-math | 0.1.6 | 6.9 | 1.7 | 10 |
| ultraviolet | 0.5.1 | 2.5 | 1.3 | 4 |
| vek | 0.12.0 | 34.4 | 10.1 | 16 |

These benchmarks were performed on an Intel i7-4710HQ CPU with 16GB RAM and a Toshiba MQ01ABD100 HDD (SATA 3Gbps 5400RPM) on Linux.

License

Licensed under either of

Apache License, Version 2.0
MIT license

at your option.

Contribution

Contributions in any form (issues, pull requests, etc.) to this project must adhere to Rust's Code of Conduct.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Support

If you are interested in contributing or have a request or suggestion, create an issue on GitHub.