arrcmp
A high-performance alternative to glibc's memcmp. It also demonstrates the concept of using template metaprogramming (TMP) with intrinsics to generate optimised assembly code.
This approach was presented as a lightning talk at C++ On Sea 2022, titled: "C++ your friendly meta assembler - or how to beat memcmp" https://www.youtube.com/watch?v=v6ENImXcsPE
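To give a flavour of the idea, here is a minimal sketch of combining TMP with intrinsics. It is not the arrcmp implementation or its API: the names `equal_fixed` and `detail::equal_impl` are made up for illustration, and it only tests for equality rather than returning an ordering like memcmp. It assumes C++17 (`if constexpr`) and uses SSE2 only where the compiler reports it is available, falling back to 8-byte and 1-byte steps otherwise. Because the size is a template parameter, the recursion is resolved at compile time and the compiler can emit straight-line loads and compares with no loop.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#if defined(__SSE2__)
#include <emmintrin.h>  // SSE2 intrinsics
#endif

namespace detail {

// Hypothetical helper: compares exactly N bytes for equality.
// Each recursion step peels off the widest chunk that still fits in N.
template <std::size_t N>
inline bool equal_impl(const char* a, const char* b) {
    if constexpr (N == 0) {
        return true;
    }
#if defined(__SSE2__)
    else if constexpr (N >= 16) {
        // Unaligned 16-byte loads, byte-wise compare, then collapse to a bitmask.
        const __m128i va = _mm_loadu_si128(reinterpret_cast<const __m128i*>(a));
        const __m128i vb = _mm_loadu_si128(reinterpret_cast<const __m128i*>(b));
        if (_mm_movemask_epi8(_mm_cmpeq_epi8(va, vb)) != 0xFFFF) {
            return false;
        }
        return equal_impl<N - 16>(a + 16, b + 16);
    }
#endif
    else if constexpr (N >= 8) {
        // memcpy keeps the 8-byte load free of alignment and aliasing issues.
        std::uint64_t x;
        std::uint64_t y;
        std::memcpy(&x, a, sizeof x);
        std::memcpy(&y, b, sizeof y);
        if (x != y) {
            return false;
        }
        return equal_impl<N - 8>(a + 8, b + 8);
    } else {
        // Tail of fewer than 8 bytes: one byte at a time.
        return *a == *b && equal_impl<N - 1>(a + 1, b + 1);
    }
}

}  // namespace detail

// Hypothetical entry point: N is the number of bytes to compare.
template <std::size_t N>
inline bool equal_fixed(const void* a, const void* b) {
    assert(a != nullptr && b != nullptr);
    return detail::equal_impl<N>(static_cast<const char*>(a),
                                 static_cast<const char*>(b));
}
```

For example, `equal_fixed<37>(p, q)` would expand to two 16-byte vector compares followed by five single-byte compares on an SSE2 target. A tuned implementation would likely handle the tail with overlapping loads instead, but that is beyond this sketch.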
Installing dependencies (for Ubuntu)
# Google test and Google Benchmark
sudo apt install libgtest-dev libbenchmark-dev
# Google Benchmark plot
git submodule update --init --recursive
# The benchmark executable links against asmlib for comparison.
# The object file for the gcc/clang/intel 64-bit ELF version of asmlib is included in /ext
# and will be linked against automatically.
Building
Best results with clang-14 or newer:
cmake -B build -S . -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_CXX_COMPILER=clang++-14 -DCMAKE_C_COMPILER=clang-14
cmake --build build
Tests and benchmarks will be run automatically.
Run bench/run_and_graph_results.sh to produce CSV results and plot them with google_benchmark_plot/plot.py.
For the impatient, the results are included as CSV and PNG files in the bench folder and shown below.
Sandybridge i7 CPU (SSE2) - glibc 2.35
Skylake i7 CPU (AVX2) - glibc 2.35
Changes with glibc 2.39
glibc has since improved, so asmlib is no longer faster, but arrcmp still is:
Ivy Bridge i5 CPU (SSE2) - glibc 2.39 - clang
With gcc, some odd artifacts appear at around 60 bytes.