FastMM4-AVX

FastMM4-AVX (efficient synchronization and AVX1/AVX2/AVX512/ERMS/FSRM support for FastMM4)

Written by Maxim Masiutin maxim@masiutin.com

Version 1.0.7

This is a fork of the "Fast Memory Manager" (FastMM) v4.993 by Pierre le Riche (see below for the original FastMM4 description).

What was added to FastMM4-AVX in comparison to the original FastMM4:

Here is a comparison of the original FastMM4 version 4.992, with default options, compiled for Win64 by Delphi 10.2 Tokyo (Release with Optimization), and the current FastMM4-AVX branch ("AVX-br."). Under some multi-threading scenarios, the FastMM4-AVX branch is more than twice as fast as the original FastMM4. The tests were run on two different computers. The first has two Xeon E5-2543v2 CPU sockets, each with 6 physical cores (12 logical threads), of which only 5 physical cores per socket were enabled for the test application. The second has an i7-7700K CPU.

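The tests used the "Multi-threaded allocate, use and free" and "NexusDB" test cases from the FastCode Challenge Memory Manager test suite, modified to run under 64-bit. In the tables, the Ratio column is the AVX-br. result divided by the Orig. result, so a value below 100% means the FastMM4-AVX branch was faster.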

                     Xeon E5-2543v2 2*CPU      i7-7700K CPU
                    (allocated 20 logical   (8 logical threads,
                     threads, 10 physical    4 physical cores),
                     cores, NUMA), AVX-1          AVX-2

                    Orig.  AVX-br.  Ratio   Orig.  AVX-br. Ratio
                    ------  -----  ------   -----  -----  ------
02-threads realloc   96552  59951  62.09%   65213  49471  75.86%
04-threads realloc   97998  39494  40.30%   64402  47714  74.09%
08-threads realloc   98325  33743  34.32%   64796  58754  90.68%
16-threads realloc  116273  45161  38.84%   70722  60293  85.25%
31-threads realloc  122528  53616  43.76%   70939  62962  88.76%
64-threads realloc  137661  54330  39.47%   73696  64824  87.96%
NexusDB 02 threads  122846  90380  73.72%   79479  66153  83.23%
NexusDB 04 threads  122131  53103  43.77%   69183  43001  62.16%
NexusDB 08 threads  124419  40914  32.88%   64977  33609  51.72%
NexusDB 12 threads  181239  55818  30.80%   83983  44658  53.18%
NexusDB 16 threads  135211  62044  43.61%   59917  32463  54.18%
NexusDB 31 threads  134815  48132  33.46%   54686  31184  57.02%
NexusDB 64 threads  187094  57672  30.25%   63089  41955  66.50%

The above tests were run on 14-Jul-2017.

Here are some more test results (compiled by Delphi 10.2 Update 3):

                     Xeon E5-2667v4 2*CPU       i9-7900X CPU
                    (allocated 32 logical   (20 logical threads,
                     threads, 16 physical    10 physical cores),
                     cores, NUMA), AVX-2          AVX-512

                    Orig.  AVX-br.  Ratio   Orig.  AVX-br. Ratio
                    ------  -----  ------   -----  -----  ------
02-threads realloc   80544  60025  74.52%   66100  55854  84.50%
04-threads realloc   80751  47743  59.12%   64772  40213  62.08%
08-threads realloc   82645  32691  39.56%   62246  27056  43.47%
12-threads realloc   89951  43270  48.10%   65456  25853  39.50%
16-threads realloc   95729  56571  59.10%   67513  27058  40.08%
31-threads realloc  109099  97290  89.18%   63180  28408  44.96%
64-threads realloc  118589 104230  87.89%   57974  28951  49.94%
NexusDB 01 thread   160100 121961  76.18%   93341  95807 102.64%
NexusDB 02 threads  115447  78339  67.86%   77034  70056  90.94%
NexusDB 04 threads  107851  49403  45.81%   73162  50039  68.39%
NexusDB 08 threads  111490  36675  32.90%   70672  42116  59.59%
NexusDB 12 threads  148148  46608  31.46%   92693  53900  58.15%
NexusDB 16 threads  111041  38461  34.64%   66549  37317  56.07%
NexusDB 31 threads  123496  44232  35.82%   62552  34150  54.60%
NexusDB 64 threads  179924  62414  34.69%   83914  42915  51.14%

The above tests (on the Xeon E5-2667v4 and the i9-7900X) were run on 03-May-2018.

Here is a single-threaded performance comparison, in some selected scenarios, between FastMM v5.03 dated May 12, 2021 and FastMM4-AVX v1.05 dated May 20, 2021. FastMM4-AVX was compiled with default options. The test was run on May 20, 2021, on an Intel Core i7-1065G7 CPU (Ice Lake microarchitecture, base frequency 1.3 GHz, max turbo frequency 3.90 GHz, 4 cores, 8 threads) and compiled under Delphi 10.3 Update 3 for the 64-bit target. Please note that these are selected scenarios where FastMM4-AVX is faster than FastMM5. In other scenarios, especially multi-threaded ones with heavy contention, FastMM5 is faster.

                                         FastMM5  AVX-br.   Ratio
                                          ------  ------   ------
ReallocMem Small (1-555b) benchmark         1425    1135   79.65%
ReallocMem Medium (1-4039b) benchmark       3834    3309   86.31%
Block downsize                             12079   10305   85.31%
Address space creep benchmark              13283   12571   94.64%
Address space creep (larger blocks)        16066   13879   86.39%
Single-threaded reallocate and use          4395    3960   90.10%
Single-threaded tiny reallocate and use     8766    7097   80.96%
Single-threaded allocate, use and free     13912   13248   95.23%

You can find the program, used to generate the benchmark data, at https://github.com/maximmasiutin/FastCodeBenchmark

FastMM4-AVX is released under a dual license, and you may choose to use it under either the Mozilla Public License 2.0 (MPL 2.0, available from https://www.mozilla.org/en-US/MPL/2.0/) or the GNU Lesser General Public License Version 3, dated 29 June 2007 (LGPL 3, available from https://www.gnu.org/licenses/lgpl.html).

FastMM4-AVX is free software: you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

FastMM4-AVX is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details.

You should have received a copy of the GNU Lesser General Public License along with FastMM4-AVX (see license_lgpl.txt and license_gpl.txt). If not, see http://www.gnu.org/licenses/.

FastMM4-AVX Version History:

The original FastMM4 description follows:

FastMM4

Fast Memory Manager

Description:

A fast replacement memory manager for Embarcadero Delphi applications that scales well under multi-threaded usage, is not prone to memory fragmentation, and supports shared memory without the use of external .DLL files.

Homepage:

https://github.com/pleriche/FastMM4

Advantages:

Usage:

Delphi:

Place this unit as the very first unit under the "uses" section in your project's .dpr file. When sharing memory between an application and a DLL (e.g. when passing a long string or dynamic array to a DLL function), both the main application and the DLL must be compiled using this memory manager (with the required conditional defines set).

There are some conditional defines (inside FastMM4Options.inc) that may be used to tweak the memory manager. To enable support for a user mode address space greater than 2GB you will have to use the EditBin* tool to set the LARGE_ADDRESS_AWARE flag in the EXE header. This informs Windows x64 or Windows 32-bit (with the /3GB option set) that the application supports an address space larger than 2GB (up to 4GB). In Delphi 6 and later you can also specify this flag through the compiler directive {$SetPEFlags $20}.

*The EditBin tool ships with the MS Visual C compiler.
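
As an illustration of the Delphi instructions above, here is a minimal sketch of a project file. The project name MyApp and the small allocation in the main block are hypothetical and only there to make the example complete. The sketch assumes the FastMM4 unit is on the project's search path.

```pascal
program MyApp; // hypothetical project name

{$APPTYPE CONSOLE}

// Optional: set the LARGE_ADDRESS_AWARE flag in the EXE header
// (Delphi 6 and later). The alternative is to run the EditBin tool
// on the compiled EXE.
{$SetPEFlags $20}

uses
  FastMM4,   // must be the very first unit in the uses clause
  SysUtils;  // all other units follow

var
  P: Pointer;
begin
  // Illustrative allocation only: once FastMM4 is installed, GetMem and
  // FreeMem are served by it without any further code changes.
  GetMem(P, 1024);
  try
    FillChar(P^, 1024, 0);
    Writeln('Allocated and cleared 1024 bytes.');
  finally
    FreeMem(P);
  end;
end.
```

A DLL that exchanges long strings or dynamic arrays with the application would list FastMM4 first in its own library project in the same way, with the required conditional defines set as described above.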

C++ Builder:

Refer to the instructions inside FastMM4BCB.cpp.