<h1 align=center> thread-pool </h1>
A simple, fast and functional thread pool implementation using pure C++20.
Features
- Built entirely with C++20
- Enqueue tasks with or without tracking results
- High performance
Integration
dp::thread-pool is a header-only library. All the files needed are in include/thread_pool.
vcpkg
dp::thread-pool is available on vcpkg:

```bash
vcpkg install dp-thread-pool
```
CMake
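thread-pool defines the CMake target dp::thread-pool. You can then use find_package():

```cmake
find_package(thread-pool CONFIG REQUIRED)
target_link_libraries(my_app PRIVATE dp::thread-pool) # my_app is a placeholder for your own target
```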
Alternatively, you can use something like CPM, which is based on CMake's FetchContent module:

```cmake
CPMAddPackage(
    NAME thread-pool
    GITHUB_REPOSITORY DeveloperPaul123/thread-pool
    GIT_TAG 0.6.0 # change this to latest commit or release tag
    OPTIONS
        "TP_BUILD_TESTS OFF"
        "TP_BUILD_BENCHMARKS OFF"
        "TP_BUILD_EXAMPLES OFF"
)
```
Usage
Enqueue tasks without a returned result:

```cpp
// create a thread pool with a specified number of threads.
dp::thread_pool pool(4);
// add tasks, in this case without caring about results of individual tasks
pool.enqueue_detach([](int value) { /*...your task...*/ }, 34);
pool.enqueue_detach([](int value) { /*...your task...*/ }, 37);
pool.enqueue_detach([](int value) { /*...your task...*/ }, 38);
// and so on...
```
Enqueue tasks with a returned value:

```cpp
// create a thread pool with a specified number of threads.
dp::thread_pool pool(4);
auto result = pool.enqueue([](int value) -> int { /*...your task...*/ return value; }, 34);
// get the result, this will block the current thread until the task is complete
auto value = result.get();
```
Enqueue tasks and wait for them to complete:

```cpp
dp::thread_pool pool(4);
// add tasks, in this case without caring about results of individual tasks
pool.enqueue_detach([](int value) { /*...your task...*/ }, 34);
pool.enqueue_detach([](int value) { /*...your task...*/ }, 37);
pool.enqueue_detach([](int value) { /*...your task...*/ }, 38);
pool.enqueue_detach([](int value) { /*...your task...*/ }, 40);
// wait for all tasks to complete
pool.wait_for_tasks();
```
You can see other examples in the /examples folder.
Benchmarks
Benchmarks were run using the nanobench library. See the ./benchmark folder for the benchmark code. The benchmarks compare the matrix multiplication performance of dp::thread_pool against other thread pool libraries. These include:
- ConorWilliams/Threadpool
- bshoshany/thread-pool (C++17)
- alugowski/task-thread-pool (C++11, no work stealing)
The benchmarks are set up so that each library is tested against dp::thread_pool using std::function as the baseline. Relative measurements (in %) are recorded to compare the performance of each library to the baseline.
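As a worked reading of the relative column (inferred from the MSVC numbers in the Details table, and consistent with how nanobench reports relative values), each entry is the baseline's time per operation divided by the library's time per operation, so values above 100% mean faster than the baseline:

```latex
\text{relative} = \frac{t_{\text{baseline}}}{t_{\text{library}}} \times 100\%,
\qquad
\frac{93.27\ \text{ms}}{90.66\ \text{ms}} \times 100\% \approx 102.9\%,
\qquad
\frac{93.27\ \text{ms}}{99.73\ \text{ms}} \times 100\% \approx 93.5\%
```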
Machine Specs
- AMD Ryzen 7 5800X (8 cores / 16 threads, 3.8 GHz base clock)
- 32 GB RAM
Results
Summary
dp::thread_pool is faster than the other thread pool libraries in most cases, especially when std::move_only_function is available. fu2::unique_function is a close second, and std::function is the slowest when used in dp::thread_pool. In certain situations, riften::Thiefpool pulls ahead, likely because it uses a lock-free queue and a custom semaphore, and appears to handle work stealing differently. Interestingly, task_thread_pool seems to pull ahead with large numbers of smaller tasks.
Details
Below is a portion of the benchmark data from the MSVC results:
relative | ms/op | op/s | err% | total (s) | matrix multiplication 256x256 |
---|---|---|---|---|---|
100.0% | 93.27 | 10.72 | 0.7% | 16.69 | dp::thread_pool - std::function |
102.9% | 90.66 | 11.03 | 0.6% | 16.22 | dp::thread_pool - std::move_only_function |
98.7% | 94.50 | 10.58 | 0.2% | 16.91 | dp::thread_pool - fu2::unique_function |
93.5% | 99.73 | 10.03 | 0.4% | 17.86 | BS::thread_pool |
102.2% | 91.29 | 10.95 | 0.6% | 16.39 | task_thread_pool |
100.1% | 93.18 | 10.73 | 1.4% | 16.61 | riften::Thiefpool |
If you wish to look at the full results, use the links below.
Some notes on the benchmark methodology:
- Matrix sizes are all square (MxM).
- Each multiplication is (MxM) * (MxM), where * denotes a matrix multiplication.
- Benchmarks were run on Windows, so results may be affected by system noise such as dynamic CPU frequency scaling.
- Relative
Building
This project has been built with:
- Visual Studio 2022
- Clang 10+ (via WSL on Windows)
- GCC 11+ (via WSL on Windows)
- CMake 3.19+
To build, run:
```bash
cmake -S . -B build
cmake --build build
```
Build Options
Option | Description | Default |
---|---|---|
TP_BUILD_TESTS | Turn on to build unit tests. Required for formatting build targets. | ON |
TP_BUILD_EXAMPLES | Turn on to build examples | ON |
Run clang-format
Use the following commands from the project's root directory to check and fix C++ and CMake source style. This requires clang-format, cmake-format and pyyaml to be installed on the current system. To use this feature, TP_BUILD_TESTS must be turned on.

```bash
# view changes
cmake --build build/test --target format
# apply changes
cmake --build build/test --target fix-format
```
See Format.cmake for details.
Build the documentation
The documentation is automatically built and published whenever a GitHub Release is created. To build the documentation manually, run the following commands:

```bash
cmake -S documentation -B build/doc
cmake --build build/doc --target GenerateDocs
# view the docs
open build/doc/doxygen/html/index.html
```
To build the documentation locally, you will need Doxygen and Graphviz on your system.
Contributing
Contributions are very welcome. Please see contribution guidelines for more info.
License
The project is licensed under the MIT license. See LICENSE for more details.
Author
<img src="https://avatars0.githubusercontent.com/u/6591180?s=460&v=4" width="100"><br><sub>@DeveloperPaul123</sub> |
---|