Thread pool-based implementation of parallel standard library algorithms.
C++17 introduced parallel versions of many algorithms in the standard library. These parallel versions accept an Execution Policy as their first argument. Different policies allow the implementation to parallelize the algorithm in a different way, such as using threads, vectorization, or even GPU. Policies can be supplied by the compiler or by libraries like this one.
Unfortunately compiler support for the standard policies varies. PoolSTL is a supplement to fill in the support gaps, so we can use parallel algorithms now. It is not meant as a full implementation; only the basics are expected to be covered. Use this if:
- you only need the basics, including no nested parallel calls.
- must use a compiler lacking native support (see "Parallel algorithms and execution policies").
- you cannot link against TBB for whatever reason.
- the Parallel STL is too heavy.
Supports C++11 and higher, C++17 preferred. Tested in CI on GCC 7+, Clang/LLVM 5+, Apple Clang, MSVC.
Algorithms are added on an as-needed basis. If you need one open an issue or contribute a PR.
- all_of, any_of, none_of
- copy, copy_n
- count, count_if
- fill, fill_n
- find, find_if, find_if_not
- for_each, for_each_n
- sort, stable_sort
- transform
- reduce
- transform_reduce (C++17 only)
All in std::
namespace.
Note: All iterators must be random access.
PoolSTL provides these execution policies:
poolstl::par
: Substitute forstd::execution::par
.poolstl::par_pool
:par
but with your own thread pool.poolstl::seq
: Substitute forstd::execution::seq
. Simply calls the sequential (non-policy) overload.poolstl::par_if
, andpar_if_pool
: choose parallel or sequential at runtime. See below.
In short, use poolstl::par
to make your code parallel. Complete example:
#include <iostream>
#include <poolstl/poolstl.hpp>
int main() {
std::vector<int> v = {0, 1, 2, 3, 4, 5};
auto sum = std::reduce(poolstl::par, v.cbegin(), v.cend());
// ^^^^^^^^^^^^
// Add this to make your code parallel.
std::cout << "Sum=" << sum << std::endl;
return 0;
}
The thread pool used by poolstl::par
is managed internally by poolSTL. It is started on first use.
Use your own thread pool
with poolstl::par_pool
for full control over thread count, startup/shutdown, etc.:
task_thread_pool::task_thread_pool pool{4}; // 4 threads
std::reduce(poolstl::par_pool(pool), v.cbegin(), v.cbegin());
Sometimes the choice whether to parallelize or not should be made at runtime. For example, you may wish to support both small and large inputs. The small inputs may not amortize the cost of starting threads and are best done sequentially.
Use poolstl::par_if
to select between par
and seq
at runtime:
bool is_parallel = true;
std::reduce(poolstl::par_if(is_parallel), v.cbegin(), v.cbegin());
Copy a single-file amalgamated poolstl.hpp
from the latest release and into your project.
Note: Some compilers, including non-Apple Clang and GCC 8 and older, require the -lpthread
linker flag to use C++11 threads.
include(FetchContent)
FetchContent_Declare(
poolSTL
GIT_REPOSITORY https://github.com/alugowski/poolSTL
GIT_TAG main
GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(poolSTL)
target_link_libraries(YOUR_TARGET poolSTL::poolSTL)
Alternatively copy or checkout the repo into your project and:
add_subdirectory(poolSTL)
See benchmark/ to compare poolSTL against the standard sequential implementation, and (if available) the
native std::execution::par
implementation.
Results on an M1 Pro (6 power, 2 efficiency cores), with GCC 13:
-------------------------------------------------------------------------------------------------------
Benchmark Time CPU Iterations
-------------------------------------------------------------------------------------------------------
all_of()/real_time 19.9 ms 19.9 ms 35
all_of(poolstl::par)/real_time 3.47 ms 0.119 ms 198
all_of(std::execution::par)/real_time 3.45 ms 3.25 ms 213
find_if()/needle_percentile:5/real_time 0.988 ms 0.987 ms 712
find_if()/needle_percentile:50/real_time 9.87 ms 9.86 ms 71
find_if()/needle_percentile:100/real_time 19.7 ms 19.7 ms 36
find_if(poolstl::par)/needle_percentile:5/real_time 0.405 ms 0.050 ms 1730
find_if(poolstl::par)/needle_percentile:50/real_time 1.85 ms 0.096 ms 393
find_if(poolstl::par)/needle_percentile:100/real_time 3.64 ms 0.102 ms 193
find_if(std::execution::par)/needle_percentile:5/real_time 0.230 ms 0.220 ms 3103
find_if(std::execution::par)/needle_percentile:50/real_time 1.75 ms 1.60 ms 410
find_if(std::execution::par)/needle_percentile:100/real_time 3.51 ms 3.24 ms 204
for_each()/real_time 94.6 ms 94.6 ms 7
for_each(poolstl::par)/real_time 18.7 ms 0.044 ms 36
for_each(std::execution::par)/real_time 15.3 ms 12.9 ms 46
sort()/real_time 603 ms 602 ms 1
sort(poolstl::par)/real_time 146 ms 0.667 ms 5
sort(std::execution::par)/real_time 121 ms 95.1 ms 6
transform()/real_time 95.0 ms 94.9 ms 7
transform(poolstl::par)/real_time 17.4 ms 0.037 ms 38
transform(std::execution::par)/real_time 15.3 ms 13.2 ms 45
reduce()/real_time 15.2 ms 15.2 ms 46
reduce(poolstl::par)/real_time 4.06 ms 0.044 ms 169
reduce(std::execution::par)/real_time 3.38 ms 3.16 ms 214
USE AT YOUR OWN RISK!
Two-line hack for missing compiler support. A no-op on compilers with support.
If POOLSTL_STD_SUPPLEMENT
is defined then poolSTL will check for native compiler support.
If not found then poolSTL will alias its poolstl::par
as std::execution::par
:
#define POOLSTL_STD_SUPPLEMENT
#include <poolstl/poolstl.hpp>
Now just use std::execution::par
as normal, and poolSTL will fill in as necessary. See supplement_test.cpp.
Example use case: You can link against TBB, so you'll use native support on GCC 9+, Clang, MSVC, etc. PoolSTL will fill in automatically on GCC <9 and Apple Clang.