Skip to content

Latest commit

 

History

History
174 lines (134 loc) · 9.22 KB

README.md

File metadata and controls

174 lines (134 loc) · 9.22 KB

tests codecov

poolSTL

Thread pool-based implementation of parallel standard library algorithms.

C++17 introduced parallel versions of many algorithms in the standard library. These parallel versions accept an Execution Policy as their first argument. Different policies allow the implementation to parallelize the algorithm in a different way, such as using threads, vectorization, or even GPU. Policies can be supplied by the compiler or by libraries like this one.

Unfortunately compiler support for the standard policies varies. PoolSTL is a supplement to fill in the support gaps, so we can use parallel algorithms now. It is not meant as a full implementation; only the basics are expected to be covered. Use this if:

  • you only need the basics, including no nested parallel calls.
  • must use a compiler lacking native support (see "Parallel algorithms and execution policies").
  • you cannot link against TBB for whatever reason.
  • the Parallel STL is too heavy.

Supports C++11 and higher, C++17 preferred. Tested in CI on GCC 7+, Clang/LLVM 5+, Apple Clang, MSVC.

Implemented Algorithms

Algorithms are added on an as-needed basis. If you need one open an issue or contribute a PR.

<algorithm>

<numeric>

All in std:: namespace.

Note: All iterators must be random access.

Usage

PoolSTL provides these execution policies:

  • poolstl::par: Substitute for std::execution::par.
  • poolstl::par_pool: par but with your own thread pool.
  • poolstl::seq: Substitute for std::execution::seq. Simply calls the sequential (non-policy) overload.
  • poolstl::par_if, and par_if_pool: choose parallel or sequential at runtime. See below.

In short, use poolstl::par to make your code parallel. Complete example:

#include <iostream>
#include <poolstl/poolstl.hpp>

int main() {
    std::vector<int> v = {0, 1, 2, 3, 4, 5};
    auto sum = std::reduce(poolstl::par, v.cbegin(), v.cend());
    //                     ^^^^^^^^^^^^
    //                     Add this to make your code parallel.
    std::cout << "Sum=" << sum << std::endl;
    return 0;
}

Controlling Thread Pool Size with par_pool

The thread pool used by poolstl::par is managed internally by poolSTL. It is started on first use.

Use your own thread pool with poolstl::par_pool for full control over thread count, startup/shutdown, etc.:

task_thread_pool::task_thread_pool pool{4};  // 4 threads

std::reduce(poolstl::par_pool(pool), v.cbegin(), v.cbegin());

Choosing Parallel or Sequential at Runtime with par_if

Sometimes the choice whether to parallelize or not should be made at runtime. For example, you may wish to support both small and large inputs. The small inputs may not amortize the cost of starting threads and are best done sequentially.

Use poolstl::par_if to select between par and seq at runtime:

bool is_parallel = true;

std::reduce(poolstl::par_if(is_parallel), v.cbegin(), v.cbegin());

Installation

Single File

Copy a single-file amalgamated poolstl.hpp from the latest release and into your project.

Note: Some compilers, including non-Apple Clang and GCC 8 and older, require the -lpthread linker flag to use C++11 threads.

CMake

include(FetchContent)
FetchContent_Declare(
        poolSTL
        GIT_REPOSITORY https://github.com/alugowski/poolSTL
        GIT_TAG main
        GIT_SHALLOW TRUE
)
FetchContent_MakeAvailable(poolSTL)

target_link_libraries(YOUR_TARGET poolSTL::poolSTL)

Alternatively copy or checkout the repo into your project and:

add_subdirectory(poolSTL)

Benchmark

See benchmark/ to compare poolSTL against the standard sequential implementation, and (if available) the native std::execution::par implementation.

Results on an M1 Pro (6 power, 2 efficiency cores), with GCC 13:

-------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations
-------------------------------------------------------------------------------------------------------
all_of()/real_time                                                 19.9 ms         19.9 ms           35
all_of(poolstl::par)/real_time                                     3.47 ms        0.119 ms          198
all_of(std::execution::par)/real_time                              3.45 ms         3.25 ms          213
find_if()/needle_percentile:5/real_time                           0.988 ms        0.987 ms          712
find_if()/needle_percentile:50/real_time                           9.87 ms         9.86 ms           71
find_if()/needle_percentile:100/real_time                          19.7 ms         19.7 ms           36
find_if(poolstl::par)/needle_percentile:5/real_time               0.405 ms        0.050 ms         1730
find_if(poolstl::par)/needle_percentile:50/real_time               1.85 ms        0.096 ms          393
find_if(poolstl::par)/needle_percentile:100/real_time              3.64 ms        0.102 ms          193
find_if(std::execution::par)/needle_percentile:5/real_time        0.230 ms        0.220 ms         3103
find_if(std::execution::par)/needle_percentile:50/real_time        1.75 ms         1.60 ms          410
find_if(std::execution::par)/needle_percentile:100/real_time       3.51 ms         3.24 ms          204
for_each()/real_time                                               94.6 ms         94.6 ms            7
for_each(poolstl::par)/real_time                                   18.7 ms        0.044 ms           36
for_each(std::execution::par)/real_time                            15.3 ms         12.9 ms           46
sort()/real_time                                                    603 ms          602 ms            1
sort(poolstl::par)/real_time                                        146 ms        0.667 ms            5
sort(std::execution::par)/real_time                                 121 ms         95.1 ms            6
transform()/real_time                                              95.0 ms         94.9 ms            7
transform(poolstl::par)/real_time                                  17.4 ms        0.037 ms           38
transform(std::execution::par)/real_time                           15.3 ms         13.2 ms           45
reduce()/real_time                                                 15.2 ms         15.2 ms           46
reduce(poolstl::par)/real_time                                     4.06 ms        0.044 ms          169
reduce(std::execution::par)/real_time                              3.38 ms         3.16 ms          214

poolSTL as std::execution::par

USE AT YOUR OWN RISK!

Two-line hack for missing compiler support. A no-op on compilers with support.

If POOLSTL_STD_SUPPLEMENT is defined then poolSTL will check for native compiler support. If not found then poolSTL will alias its poolstl::par as std::execution::par:

#define POOLSTL_STD_SUPPLEMENT
#include <poolstl/poolstl.hpp>

Now just use std::execution::par as normal, and poolSTL will fill in as necessary. See supplement_test.cpp.

Example use case: You can link against TBB, so you'll use native support on GCC 9+, Clang, MSVC, etc. PoolSTL will fill in automatically on GCC <9 and Apple Clang.