Investigate and resolve performance issues in the OpenMP backend on large core count CPUs #1929

mmichel11 · 2024-11-01T15:57:23Z

Describe the Bug:
This performance issue was first noticed through test timeouts in shift_left_right.pass (see #1928), but I suspect it extends beyond this algorithm.

OpenMP performance of the shift_left and shift_right algorithms degrades significantly on CPUs with large core counts, especially for small-to-medium inputs, where each thread ends up with a very small grain size. I believe this likely affects other algorithms as well, so we may find more cases like this. In my opinion, the best option is to benchmark the OpenMP backend across different CPUs and then optimize where required.

To Reproduce:

CMake command: cmake -DCMAKE_CXX_COMPILER=g++ -DCMAKE_CXX_STANDARD=20 -DONEDPL_BACKEND=omp -DONEDPL_DEVICE_TYPE=HOST -DCMAKE_CXX_FLAGS="-DTEST_LONG_RUN=1" -DCMAKE_BUILD_TYPE=Release
oneDPL version: ac39d7e - The version is less relevant as it impacts the stable OMP backend.
Compiler version: less relevant, but I used: Intel(R) oneAPI DPC++/C++ Compiler 2024.2.1
OS: less relevant, but I used Ubuntu 22.04
CPU: Intel(R) Xeon(R) Platinum 8480+

Here are the results I saw with shift_left_right.pass at different thread counts. Reproducing them requires a commit prior to #1928, since #1928 works around this issue in the test itself.

OMP_NUM_THREADS	Runtime (s)
1	0.621
2	2.850
4	4.060
8	7.426
16	9.489
32	20.765
64	20.077
128	61.985
unset	87.79

Ultimately, we will likely need more rigorous benchmarks to understand the impact of this issue on different problem sizes and then go from there with optimization work.
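As a starting point for such benchmarks, here is a minimal sketch of a size-sweep timing helper. It is not part of oneDPL or its test suite; `timing` and `sweep_sizes` are hypothetical names, and taking the best of several repetitions is just one possible way to reduce noise.

```cpp
#include <algorithm>
#include <chrono>
#include <cstddef>
#include <vector>

// Hypothetical result record: problem size and best observed wall time.
struct timing
{
    std::size_t n;
    double seconds;
};

// Calls run(n) `repetitions` times for each size in `sizes` and records the
// fastest run. `run` is any callable taking the problem size; values of
// `repetitions` below 1 are treated as 1.
template <typename F>
std::vector<timing> sweep_sizes(const std::vector<std::size_t>& sizes, int repetitions, F run)
{
    const int reps = std::max(1, repetitions);
    std::vector<timing> results;
    results.reserve(sizes.size());
    for (std::size_t n : sizes)
    {
        double best = -1.0;
        for (int r = 0; r < reps; ++r)
        {
            const auto start = std::chrono::steady_clock::now();
            run(n);
            const std::chrono::duration<double> elapsed = std::chrono::steady_clock::now() - start;
            if (best < 0.0 || elapsed.count() < best)
                best = elapsed.count();
        }
        results.push_back({n, best});
    }
    return results;
}
```

The callable would wrap the algorithm under test, for example a shift_left call with a parallel execution policy on a buffer of n elements, and the whole sweep would be repeated under different OMP_NUM_THREADS values to see where the slowdown begins.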

Expected Behavior:
I would expect the OpenMP backend to better control the number of threads launched, along with the grain size, based on the problem size, so that users do not need to intervene to avoid the performance issues shown above.
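To illustrate the kind of control I have in mind, here is a rough sketch that caps the thread count so each thread gets at least a minimum grain of work. This is not how the oneDPL OpenMP backend is implemented; `choose_thread_count`, `parallel_for_limited`, and the `min_grain` parameter are hypothetical, and a good minimum grain value would have to come from benchmarking.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper: largest thread count (at most max_threads) such that
// every thread still receives at least min_grain elements. Always returns >= 1.
inline std::size_t choose_thread_count(std::size_t n, std::size_t min_grain, std::size_t max_threads)
{
    if (n == 0 || min_grain == 0 || max_threads == 0)
        return 1;
    const std::size_t by_grain = n / min_grain; // floor: full grains available
    return std::max<std::size_t>(1, std::min(by_grain, max_threads));
}

// Hypothetical wrapper: runs body(i) for i in [0, n) with the limited thread
// count. Without OpenMP enabled, the pragma is ignored and the loop runs
// serially.
template <typename F>
void parallel_for_limited(std::size_t n, std::size_t min_grain, std::size_t max_threads, F body)
{
    const int threads = static_cast<int>(choose_thread_count(n, min_grain, max_threads));
    assert(threads >= 1);
    #pragma omp parallel for num_threads(threads) schedule(static)
    for (std::ptrdiff_t i = 0; i < static_cast<std::ptrdiff_t>(n); ++i)
        body(static_cast<std::size_t>(i));
}
```

With an assumed min_grain of 4096, an input of 1000 elements would run on a single thread, while an input of 100 million elements would still use every available thread.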


mmichel11 added the bug label Nov 1, 2024

mmichel11 mentioned this issue Nov 1, 2024

[test] Limit the number of launched threads in shift_left_right.pass with the OMP backend to avoid timeouts #1928

Merged
