Add OSU Latency microbenchmarks for send/recv and isend/irecv #101

Merged Jul 3, 2024 (5 commits)

Conversation

@nicoleavans (Collaborator) commented Jun 26, 2024

Output

To obtain results:

  • Modify ./perf_tests/CMakeLists.txt line 19, replace OFF with ON
  • Modify ./CMakeLists.txt lines 7-8, replace OFF with ON

Console Output:

2024-06-26T13:06:06-06:00
Running ./build/perf_tests/perf_test-main
Run on (160 X 3616 MHz CPU s)
CPU Caches:
  L1 Data 32 KiB (x40)
  L1 Instruction 32 KiB (x40)
  L2 Unified 512 KiB (x20)
  L3 Unified 10240 KiB (x20)
Load Average: 0.15, 0.38, 0.24
***WARNING*** Library was built as DEBUG. Timings may be affected.
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                             Time             CPU   Iterations UserCounters...
-----------------------------------------------------------------------------------------------------------------------
benchmark_osu_latency_KokkosComm_sendrecv/1/manual_time            2.38 us         4.17 us       294776 bytes=2
benchmark_osu_latency_KokkosComm_sendrecv/2/manual_time            2.37 us         4.17 us       294496 bytes=4
benchmark_osu_latency_KokkosComm_sendrecv/4/manual_time            2.37 us         4.15 us       295130 bytes=8
benchmark_osu_latency_KokkosComm_sendrecv/8/manual_time            2.38 us         4.17 us       294851 bytes=16
benchmark_osu_latency_KokkosComm_sendrecv/16/manual_time           2.37 us         4.17 us       294591 bytes=32
benchmark_osu_latency_KokkosComm_sendrecv/32/manual_time           2.38 us         4.18 us       294473 bytes=64
benchmark_osu_latency_KokkosComm_sendrecv/64/manual_time           2.38 us         4.17 us       293700 bytes=128
benchmark_osu_latency_KokkosComm_sendrecv/128/manual_time          2.39 us         4.18 us       294274 bytes=256
benchmark_osu_latency_KokkosComm_sendrecv/256/manual_time          2.39 us         4.19 us       293049 bytes=512
benchmark_osu_latency_KokkosComm_sendrecv/512/manual_time          2.42 us         4.21 us       290991 bytes=1.024k
benchmark_osu_latency_KokkosComm_sendrecv/1000/manual_time         2.44 us         4.23 us       287303 bytes=2k
benchmark_osu_latency_MPI_sendrecv/1/manual_time                   2.18 us         3.74 us       320733 bytes=2
benchmark_osu_latency_MPI_sendrecv/2/manual_time                   2.18 us         3.74 us       320474 bytes=4
benchmark_osu_latency_MPI_sendrecv/4/manual_time                   2.18 us         3.73 us       320472 bytes=8
benchmark_osu_latency_MPI_sendrecv/8/manual_time                   2.18 us         3.73 us       320630 bytes=16
benchmark_osu_latency_MPI_sendrecv/16/manual_time                  2.19 us         3.74 us       320509 bytes=32
benchmark_osu_latency_MPI_sendrecv/32/manual_time                  2.18 us         3.74 us       320275 bytes=64
benchmark_osu_latency_MPI_sendrecv/64/manual_time                  2.18 us         3.74 us       320488 bytes=128
benchmark_osu_latency_MPI_sendrecv/128/manual_time                 2.18 us         3.74 us       320470 bytes=256
benchmark_osu_latency_MPI_sendrecv/256/manual_time                 2.20 us         3.75 us       319026 bytes=512
benchmark_osu_latency_MPI_sendrecv/512/manual_time                 2.22 us         3.79 us       315089 bytes=1.024k
benchmark_osu_latency_MPI_sendrecv/1000/manual_time                2.26 us         3.83 us       310336 bytes=2k
benchmark_osu_latency_KokkosComm_isendirecv/1/manual_time          3.09 us         5.15 us       226081 bytes=2
benchmark_osu_latency_KokkosComm_isendirecv/2/manual_time          3.09 us         5.14 us       226412 bytes=4
benchmark_osu_latency_KokkosComm_isendirecv/4/manual_time          3.09 us         5.14 us       226377 bytes=8
benchmark_osu_latency_KokkosComm_isendirecv/8/manual_time          3.04 us         5.10 us       229349 bytes=16
benchmark_osu_latency_KokkosComm_isendirecv/16/manual_time         3.10 us         5.15 us       227185 bytes=32
benchmark_osu_latency_KokkosComm_isendirecv/32/manual_time         3.12 us         5.17 us       224717 bytes=64
benchmark_osu_latency_KokkosComm_isendirecv/64/manual_time         3.11 us         5.15 us       224213 bytes=128
benchmark_osu_latency_KokkosComm_isendirecv/128/manual_time        3.12 us         5.17 us       224492 bytes=256
benchmark_osu_latency_KokkosComm_isendirecv/256/manual_time        3.12 us         5.19 us       224949 bytes=512
benchmark_osu_latency_KokkosComm_isendirecv/512/manual_time        3.30 us         5.36 us       214022 bytes=1.024k
benchmark_osu_latency_KokkosComm_isendirecv/1000/manual_time       3.31 us         5.36 us       209809 bytes=2k
benchmark_osu_latency_MPI_isendirecv/1/manual_time                 2.14 us         3.72 us       327786 bytes=2
benchmark_osu_latency_MPI_isendirecv/2/manual_time                 2.14 us         3.72 us       327495 bytes=4
benchmark_osu_latency_MPI_isendirecv/4/manual_time                 2.14 us         3.72 us       327313 bytes=8
benchmark_osu_latency_MPI_isendirecv/8/manual_time                 2.14 us         3.72 us       327285 bytes=16
benchmark_osu_latency_MPI_isendirecv/16/manual_time                2.14 us         3.72 us       327641 bytes=32
benchmark_osu_latency_MPI_isendirecv/32/manual_time                2.14 us         3.73 us       327220 bytes=64
benchmark_osu_latency_MPI_isendirecv/64/manual_time                2.14 us         3.72 us       327017 bytes=128
benchmark_osu_latency_MPI_isendirecv/128/manual_time               2.14 us         3.72 us       327276 bytes=256
benchmark_osu_latency_MPI_isendirecv/256/manual_time               2.15 us         3.74 us       325788 bytes=512
benchmark_osu_latency_MPI_isendirecv/512/manual_time               2.18 us         3.78 us       320372 bytes=1.024k
benchmark_osu_latency_MPI_isendirecv/1000/manual_time              2.22 us         3.81 us       315937 bytes=2k
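
To put a number on the gap between the KokkosComm and raw MPI rows, here is a minimal sketch that parses a few rows copied from the output above and computes the relative difference in the Time column. The names `SAMPLE`, `parse`, and `overhead_percent` are made up for this illustration and are not part of the PR or of Google Benchmark.

```python
import re

# A few rows copied verbatim from the console output above.
SAMPLE = """\
benchmark_osu_latency_KokkosComm_sendrecv/1/manual_time            2.38 us         4.17 us       294776 bytes=2
benchmark_osu_latency_MPI_sendrecv/1/manual_time                   2.18 us         3.74 us       320733 bytes=2
benchmark_osu_latency_KokkosComm_isendirecv/1/manual_time          3.09 us         5.15 us       226081 bytes=2
benchmark_osu_latency_MPI_isendirecv/1/manual_time                 2.14 us         3.72 us       327786 bytes=2
benchmark_osu_latency_KokkosComm_sendrecv/1000/manual_time         2.44 us         4.23 us       287303 bytes=2k
benchmark_osu_latency_MPI_sendrecv/1000/manual_time                2.26 us         3.83 us       310336 bytes=2k
benchmark_osu_latency_KokkosComm_isendirecv/1000/manual_time       3.31 us         5.36 us       209809 bytes=2k
benchmark_osu_latency_MPI_isendirecv/1000/manual_time              2.22 us         3.81 us       315937 bytes=2k
"""

# Captures: implementation, operation, benchmark argument, Time column (us).
ROW = re.compile(
    r"^benchmark_osu_latency_(KokkosComm|MPI)_(\w+)/(\d+)/manual_time\s+([\d.]+) us"
)


def parse(text):
    """Map (implementation, operation, argument) to the Time column in us."""
    times = {}
    for line in text.splitlines():
        m = ROW.match(line)
        if m:
            impl, op, arg, t = m.groups()
            times[(impl, op, int(arg))] = float(t)
    return times


def overhead_percent(times):
    """Relative Time difference of KokkosComm over the matching MPI row."""
    result = {}
    for (impl, op, arg), t in times.items():
        if impl == "KokkosComm" and ("MPI", op, arg) in times:
            base = times[("MPI", op, arg)]
            result[(op, arg)] = 100.0 * (t - base) / base
    return result


if __name__ == "__main__":
    for (op, arg), pct in sorted(overhead_percent(parse(SAMPLE)).items()):
        print(f"{op}/{arg}: {pct:.1f}% slower than MPI")
```

On these rows the difference is roughly 8 to 9% for sendrecv and roughly 44 to 49% for isendirecv. Note that the library was built as DEBUG for this run, so the absolute timings may not be representative.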

@cwpearson (Collaborator) commented

Could you please paste a snippet of example output?


@cwpearson (Collaborator) left a comment

Looks like we have some latency problems to fix. Thanks Nicole!

@cwpearson cwpearson merged commit 5c14fd2 into kokkos:develop Jul 3, 2024
7 checks passed
@nicoleavans nicoleavans deleted the osu-latency branch July 9, 2024 19:08