Stream results updated roci results.
dmageeLANL committed Sep 22, 2023
1 parent f4c6ca3 commit 4eef1dc
Showing 9 changed files with 105 additions and 31 deletions.
27 changes: 25 additions & 2 deletions doc/sphinx/10_microbenchmarks/M1_STREAM/STREAM.rst
@@ -21,23 +21,35 @@ Problem
There are four memory operations that the STREAM benchmark measures: Copy, Scale, Add, and Triad.

Copy - Copies data from one array to another:

.. math::
b[i] = a[i]

Scale - Multiplies each array element by a constant:

.. math::
b[i] = q*a[i]

Add - Adds two arrays element-wise:

.. math::
c[i] = a[i] + b[i]

Triad - Multiply-add operation:

.. math::
a[i] = b[i] + q*c[i]

These operations stress memory and floating point pipelines. They test memory transfer speed, computation speed, and different combinations of these two components of overall performance.

Figure of Merit
---------------

The primary FOM is the max Triad rate (MB/s).
The primary FOM is the MAX Triad rate (MB/s).

Run Rules
---------
@@ -118,6 +130,17 @@ Example Results
ATS-3 Rocinante HBM
-------------------

.. csv-table:: STREAM microbenchmark bandwidth measurement
:file: stream-xrds_ats5cce-cray-mpich.csv
:align: center
:widths: 10, 10, 10
:header-rows: 1

.. figure:: stream_cpu_ats3.png
:align: center
:scale: 50%
:alt: STREAM microbenchmark bandwidth measurement

CTS-1 Snow
-----------

@@ -127,7 +150,7 @@ CTS-1 Snow
:widths: 10, 10, 10
:header-rows: 1

.. figure:: cpu_cts1.png
.. figure:: stream_cpu_cts1.png
:align: center
:scale: 50%
:alt: STREAM microbenchmark bandwidth measurement
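
Note on the four kernels described in STREAM.rst: the following is a minimal Python sketch, not the benchmark's C source. It applies the formulas exactly as the .rst text writes them (the array roles in the actual STREAM code may differ), and all function names are hypothetical. The bandwidth helper assumes 8-byte double-precision elements and counts two arrays touched per element for Copy and Scale and three for Add and Triad.

```python
# Illustrative only: pure-Python versions of the four STREAM kernels,
# following the formulas in STREAM.rst. All names here are hypothetical.

# Arrays touched per element by each kernel (reads plus writes).
ARRAYS_TOUCHED = {"copy": 2, "scale": 2, "add": 3, "triad": 3}
BYTES_PER_ELEMENT = 8  # assumption: double-precision elements


def stream_copy(a):
    """Copy: b[i] = a[i]"""
    return [x for x in a]


def stream_scale(a, q):
    """Scale: b[i] = q * a[i]"""
    return [q * x for x in a]


def stream_add(a, b):
    """Add: c[i] = a[i] + b[i]"""
    return [x + y for x, y in zip(a, b)]


def stream_triad(b, c, q):
    """Triad: a[i] = b[i] + q * c[i]"""
    return [x + q * y for x, y in zip(b, c)]


def bandwidth_mb_per_s(kernel, n_elements, seconds):
    """Bandwidth in MB/s (1 MB = 10**6 bytes) for one pass of a kernel."""
    bytes_moved = ARRAYS_TOUCHED[kernel] * BYTES_PER_ELEMENT * n_elements
    return bytes_moved / seconds / 1.0e6


# Run the kernels in the order the documentation lists them.
a = [1.0, 2.0, 3.0]
q = 3.0
b = stream_copy(a)
b = stream_scale(a, q)
c = stream_add(a, b)
a = stream_triad(b, c, q)
```

Because the figure of merit is the Triad rate, the `"triad"` entry of the byte-count table is the one that matters when converting a measured time into MB/s.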
15 changes: 10 additions & 5 deletions doc/sphinx/10_microbenchmarks/M1_STREAM/cpu.gp
@@ -1,20 +1,21 @@
#!/usr/bin/gnuplot
#STREAM
set terminal pngcairo enhanced size 1024, 768 dashed font 'Helvetica,18'
set output "cpu_cts1.png"
set output "stream_cpu_cts1.png"

set title "STREAM Single node bandwidth" font "serif,22"
set ylabel "Per core triad (MB/s)"
set y2label "FOM: Total Triad (MB/s)"
set ylabel "Per core triad BW (MB/s)"
set y2label "FOM: Total triad BW (MB/s)"

set xrange [1:40]
set yrange [3000:15000]
#set yrange [3000:15000]

# set logscale x 2
# set logscale y 2
set logscale y 2

set grid
show grid
set key left top

set datafile separator comma
set key autotitle columnheader
Expand All @@ -24,5 +25,9 @@ set style line 2 linetype 1 dashtype 2 linecolor rgb "#FF0000" linewidth 2

plot "stream_cts1.csv" using 1:2 with linespoints linestyle 1 axis x1y1, "" using 1:3 with line linestyle 2 axis x1y2

set output "stream_cpu_ats3.png"
set xrange [4:115]
plot "stream-xrds_ats5cce-cray-mpich.csv" using 1:2 with linespoints linestyle 1 axis x1y1, "" using 1:3 with line linestyle 2 axis x1y2



This file was deleted.

@@ -0,0 +1,6 @@
No. Cores,Per Core Bandwidth (MB/s), Total Bandwidth (MB/s)
8,2.304e+04,1.843e+05
32,1.883e+04,6.025e+05
56,1.452e+04,8.131e+05
88,1.566e+04,1.378e+06
112,1.414e+04,1.584e+06
2 changes: 1 addition & 1 deletion doc/sphinx/10_microbenchmarks/M1_STREAM/stream_cts1.csv
@@ -1,4 +1,4 @@
No. Cores,Bandwidth (MB/s),Total Bandwidth (MB/s)
No. Cores,Per Core Bandwidth (MB/s),Total Bandwidth (MB/s)
1,10690.1,10690.1
2,10701.3,21402.6
4,9316.5,37266.0

Large diffs are not rendered by default.

1 change: 1 addition & 0 deletions utils/pav_config/result_tests/stream_roci_nid001038.json

Large diffs are not rendered by default.

74 changes: 60 additions & 14 deletions utils/pav_config/tests/stream.yaml
@@ -32,7 +32,7 @@ _base:
Speed Network backbone.
INSTRUCTIONS:
1) STREAM requires different amounts of memory to run on different
systems, depending on both the system cache size(s) and the
granularity of the system timer.
@@ -56,7 +56,7 @@ _base:
granularity. 20 "ticks" at 10 ms/tic is 200 milliseconds.
If the chip is capable of 10 GB/s, it moves 2 GB in 200 msec.
This means that each array must be at least 1 GB, or 128M elements.
Version 5.10 increases the default array size from 2 million
elements to 10 million elements in response to the increasing
size of L3 caches. The new default size is large enough for caches
@@ -65,7 +65,7 @@ _base:
to "ssize_t", which allows array indices >2^32 (4 billion)
on properly configured 64-bit systems. Additional compiler options
(such as "-mcmodel=medium") may be required for large memory runs.
Array size can be set at compile time without modifying the source
code for the (many) compilers that support preprocessor definitions
on the compile line. E.g.,
@@ -92,14 +92,14 @@ _base:
CACHE: 105
SOCKETS: 2
4 x (105M * 2 ) / 3 ARRAYS / 8 BYTES/ELEMENT = 35 Mi elements = 35000000
scheduler: slurm
schedule:
nodes: '10' # 'ALL'
tasks_per_node: 1
share_allocation: false

permute_on:
permute_on:
- compilers
- mpis
- omp_num_threads
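
Note on the sizing line in the INSTRUCTIONS text of this hunk (4 x (105M * 2) / 3 arrays / 8 bytes per element): the arithmetic can be checked with a short calculation. This is a sketch under the assumption that "M" means 10^6 bytes, which is what makes the result come out to exactly 35000000. The function and parameter names are hypothetical and are not part of the Pavilion config.

```python
def stream_array_elements(cache_mb, sockets, cache_multiple=4,
                          arrays=3, bytes_per_element=8):
    """Elements per array so the three arrays together span
    `cache_multiple` times the combined cache of all sockets.

    Assumption: cache_mb is in units of 10**6 bytes.
    """
    total_cache_bytes = cache_mb * 1_000_000 * sockets
    working_set_bytes = cache_multiple * total_cache_bytes
    return working_set_bytes // (arrays * bytes_per_element)


# Values from the config comment: CACHE: 105, SOCKETS: 2
elements = stream_array_elements(cache_mb=105, sockets=2)
```

The result agrees with the `stream_array_size: 35000000` value set in the `xrds_ats5` variables later in this file.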
@@ -344,6 +344,7 @@ spr_hbm_xrds:
- 'module load {{compilers.name}}/{{compilers.version}}'
- 'module load {{mpis.modulefile}}'


cts1_ats5:
inherits_from: cts1_xrds
subtitle: '{{compilers.name}}-{{compilers.version}}_{{tpn}}_{{mpis.name}}-{{mpis.version}}'
@@ -354,31 +355,64 @@ cts1_ats5:
- tpn

variables:
numnodes: '1'
tpn: [1, 2, 4, 8, 16, 32, 36]
numnodes: '1'
omp_num_threads: '1'

schedule:
nodes: "{{numnodes}}"
share_allocation: true
tasks_per_node: "{{tpn}}"

run:
env:
GOMP_CPU_AFFINITY: ''

result_parse:
regex:
triad_once:
regex: '^Triad: *([0-9\.]*) '
action: store
match_select: last
files: '*stream'

result_evaluate:
total_bandwidth: '{{tpn}}*triad_once'

xrds_ats5:
inherits_from: cts1_ats5
inherits_from: _base
subtitle: '{{compilers.name}}-{{compilers.version}}_{{tpn}}_{{mpis.name}}-{{mpis.version}}'

permute_on:
- compilers
- mpis
- tpn

only_if:
"{{sys_name}}": ['crossroads', 'rocinante']
"{{sys_os.name}}": [ cos ]

variables:
tpn: [8, 32, 56, 88, 112]
arch: "spr"
stream_array_size: 35e6
omp_places: [cores, sockets]
omp_proc_bind: [true]

target: "xrds-stream.exe"
stream_array_size: 35000000
#omp_places: [cores, sockets]
#omp_proc_bind: [true]
numnodes: '1'
omp_num_threads: '1'

chunk: '{{chunk_ids.0}}'

schedule:
partition: 'hbm'
partition: 'hbm'
nodes: "{{numnodes}}"
share_allocation: true
tasks_per_node: "{{tpn}}"
chunking:
size: 1

build:
on_nodes: false
preamble:
#- 'module load friendly-testing' #'module rm craype-hugepages2M'
- 'module swap PrgEnv-${PE_ENV,,} PrgEnv-{{compilers.pe_env}}'
@@ -394,4 +428,16 @@ xrds_ats5:
- 'module load {{mpis.name}}/{{mpis.version}}'

env:
GOMP_CPU_AFFINITY: ''
GOMP_CPU_AFFINITY: ''

result_parse:
regex:
triad_once:
regex: '^Triad: *([0-9\.]*) '
action: store
match_select: all
files: '*stream'

result_evaluate:
per_proc_bw: 'sum(triad_once)/len(triad_once)'
total_bw: 'sum(triad_once)'
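
Note on the new `result_parse` and `result_evaluate` entries for `xrds_ats5`: with `match_select: all`, every `Triad:` line in the output is kept, and the two expressions reduce that list to a mean and a sum. The sketch below mimics this in plain Python. It is not Pavilion's implementation: the function name is hypothetical, the sample output is invented for illustration, and applying the pattern with `re.MULTILINE` over the whole output is an assumption about how the lines are matched.

```python
import re

# Same pattern as the triad_once regex in the config.
TRIAD_RE = re.compile(r'^Triad: *([0-9\.]*) ', re.MULTILINE)


def summarize_triad(output):
    """Return (per_proc_bw, total_bw) from all Triad lines in `output`."""
    rates = [float(value) for value in TRIAD_RE.findall(output)]
    if not rates:
        raise ValueError("no Triad lines found")
    total_bw = sum(rates)               # total_bw: sum(triad_once)
    per_proc_bw = total_bw / len(rates)  # per_proc_bw: sum/len
    return per_proc_bw, total_bw


# Hypothetical output from two processes; the numbers are made up.
sample = (
    "Copy:            9000.0     0.0600     0.0590     0.0610\n"
    "Triad:          10000.0     0.0840     0.0830     0.0850\n"
    "Copy:            9100.0     0.0600     0.0590     0.0610\n"
    "Triad:          12000.0     0.0700     0.0690     0.0710\n"
)

per_proc_bw, total_bw = summarize_triad(sample)
```

This differs from the `cts1_ats5` test above, which keeps only the last match (`match_select: last`) and estimates the total as `{{tpn}}*triad_once`.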
2 changes: 1 addition & 1 deletion utils/pavparse
@@ -185,7 +185,7 @@ if __name__ == '__main__':
gdf.index.name = index_title
if value_title:
gdf.columns = value_title
grouped_dfs[gtitle] = gdf.squeeze().map(fmt.format)
grouped_dfs[gtitle] = gdf.applymap(fmt.format)
df_mean[gtitle] = gdf.mean().values[0]

plt.legend(plt_legend)