Commit

Microbenchmark proofread suggestions (#68)
* STREAM suggestions

* Added purpose for OSUMB

* Run Rules for OSUMB updated to N/A

* DGEMM typo in Example Results section

* Updated Makefile for DGEMM and make command in documentation
JDTruj2018 authored Nov 20, 2023
1 parent 56a3e2a commit 39a9e2a
Showing 4 changed files with 21 additions and 10 deletions.
14 changes: 11 additions & 3 deletions doc/sphinx/09_Microbenchmarks/M1_STREAM/STREAM.rst
@@ -95,18 +95,26 @@ This is the minimum size unless other system attributes constrain it.
The array size only influences the capacity of STREAM to fully load the memory bus.
At capacity, the measured values should reach a steady state where increasing the value of ``STREAM_ARRAY_SIZE`` doesn't influence the measurement for a certain number of processors.

+For Crossroads, the benchmark was built with ``STREAM_ARRAY_SIZE=40000000`` and ``NTIMES=20`` with optimizations and OpenMP enabled.
+
+.. code-block:: bash
+
+   make CC=`which mpicc` FF=`which mpifort` CFLAGS="-O2 -fopenmp -DSTREAM_ARRAY_SIZE=40000000 -DNTIMES=20" FFLAGS="-O2 -fopenmp -DSTREAM_ARRAY_SIZE=40000000 -DNTIMES=20"
+
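
As a rough check on the build settings above, the following sketch estimates the memory footprint implied by ``STREAM_ARRAY_SIZE=40000000``. It is not part of the commit. It assumes STREAM's default 8-byte ``double`` element type and its three working arrays (``a``, ``b``, ``c``); the helper names ``stream_footprint_bytes`` and ``to_mib`` are hypothetical and used only for illustration.

```python
# Assumptions (not stated in the diff): STREAM's default element type is an
# 8-byte double, and the benchmark allocates three arrays (a, b, c).
BYTES_PER_ELEMENT = 8
NUM_ARRAYS = 3


def stream_footprint_bytes(array_size: int) -> tuple[int, int]:
    """Return (bytes per array, total bytes) for a given STREAM_ARRAY_SIZE."""
    per_array = array_size * BYTES_PER_ELEMENT
    return per_array, per_array * NUM_ARRAYS


def to_mib(num_bytes: int) -> float:
    """Convert a byte count to mebibytes (1 MiB = 1024 * 1024 bytes)."""
    return num_bytes / (1024 ** 2)


if __name__ == "__main__":
    per_array, total = stream_footprint_bytes(40_000_000)
    print(f"per array: {to_mib(per_array):.1f} MiB, total: {to_mib(total):.1f} MiB")
```

Under these assumptions one set of arrays occupies roughly 915.5 MiB (about 305.2 MiB per array). Whether that is large enough relative to the caches of a given system is a separate check against that system's specifications.
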
Running
=======

.. code-block:: bash

-   srun -n <num_processes> ./stream
+   export OMP_NUM_THREADS=1
+   srun -n <num_processes> --cpu-bind=core ./stream-mpi.exe

Replace `<num_processes>` with the number of MPI processes you want to use. For example, if you want to use 4 MPI processes, the command will be:

.. code-block:: bash

-   srun -n 4 ./stream
+   export OMP_NUM_THREADS=1
+   srun -n 4 --cpu-bind=core ./stream-mpi.exe

Example Results
===============
@@ -121,7 +129,7 @@ Crossroads
These results were obtained using the cce v15.0.1 compiler and cray-mpich v 8.1.25.
Results using the intel-oneapi and intel-classic v2023.1.0 and the same cray-mpich were also collected; cce performed the best.

-``STREAM_ARRAY_SIZE=40 NTIMES=20``
+``STREAM_ARRAY_SIZE=40000000 NTIMES=20``

.. csv-table:: STREAM microbenchmark bandwidth measurement
   :file: stream-xrds_ats5cce-cray-mpich.csv
6 changes: 5 additions & 1 deletion doc/sphinx/09_Microbenchmarks/M3_OSUMB/OSUMB.rst
@@ -5,6 +5,8 @@ OSU Microbenchmarks
Purpose
=======

+The OSU Microbenchmarks (OMB) are widely used to measure and evaluate the performance of MPI operations for point-to-point, multi-pair, collective, and one-sided communications.
+
Characteristics
===============

@@ -18,6 +20,8 @@ The OSU benchmarks are a suite of microbenchmarks designed to measure network ch
Run Rules
---------

+N/A
+
Building
========

@@ -76,4 +80,4 @@ Crossroads
   :file: OSU_ats3_results.csv
   :align: center
   :widths: 10, 10, 10, 10, 10
-   :header-rows: 1
+   :header-rows: 1
6 changes: 3 additions & 3 deletions doc/sphinx/09_Microbenchmarks/M5_DGEMM/DGEMM.rst
@@ -46,7 +46,7 @@ Makefiles are provided for the intel and gcc compilers. Before building, load th
   cd src
   patch -p1 < ../dgemm_omp_fixes.patch
-   make
+   make CFLAGS=-I<openblas_include_dir>

..
@@ -84,7 +84,7 @@ These are positional arguments, so, for instance, R cannot be set without settin
Example Results
===============

-Results from Branson are provided on the following systems:
+Results from DGEMM are provided on the following systems:

* Crossroads (see :ref:`GlobalSystemATS3`)

@@ -102,4 +102,4 @@ This test was built with the intel 2023.1.0 compiler using the crayOS compiler w
.. figure:: dgemm_ats3.png
   :align: center
   :scale: 50%
-   :alt: DGEMM microbenchmark FLOPs measurement
+   :alt: DGEMM microbenchmark FLOPs measurement
5 changes: 2 additions & 3 deletions microbenchmarks/dgemm/src/Makefile
@@ -1,11 +1,10 @@

CC=gcc
-CFLAGS=-ffast-math -mavx2 -ftree-vectorizer-verbose=3 -O3 -fopenmp -DUSE_CBLAS
+CFLAGS+=-ffast-math -mavx2 -ftree-vectorizer-verbose=3 -O3 -fopenmp -DUSE_CBLAS
LDFLAGS=-L${OPENBLAS_ROOT}/lib -lopenblas

mt-dgemm: mt-dgemm.c
-	$(CC) $(CFLAGS) -o mt-dgemm mt-dgemm.c
+	$(CC) $(CFLAGS) $(LDFLAGS) -o mt-dgemm mt-dgemm.c

clean:
	rm mt-dgemm *.o
