Add description of problem sizes, weights, etc. for SSNI

lanl · Nov 29, 2023 · 5a2d7f2 · 5a2d7f2
1 parent b1ccd86
commit 5a2d7f2
Showing 4 changed files with 108 additions and 38 deletions.
diff --git a/doc/sphinx/00_intro/introduction.rst b/doc/sphinx/00_intro/introduction.rst
@@ -180,6 +180,12 @@ Single node benchmarks will require respondent to provide estimates on
 
 * Problem size must be changed to meet % of memory requirements. 
 
+* Respondent shall provide CPU strong scaling and GPU throughput results on current generation representative architectures.
+  If no representative architecture exists, respondent may provide modeled / projected CPU strong scaling and GPU throughput results.
+  Respondent may provide results on both current generation representative architectures and modeled / projected architectures.
+
+* For SSNI projections, respondent shall use the problem size(s) specified for SSNI.
+
 Source code modification categories: 
 
 * Baseline: “out-of-the-box” performance
@@ -232,6 +238,53 @@ Where:
 *	w = weighting factor. 
 
 
+
+.. _GlobalSSNIWeightsSizes:
+
+SSNI Weights and SSNI problem sizes
+===================================
+
+
+.. list-table::
+
+ * - **SSNI Benchmark**
+   - **SSNI Weight**
+   - **SSNI Problem size - % device memory**
+ * - Branson
+   - TBD
+   - 30
+ * - AMG2023 Problem 1 Setup
+   - TBD
+   - 20
+ * - AMG2023 Problem 2 Setup
+   - TBD
+   - 20
+ * - AMG2023 Problem 1 Solve
+   - TBD
+   - 20
+ * - AMG2023 Problem 2 Solve
+   - TBD
+   - 20
+ * - MiniEM
+   - TBD
+   - TBD
+ * - MLMD Training
+   - TBD
+   - N/A 
+ * - MLMD Simulation
+   - TBD
+   - 60
+ * - Parthenon-VIBE
+   - TBD
+   - 40 
+ * - Sparta
+   - TBD
+   - TBD
+ * - UMT
+   - TBD
+   - TBD
+
+
 System Information
 ==================
 

diff --git a/doc/sphinx/01_branson/branson.rst b/doc/sphinx/01_branson/branson.rst
@@ -28,9 +28,25 @@ It is in replicated mode which means there is very little MPI communication (end
 
 Figure of Merit
 ---------------
-The Figure of Merit is defined as particles/second and is obtained by dividing the number of particles in the problem divided by the `Total transport` value in the output. Future versions will output this number directly.
+The Figure of Merit is defined as particles/second and is obtained by dividing the number of particles in the problem by the `Total transport` value.
+This value is labeled "Photons Per Second (FOM):" in Branson's output.
 
 
+Problem Sizes
+-------------
+For strong scaling on a CPU, Branson must be run with three different problem sizes such that the memory
+footprint of all Branson processes, measured during step 2 of the simulation at the smallest process count per node, is approximately 4 to 5%, 8 to 10%, and 20 to 22%.
+
+
+For throughput curves on a GPU the memory footprint of Branson must vary between ~5% and ~80% in increments of at most 5% of the computational device's main memory.
+
+The memory footprint can be controlled by editing "photons" in the input file.
+
+Results of both CPU strong scaling and GPU throughput should be provided on a representative, current-generation hardware configuration used in benchmarking and projections. 
+Results which are 
+
+See :ref:`GlobalSSNIWeightsSizes` for the problem size to use for SSNI projection.
+
 Building
 ========
 
@@ -104,8 +120,7 @@ It is run with:
 
 ..
 
-For strong scaling on a CPU, Branson should be run with three different problem sizes such that the memory
-footprint at the smallest process count per node is approximately: 4 to 5%, 8 to 10%, and 20 to 22%; during step 2 of the simulation.
+
 Memory footprint is the sum of all Branson processes resident set size (or equivalent) on the node.
 This can be obtained on a CPU system using the following (while the application is in step 2):
 
@@ -116,8 +131,7 @@ This can be obtained on a CPU system using the following (while the application
    ps -C BRANSON -o rss | awk '{sum+=$1;} END{print sum/1024/1024;}'
 ..
 
-For throughput curves on a GPU the memory footprint of Branson must vary between ~5% and ~60% in increments of at most 5% of the computational device's main memory.
-The memory footprint can be controlled by editing "photons" in the input file.
+
 
 Results from Branson are provided on the following systems:
 
@@ -128,17 +142,19 @@ Results from Branson are provided on the following systems:
 
 .. _DarwinA100:
 
+AMD Epyc + Nvidia A100 
+----------------------
+
 Dual socket AMD Epyc 7502 with 32 cores operating at 2.5 GHz with 256 GBytes CPU 
 memory and dual Nvidia Ampere A100-SXM4 GPUs with 40GBytes of memory per GPU. 
 
 
-
 Correctness
 ------------
 
 Branson has two main checks on correctness. The first is a looser check that's meant as a "smoke
 test" to see if a code change has introduced an error. After every timestep, a summary block is
-printed sdlfdjskl:
+printed:
 
 .. code-block:: bash
 
@@ -181,6 +197,7 @@ The second check on correctness is much simpler. For any changes to Branson, the
 the same temperature in a standard marshak wave problem after 100 cycles. For the `marshak wave input <https://github.com/lanl/branson/blob/develop/inputs/marshak_wave_replicated.xml>`_ file, the following temperature profile should be reproduced to 3% after 100 cycles, as shown below:
 
 .. code-block:: bash
+
   Step: 100  Start Time: 0.99  End Time: 1  dt: 0.01
   source time: 0.094371
   -------- VERBOSE PRINT BLOCK: CELL TEMPERATURE --------
@@ -211,7 +228,7 @@ the same temperature in a standard marshak wave problem after 100 cycles. For th
             23  0.010000237 0.0099765577 2.3568109e-07
             24  0.010000281 0.0099765314 2.3568212e-07
   -------------------------------------------------------
-
+..
 
 
 This output is expected as long as the spatial, boundary and region blocks are kept the same in the
@@ -256,8 +273,7 @@ figure.
 
    Branson Strong Scaling Performance on Crossroads 66M  particles  
 
-Strong scaling performance of Branson Crossroads 200M Particles is provided within the following table and
-figure.
+Strong scaling performance of Branson Crossroads 200M Particles is provided within the following table and figure.
 
 .. csv-table:: Branson Strong Scaling Performance on Crossroads 200M particles
    :file: cpu_200M.csv
@@ -272,24 +288,24 @@ figure.
 
    Branson Strong Scaling Performance on Crossroads 200M particles
 
-AMD Epyc + Nvidia A100
-------------
 
+AMD Epyc + Nvidia A100
+----------------------
 Throughput performance of Branson on AMD Epyc + Nvidia A100 (using a single GPU) is provided within the
 following table and figure.
 
-.. csv-table::Branson Throughput Performance on AMD Epyc + A100
+.. csv-table:: Branson Throughput Performance on AMD Epyc + Nvidia A100
    :file: gpu.csv
    :align: center
-   :widths: 10, 10
+   :widths: 15, 15
    :header-rows: 1
 
 .. figure:: gpu.png
    :align: center
    :scale: 50%
-   :alt: Branson Throughput Performance on AMD Epyc + A100
+   :alt: Branson Throughput Performance on AMD Epyc + Nvidia A100
 
-   Branson Throughput Performance on AMD Epyc + A100
+   Branson Throughput Performance on AMD Epyc + Nvidia A100
 
 References
 ==========

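Note on the Branson changes above: the arithmetic the new text describes can be sketched as below. This is a minimal illustration, not part of Branson or of this commit. The function names are hypothetical, and the choice of reference memory capacity for CPU runs is an assumption, since the text does not say what the CPU percentages are taken against. The footprint conversion mirrors the ``ps ... -o rss | awk`` command in the documentation, which sums per-process resident set sizes reported in KiB and divides by 1024 twice.

```python
def figure_of_merit(num_particles, total_transport_seconds):
    """Particles per second: particle count divided by the `Total transport` time."""
    if total_transport_seconds <= 0:
        raise ValueError("total transport time must be positive")
    return num_particles / total_transport_seconds


def footprint_percent(rss_kib_per_process, memory_gib):
    """Summed resident set size of all processes as a percentage of memory_gib.

    rss_kib_per_process: RSS values in KiB, one per process (as `ps -o rss` reports).
    memory_gib: reference capacity in GiB (assumption: node or device main memory).
    """
    total_gib = sum(rss_kib_per_process) / 1024 / 1024
    return 100.0 * total_gib / memory_gib


def gpu_footprint_targets(low=5, high=80, step=5):
    """Target footprints (percent) from ~low to ~high in increments of at most 5."""
    if not 0 < step <= 5:
        raise ValueError("increments must be at most 5% of device memory")
    targets = list(range(low, high + 1, step))
    if targets[-1] != high:
        targets.append(high)  # make sure the upper bound is covered
    return targets
```

A run would then adjust "photons" in the input file until ``footprint_percent`` lands on each value returned by ``gpu_footprint_targets()``.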
diff --git a/doc/sphinx/01_branson/gpu.csv b/doc/sphinx/01_branson/gpu.csv
@@ -1,22 +1,22 @@
-No. Particles,Actual
-100000,2.33E+05
-200000,4.32E+05
-300000,5.55E+05
-400000,6.52E+05
-500000,7.14E+05
-600000,7.84E+05
-700000,8.17E+05
-800000,8.40E+05
-900000,8.81E+05
-1000000,9.06E+05
-2000000,9.51E+05
-3000000,8.72E+05
-4000000,8.38E+05
-5000000,7.92E+05
-6600000,7.39E+05
-10000000,6.34E+05
-13300000,5.76E+05
-20000000,5.03E+05
-50000000,3.54E+05
-100000000,2.74E+05
-200000000,2.23E+05
+No. Particles, Actual
+100000, 2.33E+05
+200000, 4.32E+05
+300000, 5.55E+05
+400000, 6.52E+05
+500000, 7.14E+05
+600000, 7.84E+05
+700000, 8.17E+05
+800000, 8.40E+05
+900000, 8.81E+05
+1000000, 9.06E+05
+2000000, 9.51E+05
+3000000, 8.72E+05
+4000000, 8.38E+05
+5000000, 7.92E+05
+6600000, 7.39E+05
+10000000, 6.34E+05
+13300000, 5.76E+05
+20000000, 5.03E+05
+50000000, 3.54E+05
+100000000, 2.74E+05
+200000000, 2.23E+05
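Note on the reformatted ``gpu.csv``: the new rows carry a space after each comma. A minimal sketch of reading the data and locating the peak of the throughput curve follows, assuming the "Actual" column holds the figure of merit in particles/second (the CSV itself does not state a unit). The helper names are hypothetical. ``skipinitialspace=True`` makes Python's ``csv`` reader ignore the space after the delimiter.

```python
import csv
import io

# Contents of gpu.csv as added in this commit.
GPU_CSV = """No. Particles, Actual
100000, 2.33E+05
200000, 4.32E+05
300000, 5.55E+05
400000, 6.52E+05
500000, 7.14E+05
600000, 7.84E+05
700000, 8.17E+05
800000, 8.40E+05
900000, 8.81E+05
1000000, 9.06E+05
2000000, 9.51E+05
3000000, 8.72E+05
4000000, 8.38E+05
5000000, 7.92E+05
6600000, 7.39E+05
10000000, 6.34E+05
13300000, 5.76E+05
20000000, 5.03E+05
50000000, 3.54E+05
100000000, 2.74E+05
200000000, 2.23E+05
"""


def load_throughput(text):
    """Return a list of (particle_count, throughput) pairs, skipping the header."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    next(reader)  # header row: "No. Particles", "Actual"
    return [(int(row[0]), float(row[1])) for row in reader if row]


def peak_throughput(rows):
    """Return the (particle_count, throughput) pair with the highest throughput."""
    return max(rows, key=lambda pair: pair[1])


rows = load_throughput(GPU_CSV)
best_particles, best_value = peak_throughput(rows)
print(f"peak of {best_value:.3g} at {best_particles} particles")
```

On this data the curve rises to its maximum at 2,000,000 particles and declines for larger problems.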
diff --git a/doc/sphinx/09_Microbenchmarks/M1_STREAM/STREAM.rst b/doc/sphinx/09_Microbenchmarks/M1_STREAM/STREAM.rst
@@ -98,6 +98,7 @@ At capacity, the measured values should reach a steady state where increasing th
 For Crossroads, the benchmark was built with ``STREAM_ARRAY_SIZE=40000000`` and ``NTIMES=20`` with optimizations and OpenMP enabled.
 
 .. code-block:: bash
+  
    make CC=`which mpicc` FF=`which mpifort` CFLAGS="-O2 -fopenmp -DSTREAM_ARRAY_SIZE=40000000 -DNTIMES=20" FFLAGS="-O2 -fopenmp -DSTREAM_ARRAY_SIZE=40000000 -DNTIMES=20"