diff --git a/doc/sphinx/02_amg/amg.rst b/doc/sphinx/02_amg/amg.rst
index 8a4d74f4..93ae210e 100644
--- a/doc/sphinx/02_amg/amg.rst
+++ b/doc/sphinx/02_amg/amg.rst
@@ -9,7 +9,7 @@ Purpose
 =======
 
 The AMG2023 benchmark consists of a driver (amg.c), a simple Makefile, and documentation. It is available at https://github.com/LLNL/AMG2023 .
-It requires an installation of hypre 2.27.0.
+It requires an installation of hypre 2.27.0 or higher.
 It uses hypre's parallel algebraic multigrid (AMG) solver BoomerAMG in combination with a Krylov solver to solve two linear systems arising from diffusion problems on a cuboid discretized by finite differences.
 The problems are set up through hypre's linear-algebraic IJ interface.
 The problem sizes can be controlled from the command line.`.
@@ -30,10 +30,11 @@ The problem sizes for both problems can be set by the user from the command line
 Figure of Merit
 ---------------
 
-The figures of merit (FOM_setup and FOM_solve) are calculated using the total number of nonzeros for all system matrices and interpolation operators on all levels of AMG, AMG setup wall clock time (FOM_setup), and AMG solve phase time and number of iterations (FOM_solve).
-FOM_setup and FOM_solve are qualitatively different. FOM_solve represents a number in the order of the throughput of an average iteration in the solve phase, i.e., approximately flops times a constant. FOM_setup is more complex, since the setup phase also contains many integer computations and if statements to generate data structures, determine neighbor processes, etc. All of this is already available in the solve phase. For this reason, we will report both below.
+The figure of merit (FOM) is calculated using the total number of nonzeros for all system matrices and interpolation operators on all levels of AMG (NNZ), AMG setup wall clock time (Setup_time), and AMG solve phase wall clock time (Solve_time).
+Since in time dependent problems the AMG preconditioner might be used for several solves before it needs to be reevaluated, a parameter k has also been included to simulate computation of a time dependent problem that reuses the preconditioner for an average of k time steps.
 
-The total FOM is evaluated as follows: FOM = (FOM_setup + FOM_solve)/2.
+The total FOM is evaluated as follows: FOM = NNZ / (Setup_time + k * Solve_time).
+The parameter k is set to 1 in Problem 1 and to 3 in Problem 2.
 
 Building
 ========
@@ -337,7 +338,7 @@ V-100
 -----
 
 We have also performed runs on 1 NVIDIA V-100 GPU increasing the problem size n x n x n.
-For these runs hypre 2.27.0 was configured as follows:
+For these runs hypre 2.29.0 was configured as follows:
 
 ``configure --with-cuda``
 
@@ -345,102 +346,6 @@ We increased n by 10 starting with n=50 for Problem 1 and with n=80 for Problem
 Note that Problem 2 uses much less memory, since the original matrix has at most 7 coefficients per row vs 27 for Problem 1.
 In addition, aggressive coarsening is used on the first level, significantly decreasing memory usage at the cost of increased number of iterations.
 
-.. table:: FOMs, times and number of iterations for Problem 1 with grid size n x n x n on 1 V-100
-
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | n       | FOM       | FOM_setup | FOM_solve | setup time | solve time | iterations |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 50      | 2.701E+09 | 9.708E+07 | 5.304E+09 | 0.068      | 0.024      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 60      | 3.654E+09 | 1.348E+08 | 7.172E+09 | 0.086      | 0.031      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 70      | 4.745E+09 | 1.504E+08 | 9.340E+09 | 0.123      | 0.038      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 80      | 4.582E+09 | 2.000E+08 | 8.964E+09 | 0.139      | 0.059      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 90      | 5.987E+09 | 2.190E+08 | 1.176E+10 | 0.181      | 0.064      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 100     | 6.574E+09 | 2.702E+08 | 1.288E+10 | 0.202      | 0.080      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 110     | 6.856E+09 | 3.026E+08 | 1.341E+10 | 0.240      | 0.103      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 120     | 7.181E+09 | 3.359E+08 | 1.403E+10 | 0.281      | 0.128      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 130     | 7.377E+09 | 3.709E+08 | 1.438E+10 | 0.324      | 0.159      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 140     | 7.425E+09 | 3.907E+08 | 1.446E+10 | 0.385      | 0.198      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 150     | 7.630E+09 | 4.108E+08 | 1.485E+10 | 0.451      | 0.237      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 160     | 7.738E+09 | 4.255E+08 | 1.505E+10 | 0.528      | 0.284      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 170     | 7.812E+09 | 4.372E+08 | 1.519E+10 | 0.617      | 0.338      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 180     | 7.878E+09 | 4.429E+08 | 1.531E+10 | 0.724      | 0.398      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 190     | 7.895E+09 | 4.526E+08 | 1.534E+10 | 0.834      | 0.468      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 200     | 7.957E+09 | 4.593E+08 | 1.546E+10 | 0.959      | 0.542      | 19         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-
-
-.. table:: FOMs, times and number of iterations for Problem 2 with grid size n x n x n on 1 V-100
-
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | n       | FOM       | FOM_setup | FOM_solve | setup time | solve time | iterations |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 80      | 2.669E+09 | 5.841E+07 | 5.280E+09 | 0.096      | 0.032      | 30         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 90      | 3.063E+09 | 6.953E+07 | 6.057E+09 | 0.115      | 0.038      | 29         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 100     | 3.481E+09 | 8.562E+07 | 6.876E+09 | 0.135      | 0.047      | 30         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 110     | 3.831E+09 | 9.717E+07 | 7.564E+09 | 0.153      | 0.060      | 31         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 120     | 3.693E+09 | 1.068E+08 | 7.279E+09 | 0.178      | 0.081      | 31         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 130     | 4.375E+09 | 1.126E+08 | 8.636E+09 | 0.215      | 0.087      | 31         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 140     | 4.547E+09 | 1.284E+08 | 8.967E+09 | 0.236      | 0.105      | 31         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 150     | 4.753E+09 | 1.448E+08 | 9.361E+09 | 0.257      | 0.127      | 32         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 160     | 4.879E+09 | 1.598E+08 | 9.600E+09 | 0.273      | 0.150      | 32         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 170     | 4.985E+09 | 1.685E+08 | 9.801E+09 | 0.322      | 0.183      | 33         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 180     | 5.094E+09 | 1.702E+08 | 1.001E+10 | 0.366      | 0.213      | 33         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 190     | 5.158E+09 | 1.874E+08 | 1.013E+10 | 0.405      | 0.247      | 33         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 200     | 5.191E+09 | 1.996E+08 | 1.018E+10 | 0.444      | 0.287      | 33         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 210     | 5.239E+09 | 2.071E+08 | 1.027E+10 | 0.495      | 0.330      | 33         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 220     | 5.185E+09 | 2.123E+08 | 1.016E+10 | 0.556      | 0.383      | 33         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 230     | 5.173E+09 | 2.176E+08 | 1.013E+10 | 0.620      | 0.453      | 34         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 240     | 5.148E+09 | 2.227E+08 | 1.007E+10 | 0.688      | 0.517      | 34         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 250     | 5.139E+09 | 2.285E+08 | 1.005E+10 | 0.758      | 0.586      | 34         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 260     | 5.168E+09 | 2.293E+08 | 1.011E+10 | 0.850      | 0.656      | 34         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 270     | 5.173E+09 | 2.311E+08 | 1.012E+10 | 0.945      | 0.756      | 35         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 280     | 5.198E+09 | 2.356E+08 | 1.016E+10 | 1.034      | 0.839      | 35         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 290     | 5.221E+09 | 2.382E+08 | 1.020E+10 | 1.137      | 0.929      | 35         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 300     | 5.230E+09 | 2.419E+08 | 1.022E+10 | 1.239      | 1.027      | 35         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 310     | 5.246E+09 | 2.435E+08 | 1.025E+10 | 1.359      | 1.130      | 35         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-   | 320     | 5.255E+09 | 2.447E+08 | 1.027E+10 | 1.487      | 1.241      | 35         |
-   +---------+-----------+-----------+-----------+------------+------------+------------+
-
-
 The FOMs of AMG2023 on V100 for Problem 1 is provided in the following table and figure:
 
 .. csv-table:: AMG2023 FOM on V100 for Problem 1 (27-pt stencil, AMG-GMRES)
diff --git a/doc/sphinx/02_amg/gpu1.csv b/doc/sphinx/02_amg/gpu1.csv
index 67011e19..a70de5dc 100644
--- a/doc/sphinx/02_amg/gpu1.csv
+++ b/doc/sphinx/02_amg/gpu1.csv
@@ -1,17 +1,17 @@
 n,FOM
-50,2.701E+09
-60,3.654E+09
-70,4.745E+09
-80,4.582E+09
-90,5.987E+09
-100,6.574E+09
-110,6.856E+09
-120,7.181E+09
-130,7.377E+09
-140,7.425E+09
-150,7.630E+09
-160,7.738E+09
-170,7.812E+09
-180,7.878E+09
-190,7.895E+09
-200,7.957E+09
+50,7.135E+07
+60,9.863E+07
+70,1.199E+08
+80,1.397E+08
+90,1.629E+08
+100,1.858E+08
+110,2.104E+08
+120,2.317E+08
+130,2.496E+08
+140,2.597E+08
+150,2.668E+08
+160,2.733E+08
+170,2.827E+08
+180,2.858E+08
+190,2.890E+08
+200,2.925E+08
diff --git a/doc/sphinx/02_amg/gpu2.csv b/doc/sphinx/02_amg/gpu2.csv
index 28e01245..ebfcaa2b 100644
--- a/doc/sphinx/02_amg/gpu2.csv
+++ b/doc/sphinx/02_amg/gpu2.csv
@@ -1,26 +1,26 @@
 n,FOM
-80,2.669E+09
-90,3.063E+09
-100,3.481E+09
-110,3.831E+09
-120,3.693E+09
-130,4.375E+09
-140,4.547E+09
-150,4.753E+09
-160,4.879E+09
-170,4.985E+09
-180,5.094E+09
-190,5.158E+09
-200,5.191E+09
-210,5.239E+09
-220,5.185E+09
-230,5.173E+09
-240,5.148E+09
-250,5.139E+09
-260,5.168E+09
-270,5.173E+09
-280,5.198E+09
-290,5.221E+09
-300,5.230E+09
-310,5.246E+09
-320,5.255E+09
+80,2.931E+07
+90,3.493E+07
+100,4.070E+07
+110,4.437E+07
+120,4.511E+07
+130,5.104E+07
+140,5.510E+07
+150,5.842E+07
+160,6.075E+07
+170,6.276E+07
+180,6.530E+07
+190,6.652E+07
+200,6.823E+07
+210,6.903E+07
+220,6.949E+07
+230,6.821E+07
+240,6.856E+07
+250,6.871E+07
+260,6.910E+07
+270,6.801E+07
+280,6.825E+07
+290,6.916E+07
+300,6.932E+07
+310,6.955E+07
+320,6.978E+07
diff --git a/doc/sphinx/02_amg/mem.gp b/doc/sphinx/02_amg/mem.gp
index 41d016a4..10604856 100644
--- a/doc/sphinx/02_amg/mem.gp
+++ b/doc/sphinx/02_amg/mem.gp
@@ -9,7 +9,7 @@ set ylabel "FOM"
 set xrange [10:40]
 set key left top
 
-set yrange [1.05e+8: 1.75e+8]
+set yrange [1.0e+6: 1.0e+7]
 set grid
 show grid
 
diff --git a/doc/sphinx/02_amg/roci_1_120.csv b/doc/sphinx/02_amg/roci_1_120.csv
index 493111f6..87a8fd17 100644
--- a/doc/sphinx/02_amg/roci_1_120.csv
+++ b/doc/sphinx/02_amg/roci_1_120.csv
@@ -1,8 +1,8 @@
 No. cores,Actual,Ideal
-1,1.6281E+08,1.6281E+08
-2,3.1584E+08,3.2562E+08
-4,5.6892E+08,6.5124E+08
-8,1.2243E+09,1.3025E+09
-16,2.1890E+09,2.6050E+09
-32,2.9544E+09,5.2099E+09
-64,4.5612E+09,1.0420E+10
+1,8.6459E+06,8.6459E+06
+2,1.4987E+07,1.7292E+07
+4,2.9222E+07,3.4583E+07
+8,5.5766E+07,6.9167E+07
+16,9.5407E+07,1.3833E+08
+32,1.4029E+08,2.7667E+08
+64,2.2887E+08,5.5333E+08
diff --git a/doc/sphinx/02_amg/roci_1_160.csv b/doc/sphinx/02_amg/roci_1_160.csv
index 5c5e443f..d827cb6d 100644
--- a/doc/sphinx/02_amg/roci_1_160.csv
+++ b/doc/sphinx/02_amg/roci_1_160.csv
@@ -1,8 +1,8 @@
 No. cores,Actual,Ideal
-1,1.6000E+08,1.6000E+08
-2,3.0493E+08,3.2000E+08
-4,5.6314E+08,6.4000E+08
-8,1.1901E+09,1.2800E+09
-16,2.1349E+09,2.5600E+09
-32,2.9680E+09,5.1200E+09
-64,4.7134E+09,1.0240E+10
+1,8.4644E+06,8.4644E+06
+2,1.2983E+07,1.6929E+07
+4,2.7064E+07,3.3857E+07
+8,5.0436E+07,6.7715E+07
+16,1.0227E+08,1.3543E+08
+32,1.3856E+08,2.7086E+08
+64,2.3692E+08,5.4172E+08
diff --git a/doc/sphinx/02_amg/roci_1_200.csv b/doc/sphinx/02_amg/roci_1_200.csv
index 92a84ffc..b9b4cde4 100644
--- a/doc/sphinx/02_amg/roci_1_200.csv
+++ b/doc/sphinx/02_amg/roci_1_200.csv
@@ -1,8 +1,8 @@
 No. cores,Actual,Ideal
-1,1.5948E+08,1.5948E+08
-2,2.9978E+08,3.1896E+08
-4,5.4708E+08,6.3792E+08
-8,1.0820E+09,1.2758E+09
-16,2.0509E+09,2.5517E+09
-32,2.8251E+09,5.1034E+09
-64,4.6217E+09,1.0207E+10
+1,8.4267E+06,8.4267E+06
+2,1.2526E+07,1.6853E+07
+4,2.4576E+07,3.3707E+07
+8,5.0598E+07,6.7413E+07
+16,9.3217E+07,1.3483E+08
+32,1.2682E+08,2.6965E+08
+64,2.3377E+08,5.3931E+08
diff --git a/doc/sphinx/02_amg/roci_2_200.csv b/doc/sphinx/02_amg/roci_2_200.csv
index 580df8f3..c7dba460 100644
--- a/doc/sphinx/02_amg/roci_2_200.csv
+++ b/doc/sphinx/02_amg/roci_2_200.csv
@@ -1,8 +1,8 @@
 No. cores,Actual,Ideal
-1,1.1020E+08,1.1020E+08
-2,2.0493E+08,2.2040E+08
-4,3.8499E+08,4.4080E+08
-8,7.9992E+08,8.8160E+08
-16,1.2667E+09,1.7632E+09
-32,1.7586E+09,3.5264E+09
-64,2.9247E+09,7.0528E+09
+1,1.7980E+06,1.7980E+06
+2,3.2662E+06,3.5961E+06
+4,6.2277E+06,7.1922E+06
+8,1.2149E+07,1.4384E+07
+16,2.2796E+07,2.8769E+07
+32,2.8885E+07,5.7538E+07
+64,4.6850E+07,1.1508E+08
diff --git a/doc/sphinx/02_amg/roci_2_256.csv b/doc/sphinx/02_amg/roci_2_256.csv
index 16a84e25..10fffd1c 100644
--- a/doc/sphinx/02_amg/roci_2_256.csv
+++ b/doc/sphinx/02_amg/roci_2_256.csv
@@ -1,8 +1,8 @@
 No. cores,Actual,Ideal
-1,1.0864E+08,1.0864E+08
-2,1.9807E+08,2.1728E+08
-4,3.7525E+08,4.3456E+08
-8,7.3751E+08,8.6912E+08
-16,1.3348E+09,1.7382E+09
-32,1.7869E+09,3.4765E+09
-64,2.8334E+09,6.9530E+09
+1,1.7267E+06,1.7267E+06
+2,3.0559E+06,3.4535E+06
+4,5.8681E+06,6.9069E+06
+8,1.1919E+07,1.3814E+07
+16,2.0471E+07,2.7628E+07
+32,2.7253E+07,5.5255E+07
+64,4.5270E+07,1.1051E+08
diff --git a/doc/sphinx/02_amg/roci_2_320.csv b/doc/sphinx/02_amg/roci_2_320.csv
index 18cb7a26..e8f5e29c 100644
--- a/doc/sphinx/02_amg/roci_2_320.csv
+++ b/doc/sphinx/02_amg/roci_2_320.csv
@@ -1,8 +1,8 @@
 No. cores,Actual,Ideal
-1,1.0890E+08,1.0890E+08
-2,1.8653E+08,2.1780E+08
-4,3.6953E+08,4.3560E+08
-8,7.2172E+08,8.7120E+08
-16,1.3751E+09,1.7424E+09
-32,1.7869E+09,3.4848E+09
-64,2.8376E+09,6.9696E+09
+1,1.6485E+06,1.6485E+06
+2,2.8577E+06,3.2970E+06
+4,5.3917E+06,6.5940E+06
+8,1.1154E+07,1.3188E+07
+16,2.1099E+07,2.6376E+07
+32,2.6207E+07,5.2752E+07
+64,4.2568E+07,1.0550E+08
diff --git a/doc/sphinx/02_amg/roci_mem.csv b/doc/sphinx/02_amg/roci_mem.csv
index 4683ba1a..8497ef14 100644
--- a/doc/sphinx/02_amg/roci_mem.csv
+++ b/doc/sphinx/02_amg/roci_mem.csv
@@ -1,5 +1,5 @@
 GB,Problem 1,Problem 2
-10,1.6469E+08,1.1119E+08
-20,1.6157E+08,1.1076E+08
-30,1.6042E+08,1.1016E+08
-40,1.5985E+08,1.1010E+08
+10,8.6128E+06,1.8494E+06
+20,8.4654E+06,1.7930E+06
+30,8.4068E+06,1.7416E+06
+40,8.3174E+06,1.7258E+06
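
Note on the revised figure of merit: the amg.rst change in this diff defines FOM = NNZ / (Setup_time + k * Solve_time), with k = 1 for Problem 1 and k = 3 for Problem 2. The sketch below only illustrates that arithmetic. It is not code from AMG2023 or hypre; the function name `figure_of_merit`, the `K_BY_PROBLEM` table, and the sample inputs are hypothetical and are not measured results.

```python
def figure_of_merit(nnz, setup_time, solve_time, k=1):
    """Return NNZ / (Setup_time + k * Solve_time).

    nnz        -- total nonzeros over all AMG levels (matrices and interpolation)
    setup_time -- AMG setup wall clock time in seconds
    solve_time -- AMG solve phase wall clock time in seconds
    k          -- average number of solves that reuse one setup
    """
    total_time = setup_time + k * solve_time
    if total_time <= 0:
        raise ValueError("setup_time + k * solve_time must be positive")
    return nnz / total_time


# Values of k stated in the revised documentation.
K_BY_PROBLEM = {1: 1, 2: 3}

# Illustrative inputs only (assumed, not taken from any AMG2023 run).
nnz = 1.0e8
setup_time = 0.5
solve_time = 0.25

fom_problem1 = figure_of_merit(nnz, setup_time, solve_time, K_BY_PROBLEM[1])
fom_problem2 = figure_of_merit(nnz, setup_time, solve_time, K_BY_PROBLEM[2])

print(f"Problem 1 (k=1): {fom_problem1:.4E}")
print(f"Problem 2 (k=3): {fom_problem2:.4E}")
```

With identical inputs, the larger k in Problem 2 weights the solve phase more heavily, so the same timings yield a lower FOM than with k = 1.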