Merge branch 'develop' into lroberts36/add-sparse-vector-wave-test
lroberts36 authored Oct 15, 2024
2 parents 14fa3a4 + 1e77c13 commit 8181be8
Showing 181 changed files with 7,158 additions and 2,571 deletions.
58 changes: 58 additions & 0 deletions .github/workflows/ci-extended.yml
@@ -21,6 +21,8 @@ env:
  CMAKE_BUILD_PARALLEL_LEVEL: 5 # num threads for build
  MACHINE_CFG: cmake/machinecfg/CI.cmake
  OMPI_MCA_mpi_common_cuda_event_max: 1000
  # https://github.com/open-mpi/ompi/issues/4948#issuecomment-395468231
  OMPI_MCA_btl_vader_single_copy_mechanism: none

jobs:
  perf-and-regression:
@@ -121,3 +123,59 @@ jobs:
            example/advection/ascent_render_57.png
          retention-days: 3

  perf-and-regression-amdgpu:
    strategy:
      matrix:
        parallel: ['serial', 'mpi']
    runs-on: [self-hosted, navi1030]
    container:
      image: ghcr.io/parthenon-hpc-lab/rocm5.4.3-mpi-hdf5
      # Map to local user id on CI machine to allow writing to build cache and
      # forward device handles to access AMD GPU within container
      options: --user 1000 -w /home/ci --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined
    env:
      CMAKE_GENERATOR: Ninja
      CMAKE_BUILD_PARALLEL_LEVEL: 8 # num threads for build
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: 'true'

      - name: Setup cache for gold standard
        uses: actions/cache@v3
        with:
          path: tst/regression/gold_standard/
          key: gold-standard

      - name: Configure
        run: |
          cmake -B build \
            -DMACHINE_CFG=${PWD}/cmake/machinecfg/GitHubActions.cmake \
            -DCMAKE_BUILD_TYPE=Release \
            -DMACHINE_VARIANT=hip-${{ matrix.parallel }} \
            -DCMAKE_CXX_COMPILER=hipcc
      - name: Build
        run: cmake --build build

      # run performance "unit" tests (none use MPI)
      - name: Performance tests
        if: ${{ matrix.parallel == 'serial' }}
        run: |
          cd build
          ctest -L performance -LE perf-reg
      # run regression tests
      - name: Regression tests
        run: |
          cd build
          ctest -L regression -L ${{ matrix.parallel }} -LE perf-reg --timeout 3600
      - uses: actions/upload-artifact@v3
        with:
          name: log-and-convergence-${{ matrix.parallel }}
          path: |
            build/CMakeFiles/CMakeOutput.log
            build/tst/regression/outputs/advection_convergence*/advection-errors.dat
            build/tst/regression/outputs/advection_convergence*/advection-errors.png
          retention-days: 3
43 changes: 43 additions & 0 deletions .github/workflows/ci-short.yml
@@ -13,6 +13,8 @@ env:
  CMAKE_BUILD_PARALLEL_LEVEL: 5 # num threads for build
  MACHINE_CFG: cmake/machinecfg/CI.cmake
  OMPI_MCA_mpi_common_cuda_event_max: 1000
  # https://github.com/open-mpi/ompi/issues/4948#issuecomment-395468231
  OMPI_MCA_btl_vader_single_copy_mechanism: none

jobs:
  style:
@@ -130,3 +132,44 @@ jobs:
            build/profile.txt
          retention-days: 3

  integration-amdgpu:
    runs-on: [self-hosted, navi1030]
    container:
      image: ghcr.io/parthenon-hpc-lab/rocm5.4.3-mpi-hdf5
      # Map to local user id on CI machine to allow writing to build cache and
      # forward device handles to access AMD GPU within container
      options: --user 1000 -w /home/ci --device /dev/kfd --device /dev/dri --security-opt seccomp=unconfined
    env:
      CMAKE_GENERATOR: Ninja
      CMAKE_BUILD_PARALLEL_LEVEL: 8 # num threads for build
    steps:
      - uses: actions/checkout@v3
        with:
          submodules: 'true'
      - name: Configure
        run: |
          cmake -B build \
            -DMACHINE_CFG=${PWD}/cmake/machinecfg/GitHubActions.cmake \
            -DCMAKE_BUILD_TYPE=Release \
            -DMACHINE_VARIANT=hip-mpi \
            -DCMAKE_CXX_COMPILER=hipcc
      # Test example with "variables" and output
      - name: advection
        run: |
          cmake --build build -t advection-example
          cd build
          ctest -R regression_mpi_test:output_hdf5
      # Test example with swarms
      - name: particle-leapfrog
        run: |
          cmake --build build -t particle-leapfrog
          cd build
          ctest -R regression_mpi_test:particle_leapfrog
      - uses: actions/upload-artifact@v3
        with:
          name: configure-log-integration-amdgpu
          path: |
            build/CMakeFiles/CMakeOutput.log
          retention-days: 3
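
Both workflow files add `OMPI_MCA_btl_vader_single_copy_mechanism: none` to the top-level `env:` block, presumably to avoid the shared-memory single-copy problems inside containers discussed in the linked Open MPI issue. Open MPI picks up MCA parameters from environment variables prefixed with `OMPI_MCA_`, so the same settings can be mirrored in a local shell when trying to reproduce a CI run. The following is a minimal sketch of that, not part of the commit:

```shell
#!/bin/sh
# Sketch: mirror the workflow's Open MPI settings in a local shell.
# The variable names and values are taken from the env: block above.
export OMPI_MCA_mpi_common_cuda_event_max=1000
export OMPI_MCA_btl_vader_single_copy_mechanism=none

# List the MCA settings that a subsequent mpirun/ctest call would inherit.
env | grep '^OMPI_MCA_' | sort
```

Passing `--mca btl_vader_single_copy_mechanism none` on the `mpirun` command line should have the same effect for a single invocation.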

6 changes: 3 additions & 3 deletions .github/workflows/docs.yml
@@ -23,9 +23,9 @@ jobs:
run: export DEBIAN_FRONTEND=noninteractive
- name: install dependencies
run: |
-pip install sphinx
-pip install sphinx-rtd-theme
-pip install sphinx-multiversion
+pip install --break-system-packages sphinx
+pip install --break-system-packages sphinx-rtd-theme
+pip install --break-system-packages sphinx-multiversion
- name: build docs
run: |
echo "Repo = ${GITHUB_REPOSITORY}"
51 changes: 51 additions & 0 deletions CHANGELOG.md
@@ -3,6 +3,45 @@
## Current develop

### Added (new features/APIs/variables/...)
- [[PR 1185]](https://github.com/parthenon-hpc-lab/parthenon/pull/1185/files) Bugfix to particle defragmentation
- [[PR 1184]](https://github.com/parthenon-hpc-lab/parthenon/pull/1184) Fix swarm block neighbor indexing in 1D, 2D
- [[PR 1183]](https://github.com/parthenon-hpc-lab/parthenon/pull/1183) Fix particle leapfrog example initialization data
- [[PR 1179]](https://github.com/parthenon-hpc-lab/parthenon/pull/1179) Make a global variable for whether simulation is a restart
- [[PR 1171]](https://github.com/parthenon-hpc-lab/parthenon/pull/1171) Add PARTHENON_USE_SYSTEM_PACKAGES build option
- [[PR 1161]](https://github.com/parthenon-hpc-lab/parthenon/pull/1161) Make flux field Metadata accessible, add Metadata::CellMemAligned flag, small performance upgrades

### Changed (changing behavior/API/variables/...)
- [[PR 1187]](https://github.com/parthenon-hpc-lab/parthenon/pull/1187) Make DataCollection::Add safer and generalize MeshBlockData::Initialize
- [[PR 1186]](https://github.com/parthenon-hpc-lab/parthenon/pull/1186) Bump Kokkos submodule to 4.4.1
- [[PR 1171]](https://github.com/parthenon-hpc-lab/parthenon/pull/1171) Add PARTHENON_USE_SYSTEM_PACKAGES build option
- [[PR 1172]](https://github.com/parthenon-hpc-lab/parthenon/pull/1172) Make parthenon manager robust against external MPI init and finalize calls

### Fixed (not changing behavior/API/variables/...)
- [[PR 1178]](https://github.com/parthenon-hpc-lab/parthenon/pull/1178) Fix issue with mesh pointer when using relative residual tolerance in BiCGSTAB solver.
- [[PR 1173]](https://github.com/parthenon-hpc-lab/parthenon/pull/1173) Make debugging easier by making parthenon throw an error if ParameterInput is different on multiple MPI ranks.

### Infrastructure (changes irrelevant to downstream codes)
- [[PR 1176]](https://github.com/parthenon-hpc-lab/parthenon/pull/1176) Move some code from header to implementation files

### Removed (removing behavior/API/variables/...)


### Incompatibilities (i.e. breaking changes)


## Release 24.08
Date: 2024-08-30

### Added (new features/APIs/variables/...)
- [[PR 1167]](https://github.com/parthenon-hpc-lab/parthenon/pull/1167) Store block gid and neighbor refinement levels in sparse packs
- [[PR 1151]](https://github.com/parthenon-hpc-lab/parthenon/pull/1151) Add time offset `c` to LowStorageIntegrator
- [[PR 1147]](https://github.com/parthenon-hpc-lab/parthenon/pull/1147) Add `par_reduce_inner` functions
- [[PR 1159]](https://github.com/parthenon-hpc-lab/parthenon/pull/1159) Add additional timestep controllers in parthenon/time.
- [[PR 1148]](https://github.com/parthenon-hpc-lab/parthenon/pull/1148) Add `GetPackDimension` to `StateDescriptor` for calculating pack sizes before `Mesh` initialization
- [[PR 1143]](https://github.com/parthenon-hpc-lab/parthenon/pull/1143) Add tensor indices to VariableState, add radiation constant to constants, add TypeLists, allow for arbitrary containers for solvers
- [[PR 1140]](https://github.com/parthenon-hpc-lab/parthenon/pull/1140) Allow for relative convergence tolerance in BiCGSTAB solver.
- [[PR 1047]](https://github.com/parthenon-hpc-lab/parthenon/pull/1047) General three- and four-valent 2D forests w/ arbitrary orientations.
- [[PR 1130]](https://github.com/parthenon-hpc-lab/parthenon/pull/1130) Enable `parthenon::par_reduce` for MD loops with Kokkos 1D Range
- [[PR 1119]](https://github.com/parthenon-hpc-lab/parthenon/pull/1119) Formalize MeshData partitioning.
- [[PR 1128]](https://github.com/parthenon-hpc-lab/parthenon/pull/1128) Add cycle and nbtotal to hst
- [[PR 1099]](https://github.com/parthenon-hpc-lab/parthenon/pull/1099) Functionality for outputting task graphs in GraphViz format.
@@ -25,12 +64,22 @@
- [[PR 1019]](https://github.com/parthenon-hpc-lab/parthenon/pull/1019) Enable output for non-cell-centered variables

### Changed (changing behavior/API/variables/...)
- [[PR 1153]](https://github.com/parthenon-hpc-lab/parthenon/pull/1153) Allow base grid with fewer blocks than ranks before initial AMR
- [[PR 1105]](https://github.com/parthenon-hpc-lab/parthenon/pull/1105) Refactor parameter input for linear solvers
- [[PR 1078]](https://github.com/parthenon-hpc-lab/parthenon/pull/1078) Add reduction fallback in 1D. Add IndexRange overload for 1D par loops
- [[PR 1024]](https://github.com/parthenon-hpc-lab/parthenon/pull/1024) Add .outN. to history output filenames
- [[PR 1004]](https://github.com/parthenon-hpc-lab/parthenon/pull/1004) Allow parameter modification from an input file for restarts

### Fixed (not changing behavior/API/variables/...)
- [[PR 1145]](https://github.com/parthenon-hpc-lab/parthenon/pull/1145) Fix remaining swarm D->H->D copies
- [[PR 1150]](https://github.com/parthenon-hpc-lab/parthenon/pull/1150) Reduce memory consumption for buffer pool
- [[PR 1146]](https://github.com/parthenon-hpc-lab/parthenon/pull/1146) Fix an issue outputting >4GB single variables per rank
- [[PR 1152]](https://github.com/parthenon-hpc-lab/parthenon/pull/1152) Fix memory leak in task graph outputs related to `abi::__cxa_demangle`
- [[PR 1146]](https://github.com/parthenon-hpc-lab/parthenon/pull/1146) Fix an issue outputting >4GB single variables per rank
- [[PR 1144]](https://github.com/parthenon-hpc-lab/parthenon/pull/1144) Fix some restarts w/non-CC fields
- [[PR 1132]](https://github.com/parthenon-hpc-lab/parthenon/pull/1132) Fix regional dependencies for iterative task lists and make solvers work for arbitrary MeshData partitioning
- [[PR 1139]](https://github.com/parthenon-hpc-lab/parthenon/pull/1139) only add --expt-relaxed-constexpr for COMPILE_LANGUAGE:CXX
- [[PR 1131]](https://github.com/parthenon-hpc-lab/parthenon/pull/1131) Make deallocation of fine and sparse fields work
- [[PR 1127]](https://github.com/parthenon-hpc-lab/parthenon/pull/1127) Add WithFluxes to IsRefined check
- [[PR 1111]](https://github.com/parthenon-hpc-lab/parthenon/pull/1111) Fix undefined behavior due to bitshift of negative number in LogicalLocation
- [[PR 1092]](https://github.com/parthenon-hpc-lab/parthenon/pull/1092) Updates to DataCollection and MeshData to remove requirement of predefining MeshBlockData
@@ -56,6 +105,7 @@
- [[PR 1031]](https://github.com/parthenon-hpc-lab/parthenon/pull/1031) Fix bug in non-cell centered AMR

### Infrastructure (changes irrelevant to downstream codes)
- [[PR 1117]](https://github.com/parthenon-hpc-lab/parthenon/pull/1117) Enable CI pipelines on AMD GPUs with ROCM/HIP
- [[PR 1114]](https://github.com/parthenon-hpc-lab/parthenon/pull/1114) Enable sanitizers for extended CI host build
- [[PR 1123]](https://github.com/parthenon-hpc-lab/parthenon/pull/1123) Default initialize ProResInfo.dir
- [[PR 1121]](https://github.com/parthenon-hpc-lab/parthenon/pull/1121) Default initialize BndInfo.dir
@@ -73,6 +123,7 @@
- [[PR 1108]](https://github.com/parthenon-hpc-lab/parthenon/pull/1108) Remove NaN payload tags infrastructure

### Incompatibilities (i.e. breaking changes)
- [[PR 1135]](https://github.com/parthenon-hpc-lab/parthenon/pull/1135) Drivers now correctly return DriverStatus::timeout on hitting walltime limit
- [[PR 1128]](https://github.com/parthenon-hpc-lab/parthenon/pull/1128) Add cycle and nbtotal to hst
- [[PR 1108]](https://github.com/parthenon-hpc-lab/parthenon/pull/1108) Remove NaN payload tags infrastructure
- [[PR 1026]](https://github.com/parthenon-hpc-lab/parthenon/pull/1026) Particle BCs without relocatable device code
26 changes: 20 additions & 6 deletions CMakeLists.txt
@@ -3,7 +3,7 @@
# Copyright(C) 2020-2024 The Parthenon collaboration
# Licensed under the 3-clause BSD License, see LICENSE file for details
#=========================================================================================
-# (C) (or copyright) 2020-2023. Triad National Security, LLC. All rights reserved.
+# (C) (or copyright) 2020-2024. Triad National Security, LLC. All rights reserved.
#
# This program was produced under U.S. Government contract 89233218CNA000001 for Los
# Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC
@@ -20,7 +20,7 @@ cmake_minimum_required(VERSION 3.16)
# Imports machine-specific configuration
include(cmake/MachineCfg.cmake)

-project(parthenon VERSION 24.03 LANGUAGES C CXX)
+project(parthenon VERSION 24.08 LANGUAGES C CXX)

if (${CMAKE_VERSION} VERSION_GREATER_EQUAL 3.19.0)
cmake_policy(SET CMP0110 NEW)
@@ -60,13 +60,24 @@ option(CODE_COVERAGE "Enable code coverage reporting" OFF)
option(ENABLE_ASAN "Turn on ASAN" OFF)
option(ENABLE_HWASAN "Turn on HWASAN (currently ARM-only)" OFF)

option(PARTHENON_USE_SYSTEM_PACKAGES "Enables search for system packages when available" OFF)
if (PARTHENON_USE_SYSTEM_PACKAGES)
option(PARTHENON_IMPORT_KOKKOS
"If ON, attempt to link to an external Kokkos library. If OFF, build Kokkos from source and package with Parthenon"
ON)
else()
option(PARTHENON_IMPORT_KOKKOS
"If ON, attempt to link to an external Kokkos library. If OFF, build Kokkos from source and package with Parthenon"
OFF)
endif()

include(cmake/Format.cmake)
include(cmake/Lint.cmake)

# regression test reference data
-set(REGRESSION_GOLD_STANDARD_VER 23 CACHE STRING "Version of gold standard to download and use")
+set(REGRESSION_GOLD_STANDARD_VER 24 CACHE STRING "Version of gold standard to download and use")
set(REGRESSION_GOLD_STANDARD_HASH
-"SHA512=bb070f78ae0ecd65bd662f670eee60b4414804770b5041867652d9b5a8e411c59612457499a532068b2584acaa6d120ceb0db96bfde196a9cd129a6246b76fb3"
+"SHA512=e220df92a335131131e42ddb52dc221a6dbd6bb56361483b4af0292620eeb82ffb21ef3b95fd9a7c5cc158fb754da0bf1a1015bec98b5bbad05f4bceb1ee99bc"
CACHE STRING "Hash of default gold standard file to download")
option(REGRESSION_GOLD_STANDARD_SYNC "Automatically sync gold standard files." ON)

@@ -204,7 +215,6 @@ endif()
set(CMAKE_CXX_STANDARD_REQUIRED ON)
set(CMAKE_CXX_STANDARD 17)

-option(PARTHENON_IMPORT_KOKKOS "If ON, attempt to link to an external Kokkos library. If OFF, build Kokkos from source and package with Parthenon" OFF)
if (NOT TARGET Kokkos::kokkos)
if (PARTHENON_IMPORT_KOKKOS)
find_package(Kokkos 4)
@@ -367,7 +377,11 @@ if (PARTHENON_ENABLE_UNIT_TESTS OR PARTHENON_ENABLE_INTEGRATION_TESTS OR PARTHEN
endif()

if (PARTHENON_ENABLE_ASCENT)
-find_package(Ascent REQUIRED NO_DEFAULT_PATH)
+if (PARTHENON_USE_SYSTEM_PACKAGES)
+find_package(Ascent REQUIRED)
+else()
+find_package(Ascent REQUIRED NO_DEFAULT_PATH)
+endif()
endif()

# Installation configuration
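
The CMakeLists.txt hunks above introduce `PARTHENON_USE_SYSTEM_PACKAGES` and make it the source of the default for `PARTHENON_IMPORT_KOKKOS`. As a rough sketch of how a downstream user might opt in at configure time, the helper below only prints the command (a dry run). The function name `configure_cmd` and the build directory `build` are assumptions for illustration, not part of the commit.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the configure command for a given
# setting of PARTHENON_USE_SYSTEM_PACKAGES (option name from the diff).
# The build directory "build" is an assumption.
configure_cmd() {
  use_system=${1:-OFF}   # default mirrors the option's OFF default
  echo "cmake -B build -DPARTHENON_USE_SYSTEM_PACKAGES=${use_system}"
}

# Print the opt-in variant; copy and run it from the source tree.
configure_cmd ON
```

Because `option()` leaves an already-cached value alone, passing `-DPARTHENON_IMPORT_KOKKOS=OFF` explicitly on the same command line should still select the bundled Kokkos even when system packages are enabled.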
Binary file added MG_grid_hierarchy.pdf
Binary file not shown.
2 changes: 1 addition & 1 deletion cmake/TestSetup.cmake
@@ -152,7 +152,7 @@ function(setup_test_parallel nproc dir arg extra_labels)
list(APPEND labels "${extra_labels}")

if(Kokkos_ENABLE_CUDA OR Kokkos_ENABLE_HIP)
-set(PARTHENON_KOKKOS_TEST_ARGS "--kokkos-num-devices=${NUM_GPU_DEVICES_PER_NODE}")
+set(PARTHENON_KOKKOS_TEST_ARGS "--kokkos-map-device-id-by=mpi_rank")
list(APPEND labels "cuda")
endif()
if (Kokkos_ENABLE_OPENMP)
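
The TestSetup.cmake change replaces `--kokkos-num-devices=...` with `--kokkos-map-device-id-by=mpi_rank` for GPU test runs. To illustrate the general idea of rank-based mapping, here is a minimal sketch in which each node-local rank picks a device round-robin. This is an assumption about the concept only: `device_for_rank` is a hypothetical helper, and Kokkos's actual logic (which determines the local rank itself) may differ in detail.

```shell
#!/bin/sh
# Hypothetical helper: round-robin mapping of a node-local MPI rank to a
# GPU id. Assumption: devices are numbered 0..num_devices-1.
device_for_rank() {
  local_rank=$1
  num_devices=$2
  if [ "$num_devices" -le 0 ]; then
    echo "num_devices must be positive" >&2
    return 1
  fi
  echo $((local_rank % num_devices))
}

# Example: four ranks sharing two GPUs on one node.
for r in 0 1 2 3; do
  echo "rank $r -> device $(device_for_rank "$r" 2)"
done
```

With two devices, ranks 0 and 2 land on device 0 and ranks 1 and 3 on device 1, so the test harness no longer needs to know the per-node device count up front.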
5 changes: 3 additions & 2 deletions cmake/machinecfg/GitHubActions.cmake
@@ -29,9 +29,10 @@ if (${MACHINE_VARIANT} MATCHES "cuda")
set(MACHINE_CXX_FLAGS "${MACHINE_CXX_FLAGS} -Wno-unknown-cuda-version")
endif()
elseif (${MACHINE_VARIANT} MATCHES "hip")
-# using an arbitrary arch as GitHub Action runners don't have GPUs
-set(Kokkos_ARCH_VEGA908 ON CACHE BOOL "GPU architecture")
+# using an arch that matches Hamilton at Hamburg Obs
+set(Kokkos_ARCH_NAVI1030 ON CACHE BOOL "GPU architecture")
set(Kokkos_ENABLE_HIP ON CACHE BOOL "Enable HIP")
+set(Kokkos_ENABLE_ZEN3 ON CACHE BOOL "Enable Zen3")
else()
set(MACHINE_CXX_FLAGS "${MACHINE_CXX_FLAGS} -fopenmp-simd")
endif()
Binary file added convergence.pdf
Binary file not shown.
Binary file added doc/latex/MG_grid_hierarchy.pdf
Binary file not shown.
Binary file added doc/latex/convergence.pdf
Binary file not shown.
37 changes: 37 additions & 0 deletions doc/latex/coordinate_transform.tex
@@ -0,0 +1,37 @@
\begin{tikzpicture}
\begin{axis}[grid=both,
ymin=0,
ymax=4.5,
xmax=4.5,
xmin=0,
xticklabel=\empty,
yticklabel=\empty,
minor tick num=1,
axis lines = middle,
xlabel=$x_1$,
ylabel=$x_2$,
label style = {at={(ticklabel cs:1.1)}},
axis equal=true, width=6cm, height=6cm]

\coordinate (t1ll) at (1, 1);
\pic at (t1ll) {ig_tree={t1, $\Omega_1$}};
\pic at (t1ll) {ig_tree_region={{0.8, 0.8}, {1.0, 1.0}, green}};

\coordinate (t2ll) at (3, 1);
\pic at (t2ll) {ig_tree={t2, $\Omega_2$}};
\pic at (t2ll) {ig_tree_region={{-0.2, 0.8}, {0.0, 1.0}, green}};

\coordinate (t3ll) at (3, 3);
\pic at (t3ll) {ig_tree={t3, $\Omega_3$}};
\pic at (t3ll) {ig_tree_region={{-0.2, -0.2}, {0.0, 0.0}, green}};

\path[thick, ->] ([shift={(0.9, 0.85)}]t1ll) edge[bend right] node [below]
{$\tau_{1 \rightarrow 2}$} ([shift={(-0.15, 0.85)}]t2ll);

\path[thick, ->] ([shift={(-0.05, 0.95)}]t2ll) edge[bend right] node [right]
{$\tau_{2 \rightarrow 3}$} ([shift={(-0.05, -0.15)}]t3ll);

\path[thick, ->] ([shift={(0.9, 0.95)}]t1ll) edge node [left]
{$\tau_{1 \rightarrow 3}$} ([shift={(-0.15, -0.05)}]t3ll);
\end{axis}
\end{tikzpicture}
28 changes: 28 additions & 0 deletions doc/latex/indexed_cube.tex
@@ -0,0 +1,28 @@
\begin{tikzpicture}
\newcommand{\Depth}{2}
\newcommand{\Height}{2}
\newcommand{\Width}{2}
\coordinate (O) at (0,0,0);
\coordinate (A) at (0,\Width,0);
\coordinate (B) at (0,\Width,\Height);
\coordinate (C) at (0,0,\Height);
\coordinate (D) at (\Depth,0,0);
\coordinate (E) at (\Depth,\Width,0);
\coordinate (F) at (\Depth,\Width,\Height);
\coordinate (G) at (\Depth,0,\Height);

\draw[black] (O) node [left]{2} -- (C) node [left]{0} -- (G) node [right]{1} -- (D) node [right]{3} -- cycle;% Bottom Face
\draw[black] (O) -- (A) -- (E) -- (D) -- cycle;% Back Face
\draw[black] (O) -- (A) -- (B) -- (C) -- cycle;% Left Face
\draw[black] (D) -- (E) -- (F) -- (G) -- cycle;% Right Face
\draw[black] (C) -- (B) -- (F) -- (G) -- cycle;% Front Face
\draw[black] (A) node [left]{6} -- (B) node [left]{4} -- (F) node [right]{5}-- (E) node [right]{7} -- cycle;% Top Face

\path[->] ([shift={(0,0,0)}]C) edge node [below] {$x_1$} ([shift={(-1.0, 0, 0)}]G);
\path[->] ([shift={(0,0,0)}]C) edge node [right] {$x_2$} ([shift={(0, 0, 1.0)}]O);
\path[->] ([shift={(0,0,0)}]C) edge node [left] {$x_3$} ([shift={(0, -1.0, 0)}]B);
%\draw[blue, fill=black] (1.0, 2.0, 1.0) circle (0.05) node[above] {$(0, 0, 1)$};
%\draw[blue, fill=black] (1.0, 0.0, 1.0) circle (0.05) node[below] {$(0, 0, -1)$};
%\draw[blue, fill=black] (2.0, 1.0, 1.0) circle (0.05) node[right] {$(1, 0, 0)$};
%\draw[blue, fill=black] (0.0, 1.0, 1.0) circle (0.05) node[left] {$(-1, 0, 0)$};
\end{tikzpicture}
Binary file added doc/latex/main.pdf
Binary file not shown.