start of some docs describing the GPU flow #2843

Merged 5 commits on Jun 18, 2024
58 changes: 58 additions & 0 deletions Docs/source/gpu.rst
@@ -0,0 +1,58 @@
*********************
GPU Programming Model
*********************

CPUs and GPUs have separate memory, so working on both the host and
the device may involve managing the transfer of data between host
memory and GPU memory.

In Castro, the core design principle when running on GPUs is that all
of the computation is done on the GPU.

When we compile with ``USE_CUDA=TRUE`` or ``USE_HIP=TRUE``, AMReX allocates
a pool of memory on the GPUs and all of the ``StateData`` is stored there.
As long as all of the computation is done on the GPUs, we don't need
to manage any of the data movement manually.
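
For example, the GPU backend can be selected in the problem's
``GNUmakefile`` (a sketch, assuming the standard AMReX GNU make build
system)::

   # build for NVIDIA GPUs with CUDA
   USE_CUDA := TRUE

   # or instead build for AMD GPUs with HIP
   # USE_HIP := TRUE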

.. note::

   We can tell AMReX to allocate the data using managed memory by
   setting::

      amrex.the_arena_is_managed = 1

   This is generally not needed.

The programming model used throughout Castro is C++ lambda-capturing
by value. We access the ``FArrayBox`` stored in the ``StateData``
``MultiFab`` by creating an ``Array4`` object. The ``Array4`` does
not store a copy of the data, but instead holds a pointer to the data
in the ``FArrayBox``. When we capture the ``Array4`` by value in a
GPU kernel, the device gets access to the pointer to the underlying
data.
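
As a concrete illustration, a minimal sketch of this pattern (not code
taken directly from Castro) looks like::

   #include <AMReX_MultiFab.H>

   // scale every component of a state MultiFab by a constant factor
   void scale_state (amrex::MultiFab& S_new, amrex::Real factor)
   {
       for (amrex::MFIter mfi(S_new, amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi) {

           const amrex::Box& bx = mfi.tilebox();

           // the Array4 holds a pointer to the FArrayBox data, not a copy
           auto const& s = S_new.array(mfi);

           // the lambda captures the Array4 (and factor) by value, so the
           // device kernel receives the pointer to the data already
           // resident in GPU memory
           amrex::ParallelFor(bx, S_new.nComp(),
           [=] AMREX_GPU_DEVICE (int i, int j, int k, int n)
           {
               s(i,j,k,n) *= factor;
           });
       }
   }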


Most AMReX functions (like ``.setVal()``) will work on the data
directly on the GPU.
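
For instance (an illustrative sketch)::

   #include <AMReX_MultiFab.H>

   void zero_state (amrex::MultiFab& S_new)
   {
       // runs as a GPU kernel when built with USE_CUDA / USE_HIP;
       // no explicit host <-> device copy is needed
       S_new.setVal(0.0);
   }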

In rare instances where we might need to operate on the data on the
host, we can force a copy to the host, do the work, and then copy
back. For an example, see the reduction done in ``Gravity.cpp``.
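
A hedged sketch of this copy-to-host / work / copy-back pattern
(illustrative only; not the actual code in ``Gravity.cpp``) might look
like::

   #include <AMReX_MultiFab.H>

   void work_on_host (amrex::MultiFab& mf)
   {
       // make a host-side (pinned-memory) MultiFab with the same layout
       amrex::MultiFab host_mf(mf.boxArray(), mf.DistributionMap(),
                               mf.nComp(), mf.nGrowVect(),
                               amrex::MFInfo().SetArena(amrex::The_Pinned_Arena()));

       amrex::dtoh_memcpy(host_mf, mf);   // device -> host

       // ... operate on host_mf with CPU-only code ...

       amrex::htod_memcpy(mf, host_mf);   // host -> device
   }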

.. note::

   For a thorough discussion of how the AMReX GPU offloading works,
   see :cite:`amrex-ecp`.


Runtime parameters
------------------

The main exception to keeping all data on the GPUs at all times is the
runtime parameters. At the moment, these are allocated as managed
memory and stored in global memory. This simply makes it easier to
read them in and initialize them on the CPU at runtime.
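
A hedged illustration of how such a parameter can be declared (the
parameter name here is just an example)::

   #include <AMReX_REAL.H>
   #include <AMReX_GpuQualifiers.H>

   // AMREX_GPU_MANAGED places the variable in managed (unified) memory,
   // so it can be set on the CPU when the inputs file is read and then
   // accessed directly inside GPU kernels
   AMREX_GPU_MANAGED amrex::Real small_dens = 1.e-200;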


1 change: 1 addition & 0 deletions Docs/source/index.rst
@@ -21,6 +21,7 @@ https://github.com/amrex-astro/Castro
mpi_plus_x
FlowChart
software
gpu
problem_setups
timestepping
creating_a_problem
19 changes: 19 additions & 0 deletions Docs/source/refs.bib
@@ -1131,3 +1131,22 @@ @ARTICLE{doubledet2024
title = {Sensitivity of Simulations of Double-detonation Type Ia Supernovae to Integration Methodology},
journal = {The Astrophysical Journal},
}


@ARTICLE{amrex-ecp,
author = {{Myers}, Andrew and {Zhang}, Weiqun and {Almgren}, Ann and {Antoun}, Thierry and {Bell}, John and {Huebl}, Axel and {Sinn}, Alexander},
title = "{AMReX and pyAMReX: Looking Beyond ECP}",
journal = {arXiv e-prints},
keywords = {Computer Science - Distributed, Parallel, and Cluster Computing},
year = 2024,
month = mar,
eid = {arXiv:2403.12179},
pages = {arXiv:2403.12179},
doi = {10.48550/arXiv.2403.12179},
archivePrefix = {arXiv},
eprint = {2403.12179},
primaryClass = {cs.DC},
adsurl = {https://ui.adsabs.harvard.edu/abs/2024arXiv240312179M},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}