diff --git a/Docs/source/gpu.rst b/Docs/source/gpu.rst
new file mode 100644
index 0000000000..4b4f68f813
--- /dev/null
+++ b/Docs/source/gpu.rst
@@ -0,0 +1,58 @@
+*********************
+GPU Programming Model
+*********************
+
+CPUs and GPUs have separate memory, which means that working on both
+the host and device may involve managing the transfer of data between
+the memory on the host and that on the GPU.
+
+In Castro, the core design when running on GPUs is that all of the computation
+should be done on the GPU.
+
+When we compile with ``USE_CUDA=TRUE`` or ``USE_HIP=TRUE``, AMReX will allocate
+a pool of memory on the GPUs and all of the ``StateData`` will be stored there.
+As long as we then do all of the computation on the GPUs, we don't need
+to manage any of the data movement manually.
+
+.. note::
+
+   We can tell AMReX to allocate the data using managed memory by
+   setting:
+
+   ::
+
+      amrex.the_arena_is_managed = 1
+
+   This is generally not needed.
+
+The programming model used throughout Castro is C++ lambda capture
+by value.  We access the ``FArrayBox`` stored in the ``StateData``
+``MultiFab`` by creating an ``Array4`` object.  The ``Array4`` does
+not directly store a copy of the data, but instead has a pointer to
+the data in the ``FArrayBox``.  When we capture the ``Array4`` by
+value in the GPU kernel, the GPU gets access to the pointer to the
+underlying data.
+
+
+Most AMReX functions will work on the data directly on the GPU (like
+``.setVal()``).
+
+In rare instances where we might need to operate on the data on the
+host, we can force a copy to the host, do the work, and then copy
+back.  For an example, see the reduction done in ``Gravity.cpp``.
+
+.. note::
+
+   For a thorough discussion of how the AMReX GPU offloading works,
+   see :cite:`amrex-ecp`.
+
+
+Runtime parameters
+------------------
+
+The main exception to all data being on the GPUs all the time is the
+runtime parameters.  At the moment, these are allocated as managed
+memory and stored in global memory.  This is simply to make it easier
+to read them in and initialize them on the CPU at runtime.
+
+
diff --git a/Docs/source/index.rst b/Docs/source/index.rst
index 0500e0bd62..23db0d744a 100644
--- a/Docs/source/index.rst
+++ b/Docs/source/index.rst
@@ -21,6 +21,7 @@ https://github.com/amrex-astro/Castro
    mpi_plus_x
    FlowChart
    software
+   gpu
    problem_setups
    timestepping
    creating_a_problem
diff --git a/Docs/source/refs.bib b/Docs/source/refs.bib
index 737461e2eb..3f8fca239a 100644
--- a/Docs/source/refs.bib
+++ b/Docs/source/refs.bib
@@ -1131,3 +1131,22 @@ @ARTICLE{doubledet2024
         title = {Sensitivity of Simulations of Double-detonation Type Ia Supernovae to Integration Methodology},
       journal = {The Astrophysical Journal},
 }
+
+
+@ARTICLE{amrex-ecp,
+       author = {{Myers}, Andrew and {Zhang}, Weiqun and {Almgren}, Ann and {Antoun}, Thierry and {Bell}, John and {Huebl}, Axel and {Sinn}, Alexander},
+        title = "{AMReX and pyAMReX: Looking Beyond ECP}",
+      journal = {arXiv e-prints},
+     keywords = {Computer Science - Distributed, Parallel, and Cluster Computing},
+         year = 2024,
+        month = mar,
+          eid = {arXiv:2403.12179},
+        pages = {arXiv:2403.12179},
+          doi = {10.48550/arXiv.2403.12179},
+archivePrefix = {arXiv},
+       eprint = {2403.12179},
+ primaryClass = {cs.DC},
+       adsurl = {https://ui.adsabs.harvard.edu/abs/2024arXiv240312179M},
+      adsnote = {Provided by the SAO/NASA Astrophysics Data System}
+}
+
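
The capture-by-value model described in the new ``gpu.rst`` can be illustrated with a rough host-only sketch in plain C++, so that it runs without AMReX or a GPU. ``View3D`` and ``for_each_cell`` are hypothetical stand-ins invented for this illustration (they are not AMReX's ``Array4`` or its kernel launch functions). The only point is that copying the view into the lambda copies a pointer and some extents, not the underlying data, so writes made through the copy land in the original storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for an Array4-like view (NOT the AMReX class):
// it stores a pointer and extents, never a copy of the data itself.
struct View3D {
    double* p;
    int nx, ny, nz;

    // const member: the view is unchanged, but the data it points to is writable.
    double& operator()(int i, int j, int k) const {
        assert(i >= 0 && i < nx && j >= 0 && j < ny && k >= 0 && k < nz);
        return p[i + nx * (j + ny * k)];
    }
};

// Hypothetical stand-in for a kernel launcher. The functor is taken by value,
// mimicking how a GPU launch would copy the lambda (and its captures) to the device.
template <typename F>
void for_each_cell(int nx, int ny, int nz, F f) {
    for (int k = 0; k < nz; ++k) {
        for (int j = 0; j < ny; ++j) {
            for (int i = 0; i < nx; ++i) {
                f(i, j, k);
            }
        }
    }
}

// Fill every cell with 1.0 through a by-value captured view, then sum the
// owning storage to show the writes reached the original data.
double fill_and_sum(int nx, int ny, int nz) {
    std::vector<double> storage(static_cast<std::size_t>(nx) * ny * nz, 0.0);
    View3D a{storage.data(), nx, ny, nz};

    // [=] copies the small view object; the pointer inside still refers to storage.
    for_each_cell(nx, ny, nz, [=](int i, int j, int k) { a(i, j, k) = 1.0; });

    double sum = 0.0;
    for (double v : storage) {
        sum += v;
    }
    return sum;
}
```

Calling ``fill_and_sum(2, 3, 4)`` sums one entry per cell of a 2 x 3 x 4 block. In this sketch everything lives in host memory; on a GPU the same pattern relies on the pointer held by the view referring to device-accessible memory, which is what the text says the AMReX memory pool provides.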