start of some docs describing the GPU flow #2843

Merged 5 commits on Jun 18, 2024
58 changes: 58 additions & 0 deletions Docs/source/gpu.rst
@@ -0,0 +1,58 @@
*********************
GPU Programming Model
*********************

CPUs and GPUs have separate memory, so working on both the host and
the device may involve managing the transfer of data between host
memory and GPU memory.

In Castro, the core design principle when running on GPUs is that all
of the computation is done on the GPU.

When we compile with ``USE_CUDA=TRUE`` or ``USE_HIP=TRUE``, AMReX allocates
a pool of memory on the GPUs and all of the ``StateData`` is stored there.
As long as all of the computation is done on the GPUs, we don't need
to manage any of the data movement manually.
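
For example, the GPU backend can be selected in the problem's
``GNUmakefile`` (a sketch, assuming the standard AMReX GNU make build
system)::

   # build for NVIDIA GPUs with CUDA
   USE_CUDA := TRUE

   # or instead build for AMD GPUs with HIP
   # USE_HIP := TRUE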

.. note::

   We can tell AMReX to allocate the data using managed memory by
   setting::

      amrex.the_arena_is_managed = 1

   This is generally not needed.

The programming model used throughout Castro is C++ lambda-capturing
by value. We access the ``FArrayBox`` stored in the ``StateData``
``MultiFab`` by creating an ``Array4`` object. The ``Array4`` does
not store a copy of the data, but instead holds a pointer to the data
in the ``FArrayBox``. When we capture the ``Array4`` by value in a
GPU kernel, the device gets access to the pointer to the underlying
data.
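
As a concrete illustration, a minimal sketch of this pattern (not code
taken directly from Castro) looks like::

   #include <AMReX_MultiFab.H>

   // scale every component of a state MultiFab by a constant factor
   void scale_state (amrex::MultiFab& S_new, amrex::Real factor)
   {
       for (amrex::MFIter mfi(S_new, amrex::TilingIfNotGPU()); mfi.isValid(); ++mfi) {

           const amrex::Box& bx = mfi.tilebox();

           // the Array4 holds a pointer to the FArrayBox data, not a copy
           auto const& s = S_new.array(mfi);

           // the lambda captures the Array4 (and factor) by value, so the
           // device kernel receives the pointer to the data already
           // resident in GPU memory
           amrex::ParallelFor(bx, S_new.nComp(),
           [=] AMREX_GPU_DEVICE (int i, int j, int k, int n)
           {
               s(i,j,k,n) *= factor;
           });
       }
   }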


Most AMReX functions (like ``.setVal()``) will work on the data
directly on the GPU.
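
For instance (an illustrative sketch)::

   #include <AMReX_MultiFab.H>

   void zero_state (amrex::MultiFab& S_new)
   {
       // runs as a GPU kernel when built with USE_CUDA / USE_HIP;
       // no explicit host <-> device copy is needed
       S_new.setVal(0.0);
   }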

In rare instances where we might need to operate on the data on the
host, we can force a copy to the host, do the work, and then copy
back. For an example, see the reduction done in ``Gravity.cpp``.
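
A hedged sketch of this copy-to-host / work / copy-back pattern
(illustrative only; not the actual code in ``Gravity.cpp``) might look
like::

   #include <AMReX_MultiFab.H>

   void work_on_host (amrex::MultiFab& mf)
   {
       // make a host-side (pinned-memory) MultiFab with the same layout
       amrex::MultiFab host_mf(mf.boxArray(), mf.DistributionMap(),
                               mf.nComp(), mf.nGrowVect(),
                               amrex::MFInfo().SetArena(amrex::The_Pinned_Arena()));

       amrex::dtoh_memcpy(host_mf, mf);   // device -> host

       // ... operate on host_mf with CPU-only code ...

       amrex::htod_memcpy(mf, host_mf);   // host -> device
   }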

.. note::

   For a thorough discussion of how the AMReX GPU offloading works,
   see :cite:`amrex-ecp`.


Runtime parameters
------------------

The main exception to keeping all data on the GPUs at all times is the
runtime parameters. At the moment, these are allocated as managed
memory and stored in global memory. This simply makes it easier to
read them in and initialize them on the CPU at runtime.
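
A hedged illustration of how such a parameter can be declared (the
parameter name here is just an example)::

   #include <AMReX_REAL.H>
   #include <AMReX_GpuQualifiers.H>

   // AMREX_GPU_MANAGED places the variable in managed (unified) memory,
   // so it can be set on the CPU when the inputs file is read and then
   // accessed directly inside GPU kernels
   AMREX_GPU_MANAGED amrex::Real small_dens = 1.e-200;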


1 change: 1 addition & 0 deletions Docs/source/index.rst
@@ -21,6 +21,7 @@ https://github.com/amrex-astro/Castro
mpi_plus_x
FlowChart
software
gpu
problem_setups
timestepping
creating_a_problem
19 changes: 19 additions & 0 deletions Docs/source/refs.bib
@@ -1131,3 +1131,22 @@ @ARTICLE{doubledet2024
title = {Sensitivity of Simulations of Double-detonation Type Ia Supernovae to Integration Methodology},
journal = {The Astrophysical Journal},
}


@ARTICLE{amrex-ecp,
author = {{Myers}, Andrew and {Zhang}, Weiqun and {Almgren}, Ann and {Antoun}, Thierry and {Bell}, John and {Huebl}, Axel and {Sinn}, Alexander},
title = "{AMReX and pyAMReX: Looking Beyond ECP}",
journal = {arXiv e-prints},
keywords = {Computer Science - Distributed, Parallel, and Cluster Computing},
year = 2024,
month = mar,
eid = {arXiv:2403.12179},
pages = {arXiv:2403.12179},
doi = {10.48550/arXiv.2403.12179},
archivePrefix = {arXiv},
eprint = {2403.12179},
primaryClass = {cs.DC},
adsurl = {https://ui.adsabs.harvard.edu/abs/2024arXiv240312179M},
adsnote = {Provided by the SAO/NASA Astrophysics Data System}
}