Update based on review comments
matyas-streamhpc committed Jun 27, 2024
1 parent 4d20933 commit ad04193
57 changes: 27 additions & 30 deletions docs/how-to/unified_memory.rst
@@ -7,8 +7,6 @@
Unified Memory
*******************************************************************************

In conventional architectures, CPUs and GPUs have dedicated memory like Random
Access Memory (RAM) and Video Random Access Memory (VRAM). This architectural
design, while effective, can be limiting in terms of memory capacity and
@@ -83,13 +81,13 @@ page-fault. For more details, visit
Unified Memory Programming Models
=================================

This section showcases various unified memory programming models. Their
availability depends on your architecture. For more information, see
:ref:`unified memory
system requirements` and :ref:`checking unified memory management support`.

- **HIP Managed Memory Allocation API**:

The ``hipMallocManaged()`` function is a dynamic memory allocator available on
all GPUs with unified memory support. For more details, visit
:ref:`unified_memory_reference`.
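
As an illustration, a minimal sketch of allocating and using managed memory
might look as follows (the kernel and variable names are hypothetical, and
error checking is omitted for brevity):

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <cstdio>

    // Illustrative kernel: increments every element of the managed buffer.
    __global__ void increment(int* data, size_t n) {
        size_t i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) {
            data[i] += 1;
        }
    }

    int main() {
        constexpr size_t n = 256;
        int* data = nullptr;

        // The returned pointer is valid on both the host and the device.
        hipMallocManaged(&data, n * sizeof(*data));

        for (size_t i = 0; i < n; ++i) {
            data[i] = static_cast<int>(i);
        }

        increment<<<1, n>>>(data, n);
        hipDeviceSynchronize(); // wait before the host reads the data again

        printf("data[10] = %d\n", data[10]);

        hipFree(data);
        return 0;
    }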

@@ -101,17 +99,17 @@

- **System Allocation API**:

Starting with the AMD MI300 series, the ``malloc()`` system allocator allows
you to reserve unified memory. The system allocator is more versatile and
offers an easy transition from CPU-written C++ code to HIP code, because the
same system allocation API is used.
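
A minimal sketch of the system allocation approach, assuming MI300-class
hardware with XNACK enabled (``HSA_XNACK=1``; this environment requirement is
an assumption of this sketch):

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <cstdio>
    #include <cstdlib>

    __global__ void add(const int* a, const int* b, int* c) {
        *c = *a + *b;
    }

    int main() {
        // Plain malloc'd host memory; with XNACK enabled on supported
        // hardware, the GPU can access it directly.
        int* a = static_cast<int*>(std::malloc(sizeof(int)));
        int* b = static_cast<int*>(std::malloc(sizeof(int)));
        int* c = static_cast<int*>(std::malloc(sizeof(int)));

        *a = 1;
        *b = 2;

        add<<<1, 1>>>(a, b, c);
        hipDeviceSynchronize();

        printf("%d + %d = %d\n", *a, *b, *c);

        std::free(a);
        std::free(b);
        std::free(c);
        return 0;
    }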

.. _checking unified memory management support:

Checking Unified Memory Management Support
------------------------------------------
Some device attributes indicate which :ref:`unified memory
programming models` are supported. The attribute value is 1 if the
functionality is supported, and 0 if it is not.
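
For example, such an attribute can be queried with ``hipDeviceGetAttribute()``
(this sketch checks ``hipDeviceAttributeManagedMemory`` and assumes device 0):

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <cstdio>

    int main() {
        constexpr int device = 0;
        int managed = 0;

        // The attribute value is 1 if managed memory is supported, 0 otherwise.
        hipDeviceGetAttribute(&managed, hipDeviceAttributeManagedMemory, device);

        printf("Managed memory is %ssupported on device %d\n",
               managed ? "" : "not ", device);
        return 0;
    }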

.. list-table:: Device attributes for unified memory management
@@ -340,35 +338,35 @@ where UMM can be beneficial:
it easier for developers to write code without worrying about memory
allocation and deallocation details.

- **Data Migration**:

UMM allows for efficient data migration between the host (CPU) and the device
(GPU). This can be particularly useful for applications that need to move
data back and forth between the device and host.

- **Improved Programming Productivity**:

As a positive side effect, UMM can reduce the lines of code, thereby
improving programming productivity.

In HIP, pinned memory allocations are coherent by default. Pinned memory is
host memory mapped into the address space of all GPUs, meaning that the pointer
can be used on both host and device. Using pinned memory instead of pageable
memory on the host can improve bandwidth.
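
A minimal sketch of a pinned allocation with ``hipHostMalloc()`` (the buffer
size and copy direction are arbitrary choices for illustration):

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <cassert>

    int main() {
        constexpr size_t n = 1 << 20;
        float* host_buf = nullptr;

        // Pinned (page-locked) host memory, mapped into the GPU address
        // space; the same pointer can be used on host and device.
        hipHostMalloc(&host_buf, n * sizeof(*host_buf), hipHostMallocDefault);

        for (size_t i = 0; i < n; ++i) {
            host_buf[i] = 1.0f;
        }

        float* dev_buf = nullptr;
        hipMalloc(&dev_buf, n * sizeof(*dev_buf));

        // Transfers from pinned memory typically achieve higher bandwidth
        // than transfers from pageable memory.
        hipError_t err = hipMemcpy(dev_buf, host_buf, n * sizeof(*host_buf),
                                   hipMemcpyHostToDevice);
        assert(err == hipSuccess);

        hipFree(dev_buf);
        hipHostFree(host_buf);
        return 0;
    }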

While UMM can provide numerous benefits, it's important to be aware of the
potential performance overhead associated with UMM. Thoroughly test and
profile your code to ensure it's the most suitable choice for your use case.

.. _unified memory compiler hints:

Unified Memory Compiler Hints for Better Performance
========================================================

Unified memory compiler hints can help improve the performance of your code if
you understand how your code behaves and the infrastructure it runs on. This
section presents some hint techniques.

The hint functions can set actions on a selected device, which can be
identified by ``hipGetDeviceProperties(&prop, device_id)``. There are two
@@ -377,13 +375,13 @@ special ``device_id`` values:
- ``hipCpuDeviceId`` = -1 means that the advised device is the CPU.
- ``hipInvalidDeviceId`` = -2 means that the device is invalid.

For the best performance, profile your application to optimize the
utilization of compiler hints.

Data Prefetching
----------------
Data prefetching is a technique used to improve the performance of your
application by moving data closer to the processing unit before it's actually
needed.

.. code-block:: cpp
@@ -437,9 +435,9 @@ needed.
}
Remember to check the return status of ``hipMemPrefetchAsync()`` to ensure that
the prefetch operations complete successfully.

Memory Advice
-------------
The effectiveness of ``hipMemAdvise()`` comes from its ability to inform the
runtime system of the developer's intentions regarding memory usage. When the
@@ -449,10 +447,10 @@ efficient execution of the application. However, the actual impact on
performance can vary based on the specific use case and the hardware
architecture.

For the description of ``hipMemAdvise()`` and the detailed list of advice,
visit the :ref:`unified_memory_reference`.

Here is the updated version of the example above with memory advice.

.. code-block:: cpp
:emphasize-lines: 17-26
@@ -468,17 +466,17 @@
int main() {
int *a, *b, *c;
// Allocate memory for a, b, and c accessible to both device and host codes.
hipMallocManaged(&a, sizeof(*a));
hipMallocManaged(&b, sizeof(*b));
hipMallocManaged(&c, sizeof(*c));
// Set memory advice for a, b, and c to be accessed by the CPU.
hipMemAdvise(a, sizeof(*a), hipMemAdviseSetPreferredLocation, hipCpuDeviceId);
hipMemAdvise(b, sizeof(*b), hipMemAdviseSetPreferredLocation, hipCpuDeviceId);
hipMemAdvise(c, sizeof(*c), hipMemAdviseSetPreferredLocation, hipCpuDeviceId);
// Additionally, set memory advice for a, b, and c to be read mostly from device 0.
constexpr int device = 0;
hipMemAdvise(a, sizeof(*a), hipMemAdviseSetReadMostly, device);
hipMemAdvise(b, sizeof(*b), hipMemAdviseSetReadMostly, device);
@@ -570,9 +568,8 @@ For more details, visit the
Asynchronously Attach Memory to a Stream
----------------------------------------

The ``hipStreamAttachMemAsync`` function asynchronously attaches memory to a
stream, which can help with concurrent execution when using streams.

In the example, a stream is created by using ``hipStreamCreate()`` and then
the managed memory is attached to the stream using
``hipStreamAttachMemAsync()``.
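
A minimal sketch of this pattern (the kernel name and the
``hipMemAttachSingle`` flag choice are illustrative assumptions):

.. code-block:: cpp

    #include <hip/hip_runtime.h>

    __global__ void touch(int* p) {
        *p += 1;
    }

    int main() {
        int* data = nullptr;
        hipMallocManaged(&data, sizeof(*data));
        *data = 0;

        hipStream_t stream;
        hipStreamCreate(&stream);

        // Associate the managed allocation with this stream only, so that
        // work in other streams is not serialized against accesses to it.
        hipStreamAttachMemAsync(stream, data, 0, hipMemAttachSingle);
        hipStreamSynchronize(stream);

        touch<<<1, 1, 0, stream>>>(data);
        hipStreamSynchronize(stream); // ensure the kernel finished before host access

        hipStreamDestroy(stream);
        hipFree(data);
        return 0;
    }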
