External review feedback and minor styling
neon60 committed Jun 27, 2024
1 parent ad04193 commit bd45a7f
Showing 2 changed files with 25 additions and 82 deletions.
1 change: 1 addition & 0 deletions .wordlist.txt
@@ -72,6 +72,7 @@ multithreading
NCCL
NDRange
nonnegative
NOP
Numa
Nsight
overindex
106 changes: 24 additions & 82 deletions docs/how-to/unified_memory.rst
@@ -33,7 +33,7 @@ throughput (data processed by unit time).

.. _unified memory system requirements:

System Requirements
System requirements
===================
Unified memory is supported on Linux by all modern AMD GPUs from the Vega
series onward. Unified memory management can be achieved with managed memory
@@ -78,26 +78,26 @@ page-fault. For more details, visit

.. _unified memory programming models:

Unified Memory Programming Models
Unified memory programming models
=================================

Showcasing various unified memory programming models, the model availability
depends on your architecture. For more information, see :ref:`unified memory
system requirements` and :ref:`checking unified memory management support`.

- **HIP Managed Memory Allocation API**:
- **HIP managed memory allocation API**:

The ``hipMallocManaged()`` is a dynamic memory allocator available on
all GPUs with unified memory support. For more details, visit
:ref:`unified_memory_reference`.

- **HIP Managed Variables**:
- **HIP managed variables**:

The ``__managed__`` declaration specifier, which serves as its counterpart,
is supported on all modern AMD cards and can be utilized for static
allocation.

- **System Allocation API**:
- **System allocation API**:

Starting with the AMD MI300 series, the ``malloc()`` system allocator allows
you to reserve unified memory. The system allocator is more versatile and
@@ -106,7 +106,7 @@ system requirements` and :ref:`checking unified memory management support`.

.. _checking unified memory management support:

Checking Unified Memory Management Support
Checking unified memory management support
------------------------------------------
Some device attributes can offer information about which :ref:`unified memory
programming models` are supported. The attribute value is 1 if the
@@ -145,7 +145,7 @@ The following examples show how to use device attributes:
return 0;
}
Example for Unified Memory Management
Example for unified memory management
-------------------------------------

The following example shows how to use unified memory management with
@@ -324,9 +324,10 @@ Memory Management example is presented in the last tab.
.. _using unified memory management:

Using Unified Memory Management (UMM)
Using unified memory management (UMM)
=====================================
Unified Memory Management (UMM) is a feature that can simplify the complexities

Unified memory management (UMM) is a feature that can simplify the complexities
of memory management in GPU computing. It is particularly useful in
heterogeneous computing environments with heavy memory usage with both a CPU
and a GPU, which would require large memory transfers. Here are some areas
@@ -359,12 +360,12 @@ potential performance overhead associated with UMM. You must thoroughly test
and profile your code to ensure it's the most suitable choice for your use
case.

.. _unified memory compiler hints:
.. _unified memory runtime hints:

Unified Memory Compiler Hints for the Better Performance
========================================================
Unified memory HIP runtime hints for the better performance
===========================================================

Unified memory compiler hints can help improve the performance of your code if
Unified memory HIP runtime hints can help improve the performance of your code if
you know your code's ability and infrastructure. Some hint techniques are
presented in this section.

@@ -376,10 +377,11 @@ special ``device_id`` values:
- ``hipInvalidDeviceId`` = -2 means that the device is invalid.

For the best performance, profile your application to optimize the
utilization of compiler hits.
utilization of HIP runtime hints.

Data Prefetching
Data prefetching
----------------

Data prefetching is a technique used to improve the performance of your
application by moving data closer to the processing unit before it's actually
needed.
@@ -437,8 +439,9 @@ needed.
Remember to check the return status of ``hipMemPrefetchAsync()`` to ensure that
the prefetch operations are completed successfully.
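
A minimal sketch of such a check, assuming a single managed ``int`` and the
default stream. ``HIP_CHECK`` is a hypothetical helper macro defined here for
illustration only; it is not part of the HIP API. A ``hipSuccess`` return from
an asynchronous call only indicates that the request was accepted, so the
sketch also synchronizes before the host reads the data.

.. code-block:: cpp

    #include <hip/hip_runtime.h>
    #include <cstdlib>
    #include <iostream>

    // Hypothetical helper (not part of HIP): abort with a message on any error.
    #define HIP_CHECK(expr)                                               \
        do {                                                              \
            hipError_t err_ = (expr);                                     \
            if (err_ != hipSuccess) {                                     \
                std::cerr << #expr << " failed: "                         \
                          << hipGetErrorString(err_) << std::endl;        \
                std::exit(EXIT_FAILURE);                                  \
            }                                                             \
        } while (0)

    int main() {
        int *data = nullptr;

        // Allocate managed memory and initialize it on the host.
        HIP_CHECK(hipMallocManaged(&data, sizeof(*data)));
        *data = 42;

        int deviceId = 0;
        HIP_CHECK(hipGetDevice(&deviceId));

        // Request a prefetch to the current device on the default stream
        // and check that the request was accepted.
        HIP_CHECK(hipMemPrefetchAsync(data, sizeof(*data), deviceId, 0));

        // Request a prefetch back to the host, again checking the status.
        HIP_CHECK(hipMemPrefetchAsync(data, sizeof(*data), hipCpuDeviceId, 0));

        // Wait for both prefetches to finish before reading on the host.
        HIP_CHECK(hipDeviceSynchronize());
        std::cout << "value: " << *data << std::endl;

        HIP_CHECK(hipFree(data));
        return 0;
    }

Wrapping every runtime call in one macro keeps the error handling uniform.
Production code may prefer to propagate the ``hipError_t`` value to the caller
instead of exiting.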

Memory Advice
Memory advice
-------------

The effectiveness of ``hipMemAdvise()`` comes from its ability to inform the
runtime system of the developer's intentions regarding memory usage. When the
runtime system has knowledge of the expected memory access patterns, it can
@@ -504,8 +507,9 @@ Here is the updated version of the example above with memory advice.
}
Memory Range attributes
Memory range attributes
-----------------------

Memory Range attributes allow you to query attributes of a given memory range.

The ``hipMemRangeGetAttribute()`` is added to the example to query the
@@ -565,71 +569,9 @@ For more details, visit the
return 0;
}
Asynchronously Attach Memory to a Stream
Asynchronously attach memory to a stream
----------------------------------------

The ``hipStreamAttachMemAsync`` function asynchronously attaches memory to a
stream, which can help concurrent execution when using streams.

In the example, a stream is created by using ``hipStreamCreate()`` and then
the managed memory is attached to the stream using
``hipStreamAttachMemAsync()``. The ``hipMemAttachGlobal`` flag is used to
indicate that the memory can be accessed from any stream on any device.
The kernel launch and synchronization are now done on this stream.
Using streams and attaching memory to them can help with overlapping data
transfers and computation.

For more details and description of flags, visit
:ref:`unified_memory_reference`.

.. code-block:: cpp
:emphasize-lines: 21-24
#include <hip/hip_runtime.h>
#include <iostream>
// Addition of two values.
__global__ void add(int *a, int *b, int *c) {
*c = *a + *b;
}
int main() {
int *a, *b, *c;
hipStream_t stream;
// Create a stream.
hipStreamCreate(&stream);
// Allocate memory for a, b and c that is accessible to both device and host codes.
hipMallocManaged(&a, sizeof(*a));
hipMallocManaged(&b, sizeof(*b));
hipMallocManaged(&c, sizeof(*c));
// Attach memory to the stream asynchronously.
hipStreamAttachMemAsync(stream, a, sizeof(*a), hipMemAttachGlobal);
hipStreamAttachMemAsync(stream, b, sizeof(*b), hipMemAttachGlobal);
hipStreamAttachMemAsync(stream, c, sizeof(*c), hipMemAttachGlobal);
// Setup input values.
*a = 1;
*b = 2;
// Launch add() kernel on GPU on the created stream.
hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, stream, a, b, c);
// Wait for stream to finish before accessing on host.
hipStreamSynchronize(stream);
The ``hipStreamAttachMemAsync`` function would be able to asynchronously attach memory to a stream, which can help concurrent execution when using streams.

// Prints the result.
std::cout << *a << " + " << *b << " = " << *c << std::endl;
// Cleanup allocated memory.
hipFree(a);
hipFree(b);
hipFree(c);
// Destroy the stream.
hipStreamDestroy(stream);
return 0;
}
Currently, this function is a no-operation (NOP) function on AMD GPUs. It simply returns success after the runtime memory validation passed. This function is necessary on Microsoft Windows, and UMM is not supported on this operating system with AMD GPUs at the moment.
