From bd45a7f70749506a9fd5dc0cc5cc64d72628aa74 Mon Sep 17 00:00:00 2001
From: Istvan Kiss
Date: Thu, 27 Jun 2024 19:40:17 +0200
Subject: [PATCH] External review feedback and minor styling

---
 .wordlist.txt                  |   1 +
 docs/how-to/unified_memory.rst | 106 ++++++++-------------------------
 2 files changed, 25 insertions(+), 82 deletions(-)

diff --git a/.wordlist.txt b/.wordlist.txt
index dfdefbaf57..5935ee807b 100644
--- a/.wordlist.txt
+++ b/.wordlist.txt
@@ -72,6 +72,7 @@ multithreading
 NCCL
 NDRange
 nonnegative
+NOP
 Numa
 Nsight
 overindex
diff --git a/docs/how-to/unified_memory.rst b/docs/how-to/unified_memory.rst
index 500882d964..b24cd4c82f 100644
--- a/docs/how-to/unified_memory.rst
+++ b/docs/how-to/unified_memory.rst
@@ -33,7 +33,7 @@ throughput (data processed by unit time).
 
 .. _unified memory system requirements:
 
-System Requirements
+System requirements
 ===================
 Unified memory is supported on Linux by all modern AMD GPUs from the Vega
 series onward. Unified memory management can be achieved with managed memory
@@ -78,26 +78,26 @@ page-fault. For more details, visit
 
 .. _unified memory programming models:
 
-Unified Memory Programming Models
+Unified memory programming models
 =================================
 
 Showcasing various unified memory programming models, the model availability
 depends on your architecture. For more information, see :ref:`unified memory
 system requirements` and :ref:`checking unified memory management support`.
 
-- **HIP Managed Memory Allocation API**:
+- **HIP managed memory allocation API**:
 
   The ``hipMallocManaged()`` is a dynamic memory allocator available on
   all GPUs with unified memory support. For more details, visit
   :ref:`unified_memory_reference`.
 
-- **HIP Managed Variables**:
+- **HIP managed variables**:
 
   The ``__managed__`` declaration specifier, which serves as its counterpart,
   is supported on all modern AMD cards and can be utilized for static
   allocation.
 
-- **System Allocation API**:
+- **System allocation API**:
 
   Starting with the AMD MI300 series, the ``malloc()`` system allocator allows
   you to reserve unified memory. The system allocator is more versatile and
@@ -106,7 +106,7 @@ system requirements` and :ref:`checking unified memory management support`.
 
 .. _checking unified memory management support:
 
-Checking Unified Memory Management Support
+Checking unified memory management support
 ------------------------------------------
 Some device attributes can offer information about which :ref:`unified memory
 programming models` are supported. The attribute value is 1 if the
@@ -145,7 +145,7 @@ The following examples show how to use device attributes:
         return 0;
     }
 
-Example for Unified Memory Management
+Example for unified memory management
 -------------------------------------
 
 The following example shows how to use unified memory management with
@@ -324,9 +324,10 @@ Memory Management example is presented in the last tab.
 
 .. _using unified memory management:
 
-Using Unified Memory Management (UMM)
+Using unified memory management (UMM)
 =====================================
-Unified Memory Management (UMM) is a feature that can simplify the complexities
+
+Unified memory management (UMM) is a feature that can simplify the complexities
 of memory management in GPU computing. It is particularly useful in
 heterogeneous computing environments with heavy memory usage with both a CPU
 and a GPU, which would require large memory transfers. Here are some areas
@@ -359,12 +360,12 @@ potential performance overhead associated with UMM. You must thoroughly test
 and profile your code to ensure it's the most suitable choice for your use
 case.
 
-.. _unified memory compiler hints:
+.. _unified memory runtime hints:
 
-Unified Memory Compiler Hints for the Better Performance
-========================================================
+Unified memory HIP runtime hints for the better performance
+===========================================================
 
-Unified memory compiler hints can help improve the performance of your code if
+Unified memory HIP runtime hints can help improve the performance of your code if
 you know your code's ability and infrastructure. Some hint techniques are
 presented in this section.
 
@@ -376,10 +377,11 @@ special ``device_id`` values:
 - ``hipInvalidDeviceId`` = -2 means that the device is invalid.
 
 For the best performance, profile your application to optimize the
-utilization of compiler hits.
+utilization of HIP runtime hints.
 
-Data Prefetching
+Data prefetching
 ----------------
+
 Data prefetching is a technique used to improve the performance of your
 application by moving data closer to the processing unit before it's actually
 needed.
@@ -437,8 +439,9 @@ needed.
 Remember to check the return status of ``hipMemPrefetchAsync()`` to ensure that
 the prefetch operations are completed successfully.
 
-Memory Advice
+Memory advice
 -------------
+
 The effectiveness of ``hipMemAdvise()`` comes from its ability to inform the
 runtime system of the developer's intentions regarding memory usage. When the
 runtime system has knowledge of the expected memory access patterns, it can
@@ -504,8 +507,9 @@ Here is the updated version of the example above with memory advice.
     }
 
 
-Memory Range attributes
+Memory range attributes
 -----------------------
+
 Memory Range attributes allow you to query attributes of a given memory range.
 
 The ``hipMemRangeGetAttribute()`` is added to the example to query the
@@ -565,71 +569,9 @@ For more details, visit the
         return 0;
     }
 
-Asynchronously Attach Memory to a Stream
+Asynchronously attach memory to a stream
 ----------------------------------------
 
-The ``hipStreamAttachMemAsync`` function asynchronously attaches memory to a
-stream, which can help concurrent execution when using streams.
-
-In the example, a stream is created by using ``hipStreamCreate()`` and then
-the managed memory is attached to the stream using
-``hipStreamAttachMemAsync()``. The ``hipMemAttachGlobal`` flag is used to
-indicate that the memory can be accessed from any stream on any device.
-The kernel launch and synchronization are now done on this stream.
-Using streams and attaching memory to them can help with overlapping data
-transfers and computation.
-
-For more details and description of flags, visit
-:ref:`unified_memory_reference`.
-
-.. code-block:: cpp
-   :emphasize-lines: 21-24
-
-    #include <hip/hip_runtime.h>
-    #include <iostream>
-
-    // Addition of two values.
-    __global__ void add(int *a, int *b, int *c) {
-        *c = *a + *b;
-    }
-
-    int main() {
-        int *a, *b, *c;
-        hipStream_t stream;
-
-        // Create a stream.
-        hipStreamCreate(&stream);
-
-        // Allocate memory for a, b and c that is accessible to both device and host codes.
-        hipMallocManaged(&a, sizeof(*a));
-        hipMallocManaged(&b, sizeof(*b));
-        hipMallocManaged(&c, sizeof(*c));
-
-        // Attach memory to the stream asynchronously.
-        hipStreamAttachMemAsync(stream, a, sizeof(*a), hipMemAttachGlobal);
-        hipStreamAttachMemAsync(stream, b, sizeof(*b), hipMemAttachGlobal);
-        hipStreamAttachMemAsync(stream, c, sizeof(*c), hipMemAttachGlobal);
-
-        // Setup input values.
-        *a = 1;
-        *b = 2;
-
-        // Launch add() kernel on GPU on the created stream.
-        hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, stream, a, b, c);
-
-        // Wait for stream to finish before accessing on host.
-        hipStreamSynchronize(stream);
+The ``hipStreamAttachMemAsync`` function would be able to asynchronously attach memory to a stream, which can help concurrent execution when using streams.
 
-        // Prints the result.
-        std::cout << *a << " + " << *b << " = " << *c << std::endl;
-
-        // Cleanup allocated memory.
-        hipFree(a);
-        hipFree(b);
-        hipFree(c);
-
-        // Destroy the stream.
-        hipStreamDestroy(stream);
-
-        return 0;
-    }
+Currently, this function is a no-operation (NOP) function on AMD GPUs. It simply returns success after the runtime memory validation passed. This function is necessary on Microsoft Windows, and UMM is not supported on this operating system with AMD GPUs at the moment.