External review feedback and minor styling

ROCm · Jun 27, 2024 · bd45a7f
1 parent ad04193
commit bd45a7f

Showing 2 changed files with 25 additions and 82 deletions.
diff --git a/.wordlist.txt b/.wordlist.txt
@@ -72,6 +72,7 @@ multithreading
 NCCL
 NDRange
 nonnegative
+NOP
 Numa
 Nsight
 overindex

diff --git a/docs/how-to/unified_memory.rst b/docs/how-to/unified_memory.rst
@@ -33,7 +33,7 @@ throughput (data processed by unit time).
 
 .. _unified memory system requirements:
 
-System Requirements
+System requirements
 ===================
 Unified memory is supported on Linux by all modern AMD GPUs from the Vega
 series onward. Unified memory management can be achieved with managed memory
@@ -78,26 +78,26 @@ page-fault. For more details, visit
 
 .. _unified memory programming models:
 
-Unified Memory Programming Models
+Unified memory programming models
 =================================
 
 Showcasing various unified memory programming models, the model availability
 depends on your architecture. For more information, see :ref:`unified memory
 system requirements` and :ref:`checking unified memory management support`.
 
-- **HIP Managed Memory Allocation API**:
+- **HIP managed memory allocation API**:
 
   The ``hipMallocManaged()`` is a dynamic memory allocator available on
   all GPUs with unified memory support. For more details, visit
   :ref:`unified_memory_reference`.
 
-- **HIP Managed Variables**:
+- **HIP managed variables**:
 
   The ``__managed__`` declaration specifier, which serves as its counterpart,
   is supported on all modern AMD cards and can be utilized for static
   allocation.
 
-- **System Allocation API**:
+- **System allocation API**:
 
   Starting with the AMD MI300 series, the ``malloc()`` system allocator allows
   you to reserve unified memory. The system allocator is more versatile and
@@ -106,7 +106,7 @@ system requirements` and :ref:`checking unified memory management support`.
 
 .. _checking unified memory management support:
 
-Checking Unified Memory Management Support
+Checking unified memory management support
 ------------------------------------------
 Some device attributes can offer information about which :ref:`unified memory
 programming models` are supported. The attribute value is 1 if the
@@ -145,7 +145,7 @@ The following examples show how to use device attributes:
         return 0;
     }
 
-Example for Unified Memory Management
+Example for unified memory management
 -------------------------------------
 
 The following example shows how to use unified memory management with
@@ -324,9 +324,10 @@ Memory Management example is presented in the last tab.
 
 .. _using unified memory management:
 
-Using Unified Memory Management (UMM)
+Using unified memory management (UMM)
 =====================================
-Unified Memory Management (UMM) is a feature that can simplify the complexities
+
+Unified memory management (UMM) is a feature that can simplify the complexities
 of memory management in GPU computing. It is particularly useful in
 heterogeneous computing environments with heavy memory usage with both a CPU
 and a GPU, which would require large memory transfers. Here are some areas
@@ -359,12 +360,12 @@ potential performance overhead associated with UMM. You must thoroughly test
 and profile your code to ensure it's the most suitable choice for your use
 case.
 
-.. _unified memory compiler hints:
+.. _unified memory runtime hints:
 
-Unified Memory Compiler Hints for the Better Performance
-========================================================
+Unified memory HIP runtime hints for the better performance
+===========================================================
 
-Unified memory compiler hints can help improve the performance of your code if
+Unified memory HIP runtime hints can help improve the performance of your code if
 you know your code's ability and infrastructure. Some hint techniques are
 presented in this section.
 
@@ -376,10 +377,11 @@ special ``device_id`` values:
 - ``hipInvalidDeviceId`` = -2 means that the device is invalid.
 
 For the best performance, profile your application to optimize the
-utilization of compiler hits.
+utilization of HIP runtime hints.
 
-Data Prefetching
+Data prefetching
 ----------------
+
 Data prefetching is a technique used to improve the performance of your
 application by moving data closer to the processing unit before it's actually
 needed.
@@ -437,8 +439,9 @@ needed.
 Remember to check the return status of ``hipMemPrefetchAsync()`` to ensure that
 the prefetch operations are completed successfully.
 
-Memory Advice
+Memory advice
 -------------
+
 The effectiveness of ``hipMemAdvise()`` comes from its ability to inform the
 runtime system of the developer's intentions regarding memory usage. When the
 runtime system has knowledge of the expected memory access patterns, it can
@@ -504,8 +507,9 @@ Here is the updated version of the example above with memory advice.
     }
 
 
-Memory Range attributes
+Memory range attributes
 -----------------------
+
 Memory Range attributes allow you to query attributes of a given memory range.
 
 The ``hipMemRangeGetAttribute()`` is added to the example to query the
@@ -565,71 +569,9 @@ For more details, visit the
         return 0;
     }
 
-Asynchronously Attach Memory to a Stream
+Asynchronously attach memory to a stream
 ----------------------------------------
 
-The ``hipStreamAttachMemAsync`` function asynchronously attaches memory to a
-stream, which can help concurrent execution when using streams.
-
-In the example, a stream is created by using ``hipStreamCreate()`` and then
-the managed memory is attached to the stream using
-``hipStreamAttachMemAsync()``. The ``hipMemAttachGlobal`` flag is used to
-indicate that the memory can be accessed from any stream on any device.
-The kernel launch and synchronization are now done on this stream.
-Using streams and attaching memory to them can help with overlapping data
-transfers and computation.
-
-For more details and description of flags, visit
-:ref:`unified_memory_reference`.
-
-.. code-block:: cpp
-    :emphasize-lines: 21-24
-
-    #include <hip/hip_runtime.h>
-    #include <iostream>
-
-    // Addition of two values.
-    __global__ void add(int *a, int *b, int *c) {
-        *c = *a + *b;
-    }
-
-    int main() {
-        int *a, *b, *c;
-        hipStream_t stream;
-
-        // Create a stream.
-        hipStreamCreate(&stream);
-
-        // Allocate memory for a, b and c that is accessible to both device and host codes.
-        hipMallocManaged(&a, sizeof(*a));
-        hipMallocManaged(&b, sizeof(*b));
-        hipMallocManaged(&c, sizeof(*c));
-
-        // Attach memory to the stream asynchronously.
-        hipStreamAttachMemAsync(stream, a, sizeof(*a), hipMemAttachGlobal);
-        hipStreamAttachMemAsync(stream, b, sizeof(*b), hipMemAttachGlobal);
-        hipStreamAttachMemAsync(stream, c, sizeof(*c), hipMemAttachGlobal);
-
-        // Setup input values.
-        *a = 1;
-        *b = 2;
-
-        // Launch add() kernel on GPU on the created stream.
-        hipLaunchKernelGGL(add, dim3(1), dim3(1), 0, stream, a, b, c);
-
-        // Wait for stream to finish before accessing on host.
-        hipStreamSynchronize(stream);
+The ``hipStreamAttachMemAsync`` function would be able to asynchronously attach memory to a stream, which can help concurrent execution when using streams.
 
-        // Prints the result.
-        std::cout << *a << " + " << *b << " = " << *c << std::endl;
-
-        // Cleanup allocated memory.
-        hipFree(a);
-        hipFree(b);
-        hipFree(c);
-
-        // Destroy the stream.
-        hipStreamDestroy(stream);
-
-        return 0;
-    }
+Currently, this function is a no-operation (NOP) function on AMD GPUs. It simply returns success after the runtime memory validation passed. This function is necessary on Microsoft Windows, and UMM is not supported on this operating system with AMD GPUs at the moment.