diff --git a/docs/how-to/unified_memory.rst b/docs/how-to/unified_memory.rst index 375ee286f7..500882d964 100644 --- a/docs/how-to/unified_memory.rst +++ b/docs/how-to/unified_memory.rst @@ -7,8 +7,6 @@ Unified Memory ******************************************************************************* -Introduction -============ In conventional architectures, CPUs and GPUs have dedicated memory like Random Access Memory (RAM) and Video Random Access Memory (VRAM). This architectural design, while effective, can be limiting in terms of memory capacity and @@ -83,13 +81,13 @@ page-fault. For more details, visit Unified Memory Programming Models ================================= -Showcasing various unified memory programming models, their availability -depends on your architecture. For further details, visit :ref:`unified memory +Showcasing various unified memory programming models, the model availability +depends on your architecture. For more information, see :ref:`unified memory system requirements` and :ref:`checking unified memory management support`. - **HIP Managed Memory Allocation API**: - The ``hipMallocManaged()`` is a dynamic memory allocator that is available on + The ``hipMallocManaged()`` is a dynamic memory allocator available on all GPUs with unified memory support. For more details, visit :ref:`unified_memory_reference`. @@ -101,8 +99,8 @@ system requirements` and :ref:`checking unified memory management support`. - **System Allocation API**: - Starting with the MI300 series, the ``malloc()`` system allocator allows you - to reserve unified memory. The system allocator is more versatile, and it + Starting with the AMD MI300 series, the ``malloc()`` system allocator allows + you to reserve unified memory. The system allocator is more versatile and offers an easy transition from a CPU written C++ code to a HIP code as the same system allocation API is used. @@ -110,8 +108,8 @@ system requirements` and :ref:`checking unified memory management support`. 
Checking Unified Memory Management Support ------------------------------------------ -Some device attribute can offer information about which :ref:`unified memory -programming models` are supported. The attribute value is an integer 1 if the +Some device attributes can offer information about which :ref:`unified memory +programming models` are supported. The attribute value is 1 if the functionality is supported, and 0 if it is not supported. .. list-table:: Device attributes for unified memory management @@ -340,25 +338,25 @@ where UMM can be beneficial: it easier for developers to write code without worrying about memory allocation and deallocation details. -- **Data Migration**: - +- **Data Migration**: + UMM allows for efficient data migration between the host (CPU) and the device (GPU). This can be particularly useful for applications that need to move data back and forth between the device and host. - **Improved Programming Productivity**: - As a positive side effect, the use of UMM can reduce the lines of code, - thereby improving programming productivity. + As a positive side effect, UMM can reduce the lines of code, thereby + improving programming productivity. In HIP, pinned memory allocations are coherent by default. Pinned memory is host memory mapped into the address space of all GPUs, meaning that the pointer can be used on both host and device. Using pinned memory instead of pageable memory on the host can improve bandwidth. -While UMM can provide numerous benefits, it is important to be aware of the +While UMM can provide numerous benefits, it's important to be aware of the potential performance overhead associated with UMM. You must thoroughly test -and profile your code to ensure it is the most suitable choice for your use +and profile your code to ensure it's the most suitable choice for your use case. .. _unified memory compiler hints: @@ -366,9 +364,9 @@ case. 
Unified Memory Compiler Hints for the Better Performance ======================================================== -Unified memory compiler hints can help to improve the performance of your code, -if you know the ability of your code and the infrastructure that you use. Some -hint techniques are presented in this section. +Unified memory compiler hints can help improve the performance of your code if +you know your code's ability and infrastructure. Some hint techniques are +presented in this section. The hint functions can set actions on a selected device, which can be identified by ``hipGetDeviceProperties(&prop, device_id)``. There are two @@ -377,13 +375,13 @@ special ``device_id`` values: - ``hipCpuDeviceId`` = -1 means that the advised device is the CPU. - ``hipInvalidDeviceId`` = -2 means that the device is invalid. -For the best performance you can profile your application to optimize the +For the best performance, profile your application to optimize the utilization of compiler hints. Data Prefetching ---------------- Data prefetching is a technique used to improve the performance of your -application by moving data closer to the processing unit before it is actually +application by moving data closer to the processing unit before it's actually needed. .. code-block:: cpp @@ -437,9 +435,9 @@ needed. } Remember to check the return status of ``hipMemPrefetchAsync()`` to ensure that -the prefetch operations complete successfully! +the prefetch operations are completed successfully. -Memory Advise +Memory Advice ------------- The effectiveness of ``hipMemAdvise()`` comes from its ability to inform the runtime system of the developer's intentions regarding memory usage. When the @@ -449,10 +447,10 @@ efficient execution of the application. However, the actual impact on performance can vary based on the specific use case and the hardware architecture. 
-For the description of ``hipMemAdvise()`` and the detailed list of advises, +For the description of ``hipMemAdvise()`` and the detailed list of advice, visit the :ref:`unified_memory_reference`. -Here is the updated version of the example above with memory advises. +Here is the updated version of the example above with memory advice. .. code-block:: cpp :emphasize-lines: 17-26 @@ -468,17 +466,17 @@ Here is the updated version of the example above with memory advises. int main() { int *a, *b, *c; - // Allocate memory for a, b and c that is accessible to both device and host codes. + // Allocate memory for a, b, and c accessible to both device and host codes. hipMallocManaged(&a, sizeof(*a)); hipMallocManaged(&b, sizeof(*b)); hipMallocManaged(&c, sizeof(*c)); - // Set memory advise for a, b, and c to be accessed by the CPU. + // Set memory advice for a, b, and c to be accessed by the CPU. hipMemAdvise(a, sizeof(*a), hipMemAdviseSetPreferredLocation, hipCpuDeviceId); hipMemAdvise(b, sizeof(*b), hipMemAdviseSetPreferredLocation, hipCpuDeviceId); hipMemAdvise(c, sizeof(*c), hipMemAdviseSetPreferredLocation, hipCpuDeviceId); - // Additionally, set memory advise for a, b, and c to be read mostly from the device 0. + // Additionally, set memory advice for a, b, and c to be read mostly from the device 0. constexpr int device = 0; hipMemAdvise(a, sizeof(*a), hipMemAdviseSetReadMostly, device); hipMemAdvise(b, sizeof(*b), hipMemAdviseSetReadMostly, device); @@ -570,9 +568,8 @@ For more details, visit the Asynchronously Attach Memory to a Stream ---------------------------------------- -The ``hipStreamAttachMemAsync`` function is used to asynchronously attach -memory to a stream, which can help with concurrent execution when using -streams. +The ``hipStreamAttachMemAsync`` function asynchronously attaches memory to a +stream, which can help concurrent execution when using streams. 
In the example, a stream is created by using ``hipStreamCreate()`` and then the managed memory is attached to the stream using