Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add expected behavior to unified memory #3617

Open
wants to merge 7 commits into
base: docs/develop
Choose a base branch
from

Conversation

matyas-streamhpc
Copy link

No description provided.

@matyas-streamhpc matyas-streamhpc self-assigned this Sep 19, 2024
@neon60 neon60 marked this pull request as draft September 19, 2024 09:43
@neon60 neon60 force-pushed the update-unified-memory branch 3 times, most recently from 7e27ba9 to 84f9097 Compare September 29, 2024 17:13
@neon60 neon60 force-pushed the update-unified-memory branch 4 times, most recently from 934e4d3 to 4baf272 Compare October 9, 2024 09:59
@neon60 neon60 force-pushed the update-unified-memory branch 3 times, most recently from 546b8d9 to 6244fc2 Compare October 15, 2024 05:37
@neon60 neon60 changed the base branch from docs/develop to hip_runtime_api_how-to October 15, 2024 08:23
@neon60 neon60 marked this pull request as ready for review October 15, 2024 08:23
@neon60 neon60 requested a review from MKKnorr October 15, 2024 08:24
@neon60 neon60 changed the base branch from hip_runtime_api_how-to to docs/develop October 31, 2024 14:30
********************************************************************************

The HIP runtime API provides C and C++ functionalities to manage event, stream,
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Common

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Common
and memory on GPUs. On AMD ROCm software, the HIP runtime uses :doc:`Compute

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apparently CLR is supposed to be Compute Language Runtime

The runtime offers functions for allocating, freeing, and copying device memory,
along with transferring data between host and device memory.

Here are the various memory management techniques:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like these techniques could use a brief description?

.. figure:: ../../../data/how-to/hip_runtime_api/memory_management/pageable_pinned.svg

The pageable and pinned memory allow you to exercise direct control over
memory operations, which is known as explicit memory management. When using the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In memory_management.rst we list four memory management techniques, none of which is "explicit memory management". Is it possible to relate this explicit memory management to one of the four techniques? Or should it be listed as one of the techniques?

// Run the kernel
// ...

HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right to me. We have device_output and host_output, shouldn't those come into play here?

Pinned memory
================================================================================

Pinned memory or page-locked memory is stored in pages that are locked in specific sectors in RAM and can't be migrated. The pointer can be used on both

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like we should note that pinned memory is allocated by hipMalloc() and hipHostMalloc().

// Run the kernel
// ...

HIP_CHECK(hipMemcpy(device_input, host_input, element_number * sizeof(int), hipMemcpyHostToDevice));

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't this be DeviceToHost?

as HBM2e. Device memory can be allocated as global memory, constant, texture or
surface memory.

Global memory

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest making this a bulleted list rather than four separate subheads.

:sup:`1` The :cpp:func:`hipHostMalloc` memory allocation coherence mode can be
affected by the ``HIP_HOST_COHERENT`` environment variable, if the
``hipHostMallocCoherent``, ``hipHostMallocNonCoherent``, and
``hipHostMallocMapped`` are unset. If neither these flags nor the

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here the footnote to the table references hipHostMallocMapped flag, but the table above does not show this flag. This also is mentioned in the following note.

functions. To unmap the memory, use :cpp:func:`hipMemUnmap`. To release the
virtual address range, use :cpp:func:`hipMemAddressFree`. Finally, to release
the physical memory, use :cpp:func:`hipMemRelease`. A side effect of these
functions is the lack of synchronization when memory is released. If you call
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you call :cpp:func:hipFree when you have multiple streams running in parallel, it
synchronizes the device. This causes worse resource usage and performance.

This seems like it could be an important note?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On line 24: - Learning curve: Requires additional effort to understand and utilize SOMA effectively.

Copy link

@randyh62 randyh62 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added comments

@neon60 neon60 changed the base branch from docs/develop to hip_runtime_api_how-to November 5, 2024 07:57
@neon60 neon60 force-pushed the hip_runtime_api_how-to branch 2 times, most recently from eb41cb1 to 7cb026c Compare November 7, 2024 12:00
@MKKnorr MKKnorr force-pushed the update-unified-memory branch 2 times, most recently from 0c4abec to 07a7fdc Compare November 8, 2024 10:41
Copy link

@randyh62 randyh62 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me.

@neon60 neon60 deleted the branch docs/develop November 18, 2024 17:39
@neon60 neon60 closed this Nov 18, 2024
@neon60 neon60 reopened this Nov 18, 2024
@MKKnorr MKKnorr changed the base branch from hip_runtime_api_how-to to docs/develop November 19, 2024 09:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants