Test 2024 11 19 #3679

Closed
wants to merge 17 commits into from
9 changes: 7 additions & 2 deletions .wordlist.txt
@@ -72,7 +72,7 @@ iGPU
inlined
inplace
interop
Interoperation
interoperation
interoperate
Interprocess
interprocess
@@ -113,6 +113,8 @@ omnitrace
overindex
overindexing
oversubscription
overutilized
parallelizable
pixelated
pragmas
preallocated
@@ -150,11 +152,14 @@ texels
tradeoffs
templated
toolkits
transfering
typedefs
unintuitive
UMM
unmap
unmapped
unmapping
upscaled
variadic
WinGDB
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz
zc
126 changes: 126 additions & 0 deletions docs/how-to/hip_runtime_api/call_stack.rst
@@ -0,0 +1,126 @@
.. meta::
   :description: This page describes the call stack concept in HIP
   :keywords: AMD, ROCm, HIP, call stack

*******************************************************************************
Call stack
*******************************************************************************

The call stack is a data structure that manages function calls by saving the
state of the current function. Each time a function is called, a new call frame
is pushed onto the top of the stack, containing information such as local
variables, return addresses, and function parameters. When the function
finishes executing, its frame is popped from the stack and the saved state is
restored to the corresponding registers. This allows the program to return to
the calling function and continue execution where it left off.

The call stack for each thread must track its function calls, local variables,
and return addresses. In GPU programming, however, the memory required to store
the call stacks grows with the massive parallelism inherent to GPUs. NVIDIA and
AMD GPUs take different approaches. NVIDIA GPUs provide independent thread
scheduling, where each thread has its own call stack and effective program
counter. On AMD GPUs, threads are grouped: each warp has its own call stack and
program counter, rather than each individual thread. Warps are described in
:ref:`inherent_thread_hierarchy`.

If a thread or warp exceeds its stack size, a stack overflow occurs, causing
the kernel to fail. Stack overflows can be detected with a debugger.
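To get a feel for why the stack can consume so much device memory, a
back-of-the-envelope estimate (with hypothetical device figures, not taken from
this page) multiplies the per-thread stack size by the number of threads that
can be resident at once:

.. code-block:: cpp

   #include <cassert>
   #include <cstdio>

   int main()
   {
       // Hypothetical figures for illustration only.
       const unsigned long long stackPerThread = 1024; // bytes, see hipLimitStackSize
       const unsigned long long threadsPerCU   = 2048; // max resident threads per compute unit
       const unsigned long long computeUnits   = 60;

       // Worst case: every resident thread needs its own stack backing store.
       const unsigned long long totalBytes = stackPerThread * threadsPerCU * computeUnits;
       printf("Worst-case stack memory: %llu MiB\n", totalBytes / (1024 * 1024));

       assert(totalBytes == 120ull * 1024 * 1024); // 120 MiB for only 1 KiB per thread
       return 0;
   }

Even a modest 1 KiB per-thread stack adds up to over a hundred mebibytes of
backing memory on a fully occupied device, which is why the stack size limit is
worth tuning.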

Call stack management with HIP
===============================================================================

You can adjust the call stack size as shown in the following example, allowing
fine-tuning based on specific kernel requirements. This helps prevent stack
overflow errors by ensuring sufficient stack memory is allocated.

.. code-block:: cpp

   #include <hip/hip_runtime.h>
   #include <iostream>

   #define HIP_CHECK(expression)                  \
   {                                              \
       const hipError_t status = expression;      \
       if(status != hipSuccess){                  \
           std::cerr << "HIP error "              \
                     << status << ": "            \
                     << hipGetErrorString(status) \
                     << " at " << __FILE__ << ":" \
                     << __LINE__ << std::endl;    \
       }                                          \
   }

   int main()
   {
       size_t stackSize;
       HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
       std::cout << "Default stack size: " << stackSize << " bytes" << std::endl;

       // Set a new stack size
       size_t newStackSize = 1024 * 8; // 8 KiB
       HIP_CHECK(hipDeviceSetLimit(hipLimitStackSize, newStackSize));

       HIP_CHECK(hipDeviceGetLimit(&stackSize, hipLimitStackSize));
       std::cout << "Updated stack size: " << stackSize << " bytes" << std::endl;

       return 0;
   }

Handling recursion and deep function calls
-------------------------------------------------------------------------------

As in CPU programming, recursive functions and deeply nested function calls are
supported. However, developers must ensure that these calls do not exceed the
available stack memory, keeping in mind the large amount of memory the call
stacks require due to the GPU's inherent parallelism. This can be achieved by
increasing the stack size, implementing error handling to catch stack
overflows, optimizing code to reduce stack usage, and using profiling tools to
monitor stack memory. Proper kernel design and memory management strategies are
essential for efficient and stable application performance.

.. code-block:: cpp

   #include <hip/hip_runtime.h>
   #include <iostream>

   #define HIP_CHECK(expression)                  \
   {                                              \
       const hipError_t status = expression;      \
       if(status != hipSuccess){                  \
           std::cerr << "HIP error "              \
                     << status << ": "            \
                     << hipGetErrorString(status) \
                     << " at " << __FILE__ << ":" \
                     << __LINE__ << std::endl;    \
       }                                          \
   }

   __device__ unsigned long long fibonacci(unsigned long long n)
   {
       if (n == 0 || n == 1)
       {
           return n;
       }
       return fibonacci(n - 1) + fibonacci(n - 2);
   }

   __global__ void kernel(unsigned long long n)
   {
       unsigned long long result = fibonacci(n);
       const size_t x = threadIdx.x + blockDim.x * blockIdx.x;

       if (x == 0)
           printf("fibonacci(%llu) = %llu\n", n, result);
   }

   int main()
   {
       kernel<<<1, 1>>>(10);
       HIP_CHECK(hipDeviceSynchronize());

       // With the -O0 optimization option, the following launch hits the
       // stack limit:
       // kernel<<<1, 256>>>(2048);
       // HIP_CHECK(hipDeviceSynchronize());

       return 0;
   }
116 changes: 116 additions & 0 deletions docs/how-to/hip_runtime_api/external_interop.rst
@@ -0,0 +1,116 @@
.. meta::
   :description: HIP provides an external resource interoperability API that
                 allows efficient data sharing between HIP and graphics APIs
                 such as Vulkan.
   :keywords: AMD, ROCm, HIP, external, interop, interoperability

*******************************************************************************
External resource interoperability
*******************************************************************************

This feature allows HIP to work with resources -- such as memory and
semaphores -- created by other APIs. Resources created by APIs such as CUDA,
OpenCL, and Vulkan can be used within HIP, making it easier to integrate HIP
into existing projects.

To use external resources in HIP, you typically follow these steps:

- Import resources from other APIs using HIP provided functions
- Use external resources as if they were created in HIP
- Destroy the HIP resource object to clean up
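
A minimal sketch of these steps for an external memory object, assuming the
exporting API provided an opaque POSIX file descriptor (``memFd`` and
``memSize`` are placeholders; error checking omitted):

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   // Sketch only: memFd and memSize are placeholders for a handle and size
   // obtained from the exporting API (Vulkan, for example).
   void* importExternalBuffer(int memFd, size_t memSize, hipExternalMemory_t* extMem)
   {
       // 1. Import the resource using a HIP provided function.
       hipExternalMemoryHandleDesc memDesc = {};
       memDesc.type      = hipExternalMemoryHandleTypeOpaqueFd;
       memDesc.handle.fd = memFd;
       memDesc.size      = memSize;
       hipImportExternalMemory(extMem, &memDesc);

       // 2. Map the memory so it can be used as if it were created in HIP.
       hipExternalMemoryBufferDesc bufDesc = {};
       bufDesc.offset = 0;
       bufDesc.size   = memSize;
       void* devPtr   = nullptr;
       hipExternalMemoryGetMappedBuffer(&devPtr, *extMem, &bufDesc);

       // 3. When finished, release the mapping and the imported object with
       //    hipFree(devPtr) and hipDestroyExternalMemory(*extMem).
       return devPtr;
   }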

Semaphore functions
===============================================================================

Semaphore functions are essential for synchronization in parallel computing.
They facilitate communication and coordination between different parts of a
program or between different programs. By managing semaphores, tasks are
executed in the correct order and resources are used effectively, preventing
conflicts and preserving the integrity and performance of concurrent processes.

External semaphore functions can be used in HIP as described in :doc:`../reference/hip_runtime_api/external_interop`.
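
The typical flow imports the semaphore, then waits on or signals it
asynchronously on a stream. A sketch of that flow, assuming an opaque POSIX
file descriptor ``semFd`` exported by the other API (error checking omitted):

.. code-block:: cpp

   #include <hip/hip_runtime.h>

   void waitComputeSignal(int semFd, hipStream_t stream)
   {
       // Import a semaphore created by another API (semFd is a placeholder
       // for an exported opaque file descriptor).
       hipExternalSemaphore_t extSem;
       hipExternalSemaphoreHandleDesc semDesc = {};
       semDesc.type      = hipExternalSemaphoreHandleTypeOpaqueFd;
       semDesc.handle.fd = semFd;
       hipImportExternalSemaphore(&extSem, &semDesc);

       // Wait until the other API signals the semaphore...
       hipExternalSemaphoreWaitParams waitParams = {};
       hipWaitExternalSemaphoresAsync(&extSem, &waitParams, 1, stream);

       // ...launch kernels on `stream` here, then signal back.
       hipExternalSemaphoreSignalParams signalParams = {};
       hipSignalExternalSemaphoresAsync(&extSem, &signalParams, 1, stream);

       // Clean up once the semaphore is no longer needed.
       hipDestroyExternalSemaphore(extSem);
   }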

Memory functions
===============================================================================

HIP external memory functions focus on the efficient sharing and management of
memory resources. They make it possible to import memory created by external
systems so that a HIP program can use it seamlessly. Memory functions include
mapping memory for effective use and ensuring proper cleanup to prevent
resource leaks. This is critical for performance, particularly in applications
handling large datasets or complex structures such as textures in graphics.
Proper memory management ensures stability and efficient resource utilization.

Example
===============================================================================

The ROCm examples include a
`HIP-Vulkan interoperation example <https://github.com/ROCm/rocm-examples/tree/develop/HIP-Basic/vulkan_interop>`_
that demonstrates how to perform interoperation between HIP and Vulkan.

In this example, a simple HIP kernel is used to compute a sine wave, which is
then rendered to a window as a graphical output using Vulkan. The process
requires several initialization steps, such as setting up a HIP context,
creating a Vulkan instance, and configuring the GPU device and queue. After
these initial steps, the kernel executes the sine wave computation, and Vulkan
continuously updates the window framebuffer to display the computed data until
the window is closed.

The following code converts a Vulkan memory handle to its equivalent HIP
handle. The input ``VkDeviceMemory`` and the created HIP memory represent the
same physical area of GPU memory through the handles of each respective API.
Writing to the buffer in one API allows the results to be read through the
other. Note that access to the buffer must be synchronized between the APIs,
for example using queue synchronization or semaphores.

.. literalinclude:: ../../tools/example_codes/external_interop.hip
   :start-after: // [Sphinx vulkan memory to hip start]
   :end-before: // [Sphinx vulkan memory to hip end]
   :language: cpp

The Vulkan semaphore is converted to a HIP semaphore as shown in the following
example. Signaling the semaphore in one API allows the other API to wait on it,
which is how synchronized access to resources is guaranteed across APIs.

.. literalinclude:: ../../tools/example_codes/external_interop.hip
   :start-after: // [Sphinx semaphore convert start]
   :end-before: // [Sphinx semaphore convert end]
   :language: cpp

When the external memory is exported from Vulkan and imported to HIP, it is not
yet ready for use. The Vulkan handle is shared, allowing for memory sharing
rather than copying during the export process. To use the memory, it must be
mapped to a pointer that can be passed to the kernel for reading and writing.
The following example shows how to map the external memory to HIP:

.. literalinclude:: ../../tools/example_codes/external_interop.hip
   :start-after: // [Sphinx map external memory start]
   :end-before: // [Sphinx map external memory end]
   :language: cpp

Wait until the buffer is ready and is no longer being modified on the Vulkan
side:

.. literalinclude:: ../../tools/example_codes/external_interop.hip
   :start-after: // [Sphinx wait semaphore start]
   :end-before: // [Sphinx wait semaphore end]
   :language: cpp

The sinewave kernel implementation:

.. literalinclude:: ../../tools/example_codes/external_interop.hip
   :start-after: // [Sphinx sinewave kernel start]
   :end-before: // [Sphinx sinewave kernel end]
   :language: cpp

Signal to Vulkan that we are done with the buffer and that it can proceed with
rendering:

.. literalinclude:: ../../tools/example_codes/external_interop.hip
   :start-after: // [Sphinx signal semaphore start]
   :end-before: // [Sphinx signal semaphore end]
   :language: cpp