diff --git a/docs/how-to/hip_runtime_api/error_handling.rst b/docs/how-to/hip_runtime_api/error_handling.rst new file mode 100644 index 0000000000..564020ff7b --- /dev/null +++ b/docs/how-to/hip_runtime_api/error_handling.rst @@ -0,0 +1,134 @@ +.. meta:: + :description: Error Handling + :keywords: AMD, ROCm, HIP, error handling, error + +******************************************************************************** +Error handling +******************************************************************************** + +HIP provides functionality to detect, report, and manage errors that occur +during the execution of HIP runtime functions or when launching kernels. Every +HIP runtime function, apart from launching kernels, has :cpp:type:`hipError_t` +as return type. :cpp:func:`hipGetLastError()` and :cpp:func:`hipPeekAtLastError()` +can be used for catching errors from kernel launches, as kernel launches don't +return an error directly. HIP maintains an internal state, that includes the +last error code. :cpp:func:`hipGetLastError` returns and resets that error to +hipSuccess, while :cpp:func:`hipPeekAtLastError` just returns the error without +changing it. To get a human readable version of the errors, +:cpp:func:`hipGetErrorString()` and :cpp:func:`hipGetErrorName()` can be used. + +.. note:: + + :cpp:func:`hipGetLastError` returns the returned error code of the last HIP + runtime API call even if it's hipSuccess, while ``cudaGetLastError`` returns + the error returned by any of the preceding CUDA APIs in the same host thread. + :cpp:func:`hipGetLastError` behavior will be matched with + ``cudaGetLastError`` in ROCm release 7.0. + +Best practices of HIP error handling: + +1. Check errors after each API call - Avoid error propagation. +2. Use macros for error checking - Check :ref:`hip_check_macros`. +3. Handle errors gracefully - Free resources and provide meaningful error + messages to the user. + +For more details on the error handling functions, see :ref:`error handling +functions reference page `. + +.. _hip_check_macros: + +HIP check macros +================================================================================ + +HIP uses check macros to simplify error checking and reduce code duplication. +The ``HIP_CHECK`` macros are mainly used to detect and report errors. It can +also exit from application with ``exit(1);`` function call after the error +print. The ``HIP_CHECK`` macro example: + +.. code-block:: cpp + + #define HIP_CHECK(expression) \ + { \ + const hipError_t status = expression; \ + if(status != hipSuccess){ \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ + } + +Complete example +================================================================================ + +A complete example to demonstrate the error handling with a simple addition of +two values kernel: + +.. code-block:: cpp + + #include + #include + #include + + #define HIP_CHECK(expression) \ + { \ + const hipError_t status = expression; \ + if(status != hipSuccess){ \ + std::cerr << "HIP error " \ + << status << ": " \ + << hipGetErrorString(status) \ + << " at " << __FILE__ << ":" \ + << __LINE__ << std::endl; \ + } \ + } + + // Addition of two values. + __global__ void add(int *a, int *b, int *c, size_t size) { + const size_t index = threadIdx.x + blockDim.x * blockIdx.x; + if(index < size) { + c[index] += a[index] + b[index]; + } + } + + int main() { + constexpr int numOfBlocks = 256; + constexpr int threadsPerBlock = 256; + constexpr size_t arraySize = 1U << 16; + + std::vector a(arraySize), b(arraySize), c(arraySize); + int *d_a, *d_b, *d_c; + + // Setup input values. + std::fill(a.begin(), a.end(), 1); + std::fill(b.begin(), b.end(), 2); + + // Allocate device copies of a, b and c. + HIP_CHECK(hipMalloc(&d_a, arraySize * sizeof(*d_a))); + HIP_CHECK(hipMalloc(&d_b, arraySize * sizeof(*d_b))); + HIP_CHECK(hipMalloc(&d_c, arraySize * sizeof(*d_c))); + + // Copy input values to device. + HIP_CHECK(hipMemcpy(d_a, &a, arraySize * sizeof(*d_a), hipMemcpyHostToDevice)); + HIP_CHECK(hipMemcpy(d_b, &b, arraySize * sizeof(*d_b), hipMemcpyHostToDevice)); + + // Launch add() kernel on GPU. + hipLaunchKernelGGL(add, dim3(numOfBlocks), dim3(threadsPerBlock), 0, 0, d_a, d_b, d_c, arraySize); + // Check the kernel launch + HIP_CHECK(hipGetLastError()); + // Check for kernel execution error + HIP_CHECK(hipDeviceSynchronize()); + + // Copy the result back to the host. + HIP_CHECK(hipMemcpy(&c, d_c, arraySize * sizeof(*d_c), hipMemcpyDeviceToHost)); + + // Cleanup allocated memory. + HIP_CHECK(hipFree(d_a)); + HIP_CHECK(hipFree(d_b)); + HIP_CHECK(hipFree(d_c)); + + // Print the result. + std::cout << a[0] << " + " << b[0] << " = " << c[0] << std::endl; + + return 0; + } diff --git a/docs/how-to/hip_runtime_api/initialization.rst b/docs/how-to/hip_runtime_api/initialization.rst new file mode 100644 index 0000000000..0e88c1895e --- /dev/null +++ b/docs/how-to/hip_runtime_api/initialization.rst @@ -0,0 +1,105 @@ +.. meta:: + :description: Initialization. + :keywords: AMD, ROCm, HIP, initialization + +******************************************************************************** +Initialization +******************************************************************************** + +The initialization involves setting up the environment and resources needed for +using GPUs. The following steps are covered with the initialization: + +- Setting up the HIP runtime + + This includes reading the environment variables set during init, setting up + the active or visible devices, loading necessary libraries, setting up + internal buffers for memory copies or cooperative launches, initialize the + compiler as well as HSA runtime and checks any errors due to lack of resources + or no active devices. + +- Querying and setting GPUs + + Identifying and querying the available GPU devices on the system. + +- Setting up contexts + + Creating contexts for each GPU device, which are essential for managing + resources and executing kernels. For further details, check the :ref:`context + section `. + +Initialize the HIP runtime +================================================================================ + +The HIP runtime is initialized automatically when the first HIP API call is +made. However, you can explicitly initialize it using :cpp:func:`hipInit`, +to be able to control the timing of the initialization. The manual +initialization can be useful to ensure that the GPU is initialized and +ready, or to isolate GPU initialization time from other parts of +your program. + +.. note:: + + You can use :cpp:func:`hipDeviceReset()` to delete all streams created, memory + allocated, kernels running and events created by the current process. Any new + HIP API call initializes the HIP runtime again. + +Querying and setting GPUs +================================================================================ + +If multiple GPUs are available in the system, you can query and select the +desired GPU(s) to use based on device properties, such as size of global memory, +size shared memory per block, support of cooperative launch and support of +managed memory. + +Querying GPUs +-------------------------------------------------------------------------------- + +The properties of a GPU can be queried using :cpp:func:`hipGetDeviceProperties`, +which returns a struct of :cpp:struct:`hipDeviceProp_t`. The properties in the struct can be +used to identify a device or give an overview of hardware characteristics, that +might make one GPU better suited for the task than others. + +The :cpp:func:`hipGetDeviceCount` function returns the number of available GPUs, +which can be used to loop over the available GPUs. + +Example code of querying GPUs: + +.. code-block:: cpp + + #include + #include + + int main() { + + int deviceCount; + if (hipGetDeviceCount(&deviceCount) == hipSuccess){ + for (int i = 0; i < deviceCount; ++i){ + hipDeviceProp_t prop; + if ( hipGetDeviceProperties(&prop, i) == hipSuccess) + std::cout << "Device" << i << prop.name << std::endl; + } + } + + return 0; + } + +Setting the GPU +-------------------------------------------------------------------------------- + +:cpp:func:`hipSetDevice` function select the GPU to be used for subsequent HIP +operations. This function performs several key tasks: + +- Context Binding + + Binds the current thread to the context of the specified GPU device. This + ensures that all subsequent operations are executed on the selected device. + +- Resource Allocation + + Prepares the device for resource allocation, such as memory allocation and + stream creation. + +- Check device availability + + Checks for errors in device selection and returns error if the specified + device is not available or not capable of executing HIP operations. diff --git a/docs/index.md b/docs/index.md index 6a5617163f..0ef68cd649 100644 --- a/docs/index.md +++ b/docs/index.md @@ -30,7 +30,9 @@ The HIP documentation is organized into the following categories: :::{grid-item-card} How to * {doc}`./how-to/hip_runtime_api` + * {doc}`./how-to/hip_runtime_api/initialization` * {doc}`./how-to/hip_runtime_api/memory_management` + * {doc}`./how-to/hip_runtime_api/error_handling` * {doc}`./how-to/hip_runtime_api/cooperative_groups` * {doc}`./how-to/hip_runtime_api/hipgraph` * [HIP porting guide](./how-to/hip_porting_guide) diff --git a/docs/sphinx/_toc.yml.in b/docs/sphinx/_toc.yml.in index 1f943a7587..f36efda6cc 100644 --- a/docs/sphinx/_toc.yml.in +++ b/docs/sphinx/_toc.yml.in @@ -34,6 +34,7 @@ subtrees: - file: how-to/hip_runtime_api subtrees: - entries: + - file: how-to/hip_runtime_api/initialization - file: how-to/hip_runtime_api/memory_management subtrees: - entries: @@ -46,6 +47,7 @@ subtrees: - file: how-to/hip_runtime_api/memory_management/unified_memory - file: how-to/hip_runtime_api/memory_management/virtual_memory - file: how-to/hip_runtime_api/memory_management/stream_ordered_allocator + - file: how-to/hip_runtime_api/error_handling - file: how-to/hip_runtime_api/cooperative_groups - file: how-to/hip_runtime_api/hipgraph - file: how-to/hip_porting_guide