From 0918fc125e4ccd7cd0aa5790045a35cca7dcb70e Mon Sep 17 00:00:00 2001 From: Istvan Kiss Date: Thu, 23 May 2024 22:34:45 +0200 Subject: [PATCH] WIP --- docs/how-to/hip_rtc.md | 26 +++++++++++++++----------- docs/how-to/programming_manual.md | 6 ++++++ 2 files changed, 21 insertions(+), 11 deletions(-) diff --git a/docs/how-to/hip_rtc.md b/docs/how-to/hip_rtc.md index 9d0b2ea1e3..a08b2458e0 100644 --- a/docs/how-to/hip_rtc.md +++ b/docs/how-to/hip_rtc.md @@ -5,16 +5,15 @@ Kernels can be stored as a text string and can be passed to HIPRTC APIs alongsid NOTE: - * This library can be used on systems without HIP installed nor AMD GPU driver installed at all (offline compilation). Therefore, it does not depend on any HIP runtime library. - * But it does depend on COMGr. You may try to statically link COMGr into HIPRTC to avoid any ambiguity. - * Developers can decide to bundle this library with their application. +* This library can be used on systems without HIP installed nor AMD GPU driver installed at all (offline compilation). Therefore, it does not depend on any HIP runtime library. +* But it does depend on COMGr. You may try to statically link COMGr into HIPRTC to avoid any ambiguity. +* Developers can decide to bundle this library with their application. ## Example To use HIPRTC functionality, HIPRTC header needs to be included first. ```#include ``` - Kernels can be stored in a string: ```cpp @@ -45,6 +44,7 @@ hiprtcCreateProgram API also allows you to add headers which can be included in For online compilation, the compiler pre-defines HIP device API functions, HIP specific types and macros for device compilation, but does not include standard C/C++ headers by default. Users can only include header files provided to hiprtcCreateProgram. After associating the kernel string with hiprtcProgram, you can now compile this program using: + ```cpp hiprtcCompileProgram(prog, // hiprtcProgram 0, // Number of options @@ -227,10 +227,10 @@ int main() { ## HIPRTC specific options HIPRTC provides a few HIPRTC specific flags - * ```--gpu-architecture``` : This flag can guide the code object generation for a specific gpu arch. Example: ```--gpu-architecture=gfx906:sramecc+:xnack-```, its equivalent to ```--offload-arch```. +* ```--gpu-architecture``` : This flag can guide the code object generation for a specific gpu arch. Example: ```--gpu-architecture=gfx906:sramecc+:xnack-```, its equivalent to ```--offload-arch```. * This option is compulsory if compilation is done on a system without AMD GPUs supported by HIP runtime. * Otherwise, HIPRTC will load the hip runtime and gather the current device and its architecture info and use it as option. - * ```-fgpu-rdc``` : This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs. +* ```-fgpu-rdc``` : This flag when provided during the hiprtcCompileProgram generates the bitcode (HIPRTC doesn't convert this bitcode into ISA and binary). This bitcode can later be fetched using hiprtcGetBitcode and hiprtcGetBitcodeSize APIs. ### Bitcode @@ -320,11 +320,13 @@ hipModuleLoadData(&module, binary); #### Note - * The compiled binary must be loaded before HIPRTC link instance is destroyed using the hiprtcLinkDestroy API. +* The compiled binary must be loaded before HIPRTC link instance is destroyed using the hiprtcLinkDestroy API. + ```cpp hiprtcLinkDestroy(rtc_link_state); ``` - * The correct sequence of calls is : hiprtcLinkCreate, hiprtcLinkAddData or hiprtcLinkAddFile, hiprtcLinkComplete, hiprtcModuleLoadData, hiprtcLinkDestroy. + +* The correct sequence of calls is : hiprtcLinkCreate, hiprtcLinkAddData or hiprtcLinkAddFile, hiprtcLinkComplete, hiprtcModuleLoadData, hiprtcLinkDestroy. ### Input Types @@ -392,6 +394,7 @@ HIPRTC_ERROR_NO_LOWERED_NAMES_BEFORE_COMPILATION = 10, HIPRTC_ERROR_NAME_EXPRESSION_NOT_VALID = 11, HIPRTC_ERROR_INTERNAL_ERROR = 12 ``` + ```cpp hiprtcResult result; result = hiprtcCompileProgram(prog, 1, opts); @@ -416,9 +419,9 @@ The two APIs hiprtcAddNameExpression and hiprtcGetLoweredName provide this funct ### Note - * The identical name expression string must be provided on a subsequent call to hiprtcGetLoweredName to extract the lowered name. - * The correct sequence of calls is : hiprtcAddNameExpression, hiprtcCompileProgram, hiprtcGetLoweredName, hiprtcDestroyProgram. - * The lowered names must be fetched using hiprtcGetLoweredName only after the HIPRTC program has been compiled, and before it has been destroyed. +* The identical name expression string must be provided on a subsequent call to hiprtcGetLoweredName to extract the lowered name. +* The correct sequence of calls is : hiprtcAddNameExpression, hiprtcCompileProgram, hiprtcGetLoweredName, hiprtcDestroyProgram. +* The lowered names must be fetched using hiprtcGetLoweredName only after the HIPRTC program has been compiled, and before it has been destroyed. ### Example @@ -439,6 +442,7 @@ template __global__ void f3(int *result) { *result = sizeof(T); } )"}; ``` + hiprtcAddNameExpression is called with various name expressions referring to the address of ```__global__``` functions and ```__device__/__constant__``` variables. ```cpp diff --git a/docs/how-to/programming_manual.md b/docs/how-to/programming_manual.md index b19b75724d..8bc0ca0939 100644 --- a/docs/how-to/programming_manual.md +++ b/docs/how-to/programming_manual.md @@ -41,6 +41,7 @@ In order to achieve this fine-grained coherence, many AMD GPUs use a limited cac HIP provides the developer with controls to select which type of memory is used via allocation flags passed to hipHostMalloc and the HIP_HOST_COHERENT environment variable. By default, the environment variable HIP_HOST_COHERENT is set to 0 in HIP. The control logic in the current version of HIP is as follows: + * No flags are passed in: the host memory allocation is coherent, the HIP_HOST_COHERENT environment variable is ignored. * hipHostMallocCoherent=1: The host memory allocation will be coherent, the HIP_HOST_COHERENT environment variable is ignored. * hipHostMallocMapped=1: The host memory allocation will be coherent, the HIP_HOST_COHERENT environment variable is ignored. @@ -101,6 +102,7 @@ else { . . . } ``` + Please note, the managed memory capability check may not be necessary, but if HMM is not supported, then managed malloc will fall back to using system memory and other managed memory API calls will have undefined behavior. Note, managed memory management is implemented on Linux, not supported on Windows yet. @@ -191,12 +193,16 @@ HIP-Clang supports generating two types of static libraries. The first type of s In addition, the first type of library contains host objects with device code embedded as fat binaries. It is generated using the flag --emit-static-lib. The second type of library contains relocatable device objects and is generated using ar. Here is an example to create and use static libraries: + * Type 1 using --emit-static-lib: + ```cpp hipcc hipOptLibrary.cpp --emit-static-lib -fPIC -o libHipOptLibrary.a gcc test.cpp -L. -lhipOptLibrary -L/path/to/hip/lib -lamdhip64 -o test.out ``` + * Type 2 using system ar: + ```cpp hipcc hipDevice.cpp -c -fgpu-rdc -o hipDevice.o ar rcsD libHipDevice.a hipDevice.o