Realm: error when compiling with HIP with NVIDIA GPUs #1713

Open
Tracked by #1032
mariodirenzo opened this issue Jul 10, 2024 · 6 comments
Labels: bug, Realm (Issues pertaining to Realm)

@mariodirenzo

mariodirenzo commented Jul 10, 2024

When I try to compile Realm with HIP support on a machine with NVIDIA GPUs, I get the following error:

```
/home/mariodr/legion2/runtime/realm/hip/hip_module.cc:3509:24: error: use of undeclared identifier 'hipJitOptionInfoLogBuffer'
      jit_options[0] = hipJitOptionInfoLogBuffer;
                       ^
/home/mariodr/legion2/runtime/realm/hip/hip_module.cc:3510:24: error: use of undeclared identifier 'hipJitOptionInfoLogBufferSizeBytes'
      jit_options[1] = hipJitOptionInfoLogBufferSizeBytes;
                       ^
/home/mariodr/legion2/runtime/realm/hip/hip_module.cc:3511:24: error: use of undeclared identifier 'hipJitOptionErrorLogBuffer'
      jit_options[2] = hipJitOptionErrorLogBuffer;
                       ^
/home/mariodr/legion2/runtime/realm/hip/hip_module.cc:3512:24: error: use of undeclared identifier 'hipJitOptionErrorLogBufferSizeBytes'
      jit_options[3] = hipJitOptionErrorLogBufferSizeBytes;
                       ^
5 warnings and 4 errors generated.
make[2]: *** [runtime/CMakeFiles/RealmRuntime.dir/build.make:580: runtime/CMakeFiles/RealmRuntime.dir/realm/hip/hip_module.cc.o] Error 1
```

Does anyone have an idea of what is going on?

@elliottslaughter, can you add this issue to #1032?

@elliottslaughter
Contributor

What versions of Legion and ROCm are you using?

Our CI has been passing recently and tests ROCm 5.4.3, 5.6.0, 5.7.1, and 6.0.0. So unless you're building with some newer ROCm version, or some flag we're not testing, I don't see why this should be breaking for you.
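
For anyone unsure which HIP headers a build actually picks up, a small compile-time probe can help answer the version question. This is only a sketch: it assumes the installation ships `hip/hip_version.h` with the `HIP_VERSION_MAJOR`, `HIP_VERSION_MINOR` and `HIP_VERSION_PATCH` macros, and the names `sketch_hip_version` and `SKETCH_HAVE_HIP_VERSION` are made up for illustration. The HIP version it reports may not match the ROCm release number exactly.

```cpp
#include <string>

// Pull in the HIP version header only if the compiler can find it.
#if defined(__has_include)
#  if __has_include(<hip/hip_version.h>)
#    include <hip/hip_version.h>
#    define SKETCH_HAVE_HIP_VERSION 1  // hypothetical helper macro for this sketch
#  endif
#endif

// Returns "major.minor.patch" of the HIP headers on the include path,
// or "unknown" when the version header is not available.
std::string sketch_hip_version()
{
#ifdef SKETCH_HAVE_HIP_VERSION
  return std::to_string(HIP_VERSION_MAJOR) + "." +
         std::to_string(HIP_VERSION_MINOR) + "." +
         std::to_string(HIP_VERSION_PATCH);
#else
  return "unknown";
#endif
}
```

Compiling this with the same include paths as the Realm build, then printing the result, shows which header set is in effect.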

@mariodirenzo
Author

Legion is at the latest version of the stable branch.
ROCm is at 6.1.4.

The same versions of Legion and ROCm do not produce errors on AMD-based systems.
It seems necessary to use HIP with the CUDA backend to reproduce the error. Do you run such a setup in your CI?

@elliottslaughter
Contributor

I missed that you were running on NVIDIA GPUs.

To be honest, I have never run HIP on NVIDIA GPUs. @eddy16112 might know more; he was the one who set up that configuration.

@elliottslaughter changed the title from "Realm: error when compiling with HIP" to "Realm: error when compiling with HIP with NVIDIA GPUs" on Jul 11, 2024
@mariodirenzo
Author

Just a gentle reminder about this discussion.
Having at least a workaround for this issue would let us work on #1688, which is critical for the PSAAP project.

@eddy16112
Contributor

eddy16112 commented Jul 16, 2024

I think the error occurs because the version of HIP you are using no longer supports the conversion of the hipJit options to their CU_JIT counterparts. In fact, this piece of code is never used by the HIP module, so I pushed a fix to disable it. Can you try applying commit 477745b31b1bcd3a1b9c1fb172419ccae4ec90e7?
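
As a rough illustration of what "disabling" such a code path can look like, the sketch below guards the four identifiers from the compiler errors behind a preprocessor switch. It is not the content of the commit above: `SKETCH_USE_HIP_JIT_LOG_OPTIONS` and `sketch_num_jit_log_options` are hypothetical names, and the sketch assumes the `hipJitOption` type is declared by `hip/hip_runtime.h` whenever the switch is turned on.

```cpp
#include <cstddef>

// Hypothetical opt-in switch for this sketch; it is not a real Realm or HIP macro.
// Define it only when the HIP headers in use still declare the hipJitOption* names.
#ifdef SKETCH_USE_HIP_JIT_LOG_OPTIONS
#include <hip/hip_runtime.h>
#endif

// Returns how many JIT logging options a (hypothetical) module loader would request.
std::size_t sketch_num_jit_log_options()
{
#ifdef SKETCH_USE_HIP_JIT_LOG_OPTIONS
  // The four identifiers named in the compiler errors reported in this issue.
  const hipJitOption jit_options[] = {
    hipJitOptionInfoLogBuffer,
    hipJitOptionInfoLogBufferSizeBytes,
    hipJitOptionErrorLogBuffer,
    hipJitOptionErrorLogBufferSizeBytes,
  };
  return sizeof(jit_options) / sizeof(jit_options[0]);
#else
  // Guarded out: nothing in this branch references the missing identifiers,
  // so the translation unit compiles even when the headers lack them.
  return 0;
#endif
}
```

With the switch left undefined, the function compiles without any HIP header and reports zero options, which is the behaviour one would want on a platform where the identifiers are missing.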

@mariodirenzo
Author

It helps, though the following patch is also necessary:

```diff
diff --git a/runtime/CMakeLists.txt b/runtime/CMakeLists.txt
index 871d3a5d8..48f2b5db4 100644
--- a/runtime/CMakeLists.txt
+++ b/runtime/CMakeLists.txt
@@ -625,7 +625,7 @@ if(Legion_USE_HIP)
     target_link_libraries(RealmRuntime PUBLIC CUDA::toolkit)
     # Advertise our driver link to Realm consumers while ensuring libcuda.so isn't in Realm's RPATH
     target_link_libraries(RealmRuntime INTERFACE CUDA::cuda_driver)
-    target_include_directories(RealmRuntime PRIVATE ${HIP_ROOT_DIR}/include)
+    target_include_directories(RealmRuntime PRIVATE ${HIP_INCLUDE_DIRS})
     # for backwards compatibility in applications
     target_compile_definitions(RealmRuntime INTERFACE USE_HIP)

@@ -975,12 +975,14 @@ if(Legion_USE_HIP)
     target_compile_options(LegionRuntime PRIVATE $<$<COMPILE_LANGUAGE:CUDA>:
                            -Xcudafe=--diag_suppress=1444 # Remove once Point class is updated
                            -Xcudafe=--diag_suppress=boolean_controlling_expr_is_constant>)
-    target_include_directories(LegionRuntime PRIVATE ${HIP_ROOT_DIR}/include)
+    target_include_directories(LegionRuntime PRIVATE ${HIP_INCLUDE_DIRS})
+    set_target_cuda_architectures(LegionRuntime ARCHITECTURES ${Legion_CUDA_ARCH})
     # complex reduction ops bring in a public dependency on cuda headers
     if(Legion_REDOP_COMPLEX)
       target_link_libraries(LegionRuntime PUBLIC CUDA::toolkit)
-      target_compile_definitions(LegionRuntime PUBLIC __HIP_PLATFORM_NVIDIA__)
     endif()
+    target_compile_definitions(LegionRuntime PRIVATE __HIP_PLATFORM_NVIDIA__)
+
   elseif(Legion_HIP_TARGET STREQUAL "ROCM")
     set(HIP_LIBRARIES ${HIP_ROOT_DIR}/lib/libamdhip64.so)
     set_source_files_properties(${LEGION_HIP_SRC} PROPERTIES HIP_SOURCE_PROPERTY_FORMAT 1)
```
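
Because the patch changes where `__HIP_PLATFORM_NVIDIA__` is defined, it can be useful to confirm which platform macro a given translation unit actually sees. The following is a minimal sketch under the assumption that the build defines either `__HIP_PLATFORM_NVIDIA__` or `__HIP_PLATFORM_AMD__`; the function name `sketch_hip_platform` is hypothetical and not part of Legion or Realm.

```cpp
#include <string>

// Reports which HIP platform macro is visible at compile time.
// The macro names come from HIP; the function itself is only illustrative.
std::string sketch_hip_platform()
{
#if defined(__HIP_PLATFORM_NVIDIA__)
  return "nvidia";   // HIP is targeting the CUDA backend
#elif defined(__HIP_PLATFORM_AMD__)
  return "amd";      // HIP is targeting the ROCm backend
#else
  return "unknown";  // no platform macro reached this translation unit
#endif
}
```

Dropping a call to this into a test file built as part of the affected target and printing the result would show whether the compile definition from the CMake change reaches that target.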

@muraj added the Realm (Issues pertaining to Realm) label on Jul 26, 2024
@muraj added and removed the Realm (Issues pertaining to Realm) label on Sep 16, 2024
@apryakhin added this to the realm-backlog milestone on Sep 17, 2024
@apryakhin changed the title from "Realm: error when compiling with HIP with NVIDIA GPUs" to "[BUG] Realm: error when compiling with HIP with NVIDIA GPUs" on Sep 17, 2024
@apryakhin added the bug label on Sep 20, 2024
@apryakhin changed the title from "[BUG] Realm: error when compiling with HIP with NVIDIA GPUs" to "Realm: error when compiling with HIP with NVIDIA GPUs" on Sep 20, 2024