Omega_h not compatible with CUDA on Weaver #999

Open
ikalash opened this issue Oct 20, 2023 · 20 comments
Labels: CUDA; Omega_h (Topics related to the Omega_h mesh library); Testing (Stuff related to testing Albany, including nightly tests)

Comments

@ikalash
Collaborator

ikalash commented Oct 20, 2023

I turned on Omega_h in the weaver nightlies and it looks like it's not compatible with the CUDA library:

CMake Error at CMakeLists.txt:136 (message):
  CUDA 11.2 does not support Omega_h, use an older or newer version

-- Configuring incomplete, errors occurred!
See also "/projects/albany/nightlyCDashWeaver/build/AlbBuild/tpls/omegah/Omega_h-prefix/src/Omega_h-build/CMakeFiles/CMakeOutput.log".
gmake[2]: *** [CMakeFiles/Omega_h.dir/build.make:92: Omega_h-prefix/src/Omega_h-stamp/Omega_h-configure] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:83: CMakeFiles/Omega_h.dir/all] Error 2
gmake: *** [Makefile:91: all] Error 2

CMake Error at cmake/GetOrInstallOmegah.cmake:115 (message):
  Die
Call Stack (most recent call first):
  CMakeLists.txt:753 (include)

https://sems-cdash-son.sandia.gov/cdash/build/53415/configure

I presume we will just punt on turning on Omega_h on weaver, or is there a different plan?

@jewatkins @mcarlson801

@ikalash ikalash added Testing Stuff related to testing Albany (including nightly tests) Omega_h Topics related to the Omega_h mesh library CUDA labels Oct 20, 2023
@jewatkins
Collaborator

@cwsmith what versions of cuda are supported?

@cwsmith
Collaborator

cwsmith commented Oct 20, 2023

Hmmm. That check may be a bit conservative now that we have a 'pure' kokkos backend that doesn't rely on thrust; there were thrust bugs in some cuda releases. I'll run a test with the problematic cuda 11.2 and the new backend to confirm.

@cwsmith cwsmith self-assigned this Oct 20, 2023
@cwsmith
Collaborator

cwsmith commented Oct 23, 2023

@jewatkins I'm running tests now (tracked here) and will keep you posted.

@cwsmith
Collaborator

cwsmith commented Oct 23, 2023

@jewatkins CUDA >= 11.4.4 works (with GCC 10.4.0 in this testing, newer/other versions are fine) in my testing.

@jewatkins
Collaborator

@ikalash maybe it's best just to turn off omega_h for this build for now since we'll likely transition off of weaver and onto blake. I can test omega_h + cuda there

@ikalash
Collaborator Author

ikalash commented Oct 23, 2023

We can definitely turn it off in the weaver nightlies. I will do this if there are no objections.

How about PM? That one is currently using gcc 11.2.0, which it sounds like might be problematic for omega_h. I was going to tell @mcarlson801 to try turning it on there once we got the weaver ones up, but it sounds like we may have to hold off.

@bartgol
Collaborator

bartgol commented Oct 23, 2023

Why are we turning off weaver? Does blake feature V100 as well? I thought it didn't... Since Summit's life got extended by a year, I think it's best to keep V100 tested somewhere, so if blake does not feature V100, we should prob keep weaver.

@jewatkins
Collaborator

We're not turning off weaver yet, just disabling omega_h. There are issues with the new module set on weaver (I sank a lot of time into it last FY), and there are open tickets that have not been resolved. blake has H100. The plan is to keep weaver online for as long as summit is online, or until it takes too much work to maintain.

@jewatkins
Collaborator

> We can definitely turn it off in the weaver nightlies. I will do this if there are no objections.
>
> How about PM? That one is currently using gcc 11.2.0, which it sounds like might be problematic for omega_h. I was going to tell @mcarlson801 to try turning it on there once we got the weaver ones up, but it sounds like we may have to hold off.

It makes sense to test omega_h on perlmutter so I'd go ahead and try to turn it on there.

@ikalash
Collaborator Author

ikalash commented Oct 23, 2023

> It makes sense to test omega_h on perlmutter so I'd go ahead and try to turn it on there.

Could you please try this @mcarlson801 ?

@jewatkins
Collaborator

FYI, he's OOO this week

@ikalash
Collaborator Author

ikalash commented Oct 24, 2023

Thanks for reminding me @jewatkins . It is no rush.

@bartgol
Collaborator

bartgol commented Oct 24, 2023

> @jewatkins CUDA >= 11.4.4 works (with GCC 10.4.0 in this testing, newer/other versions are fine) in my testing.

@cwsmith can we remove (or tune better) the check on the version then?

@cwsmith
Collaborator

cwsmith commented Oct 24, 2023

@bartgol Yeah, I'm going to add this today to cmake and spack.

@ikalash
Collaborator Author

ikalash commented Oct 24, 2023

Please let me know when the fix is pushed and I can re-activate Omega_h in the Weaver nightlies.

@cwsmith
Collaborator

cwsmith commented Oct 24, 2023

Omega_h v10.8.3 has the fixed cuda check: SCOREC/omega_h@40a2d36 .
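
For illustration only: the contents of that commit are not shown in this thread, so the following is a minimal sketch of what a CUDA version guard of this kind could look like in CMake, not Omega_h's actual check. The project name and the `MIN_SUPPORTED_CUDA` variable are hypothetical, and the threshold is an assumption based on the testing reported above (CUDA >= 11.4.4 works). The sketch also assumes the CUDA language is enabled so that `CMAKE_CUDA_COMPILER_VERSION` is populated.

```cmake
# Hypothetical sketch of a CUDA version guard; not Omega_h's actual CMakeLists.txt.
cmake_minimum_required(VERSION 3.18)
project(cuda_version_guard LANGUAGES CXX CUDA)

# Assumed threshold, based on the testing reported in this thread.
# Only major.minor is compared here: the third component of the version that
# nvcc reports is a build number and does not correspond directly to the
# toolkit update number (so "11.4.4" cannot be compared against it as-is).
set(MIN_SUPPORTED_CUDA "11.4")  # hypothetical variable name

if(CMAKE_CUDA_COMPILER_VERSION VERSION_LESS "${MIN_SUPPORTED_CUDA}")
  # Stop configuration with a message that names both versions.
  message(FATAL_ERROR
    "CUDA ${CMAKE_CUDA_COMPILER_VERSION} is older than ${MIN_SUPPORTED_CUDA}; "
    "use a newer CUDA toolkit")
else()
  message(STATUS "CUDA ${CMAKE_CUDA_COMPILER_VERSION} passes the version guard")
endif()
```

With a guard like this, a CUDA 11.2 toolchain would still stop at configure time, while 11.4 and newer would proceed.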

@ikalash
Collaborator Author

ikalash commented Nov 8, 2023

Sorry @cwsmith I just saw your comment now. Should I try turning Omega_h on in the weaver builds?

@mcarlson801
Collaborator

Ah, I missed this while I was out. I'll try turning it on for Perlmutter as well for this week's test.

@jewatkins
Collaborator

> Sorry @cwsmith I just saw your comment now. Should I try turning Omega_h on in the weaver builds?

That fix won't let us run with omega_h on weaver, since we're still on cuda 11.2.

@ikalash
Collaborator Author

ikalash commented Nov 8, 2023

@jewatkins : you are right. Good call.
