add dev container configuration #112

Merged (10 commits) on Aug 26, 2024

Conversation

BenWibking
Contributor

@BenWibking BenWibking commented Aug 22, 2024

I've added a .devcontainer subdirectory in the repo that enables GitHub Codespaces and VSCode Dev Containers. It will use the CUDA CI container image and re-open the repository in that container.

This should make reproducibility easy and make it easier for new contributors to get started. New dependencies, e.g., for openPMD, can also be added.
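
For readers who have not used Dev Containers: a configuration of this kind is a small JSON file at .devcontainer/devcontainer.json. The Python sketch below writes a minimal one. The image value is the CI image named in the container log later in this thread; the "name" and "remoteUser" values, and the helper function names, are assumptions for illustration and not necessarily what this PR committed.

```python
import json
from pathlib import Path

# CI image name as it appears in the container log in this thread.
CI_IMAGE = "ghcr.io/parthenon-hpc-lab/cuda11.6-mpi-hdf5-ascent:latest"


def make_devcontainer_config(image: str = CI_IMAGE, remote_user: str = "root") -> dict:
    """Return a minimal devcontainer.json payload.

    "name", "image", and "remoteUser" are standard devcontainer.json
    properties; the values chosen for "name" and "remoteUser" are assumptions.
    """
    return {
        "name": "athenapk-dev",  # hypothetical display name
        "image": image,
        "remoteUser": remote_user,  # assumed; must exist in the image's passwd file
    }


def write_devcontainer(repo_root: Path) -> Path:
    """Write .devcontainer/devcontainer.json under repo_root and return its path."""
    target = repo_root / ".devcontainer" / "devcontainer.json"
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text(json.dumps(make_devcontainer_config(), indent=2) + "\n")
    return target
```

Calling write_devcontainer(Path(".")) from the repository root would create the file; VS Code and Codespaces look for it in that location.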

@BenWibking BenWibking added the enhancement label Aug 22, 2024
@BenWibking BenWibking requested a review from pgrete August 22, 2024 19:12
Contributor

@pgrete pgrete left a comment

Interesting. I have never used this but see the value in it.
Would you mind adding a minimal "usage" guide to docs/development.md (and a Changelog entry)?

@BenWibking
Contributor Author

@pgrete What is the username you use in the CI container? I tried ubuntu, but that didn't work:

Status: Downloaded newer image for ghcr.io/parthenon-hpc-lab/cuda11.6-mpi-hdf5-ascent:latest
Container started
Shell server terminated (code: 126, signal: null)
unable to find user ubuntu: no matching entries in passwd file
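
The last line of the log means the configured user has no entry in the container's /etc/passwd. One way to see which accounts an image actually provides is to list that file. The sketch below parses passwd-format text (seven colon-separated fields per line); the sample data and the helper name are hypothetical and not taken from the CI image.

```python
def list_passwd_users(passwd_text: str) -> list[str]:
    """Return the login names found in passwd-format text."""
    users = []
    for line in passwd_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        fields = line.split(":")
        if len(fields) == 7:  # name:passwd:uid:gid:gecos:home:shell
            users.append(fields[0])
    return users


# Hypothetical sample; inside a container, read /etc/passwd instead.
SAMPLE = """root:x:0:0:root:/root:/bin/bash
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
"""

if __name__ == "__main__":
    print(list_passwd_users(SAMPLE))
```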

@pgrete
Contributor

pgrete commented Aug 23, 2024

> @pgrete What is the username you use in the CI container? I tried ubuntu, but that didn't work:
>
> Status: Downloaded newer image for ghcr.io/parthenon-hpc-lab/cuda11.6-mpi-hdf5-ascent:latest
> Container started
> Shell server terminated (code: 126, signal: null)
> unable to find user ubuntu: no matching entries in passwd file

Yes, I should probably update those at some point.
FWIW, the Docker scripts for the CI containers are here: https://github.com/parthenon-hpc-lab/parthenon/tree/develop/scripts/docker

@BenWibking
Contributor Author

@pgrete I can't build in the CI container, because it can't find HDF5. Since the CI works, it has to be there, right? Could you try opening the container and see if you can build?

[cmake] -- Could NOT find HDF5 (missing: HDF5_LIBRARIES HDF5_INCLUDE_DIRS C) (found version "")
[cmake] -- Configuring incomplete, errors occurred!
[cmake] See also "/workspaces/athenapk/build/CMakeFiles/CMakeOutput.log".
[cmake] See also "/workspaces/athenapk/build/CMakeFiles/CMakeError.log".
[cmake] CMake Error at external/parthenon/CMakeLists.txt:143 (message):
[cmake]   HDF5 is required but couldn't be found.  If you want to build Parthenon
[cmake]   without HDF5, please rerun CMake with -DPARTHENON_DISABLE_HDF5=ON

@BenWibking
Contributor Author

Ah, I see why it can't find it. Would it make sense to use a newer container image, and install openmpi and hdf5 via the package manager instead?

@pgrete
Contributor

pgrete commented Aug 23, 2024

> Ah, I see why it can't find it. Would it make sense to use a newer container image, and install openmpi and hdf5 via the package manager instead?

If you find a way to install the serial and parallel versions of HDF5 at the same time from the package manager, then yes. It's the only reason I do the manual install.

@BenWibking
Contributor Author

> > Ah, I see why it can't find it. Would it make sense to use a newer container image, and install openmpi and hdf5 via the package manager instead?
>
> If you find a way to install the serial and parallel versions of HDF5 at the same time from the package manager, then yes. It's the only reason I do the manual install.

Why is it necessary to install both?

@pgrete
Contributor

pgrete commented Aug 23, 2024

> > > Ah, I see why it can't find it. Would it make sense to use a newer container image, and install openmpi and hdf5 via the package manager instead?
> >
> > If you find a way to install the serial and parallel versions of HDF5 at the same time from the package manager, then yes. It's the only reason I do the manual install.
>
> Why is it necessary to install both?

Because we also test the serial/non-MPI version of Parthenon (and that requires the non-MPI HDF5 version).
Not sure anyone runs Parthenon without MPI, but who knows.

@BenWibking
Contributor Author

When I try to run in the container, I get:

root@codespaces-00b841:/workspaces/athenapk/build/bin# /workspaces/athenapk/build/bin/athenaPK
--------------------------------------------------------------------------
The library attempted to open the following supporting CUDA libraries,
but each of them failed.  CUDA-aware support is disabled.
libcuda.so.1: cannot open shared object file: No such file or directory
libcuda.dylib: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
/usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
If you are not interested in CUDA-aware support, then run with
--mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are interested
in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
of libcuda.so.1 to get passed this issue.
--------------------------------------------------------------------------
### FATAL ERROR in main
No input file or restart file is specified.

Is this expected?

@BenWibking
Contributor Author

Ok, I've tested it on Codespaces. Let me know if everything works for you. Otherwise, it should be ready to go.

@BenWibking BenWibking enabled auto-merge August 23, 2024 14:27
@pgrete
Contributor

pgrete commented Aug 23, 2024

> When I try to run in the container, I get:
>
> root@codespaces-00b841:/workspaces/athenapk/build/bin# /workspaces/athenapk/build/bin/athenaPK
> --------------------------------------------------------------------------
> The library attempted to open the following supporting CUDA libraries,
> but each of them failed.  CUDA-aware support is disabled.
> libcuda.so.1: cannot open shared object file: No such file or directory
> libcuda.dylib: cannot open shared object file: No such file or directory
> /usr/lib64/libcuda.so.1: cannot open shared object file: No such file or directory
> /usr/lib64/libcuda.dylib: cannot open shared object file: No such file or directory
> If you are not interested in CUDA-aware support, then run with
> --mca opal_warn_on_missing_libcuda 0 to suppress this message.  If you are interested
> in CUDA-aware support, then try setting LD_LIBRARY_PATH to the location
> of libcuda.so.1 to get passed this issue.
> --------------------------------------------------------------------------
> ### FATAL ERROR in main
> No input file or restart file is specified.
>
> Is this expected?

Are you running in a "CUDA-aware" Docker container? It might be looking for the forwarded CUDA libraries from the host.

@BenWibking
Contributor Author

No, I'm running on GitHub Codespaces, which does not have GPUs. This suppresses the warning: https://github.com/parthenon-hpc-lab/athenapk/pull/112/files#diff-24ad71c8613ddcf6fd23818cb3bb477a1fb6d83af4550b0bad43099813088686R32
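
For context, Open MPI also reads MCA parameters from environment variables named OMPI_MCA_<parameter>, so the flag from the warning text can be set once for the whole environment instead of on every command line. The sketch below shows one way to do that from Python. Whether the linked line in this PR uses the same mechanism is not shown here, and the helper names are hypothetical.

```python
import os
import subprocess


def env_without_libcuda_warning(base_env=None) -> dict:
    """Copy an environment and disable Open MPI's missing-libcuda warning."""
    env = dict(os.environ if base_env is None else base_env)
    # Same effect as passing `--mca opal_warn_on_missing_libcuda 0` to mpirun.
    env["OMPI_MCA_opal_warn_on_missing_libcuda"] = "0"
    return env


def run_quietly(command: list[str]) -> int:
    """Run a command with the warning suppressed and return its exit code."""
    return subprocess.run(command, env=env_without_libcuda_warning()).returncode
```

Setting the variable in the container environment has the advantage that it also applies when the binary is started directly, without mpirun, as in the session above.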

@BenWibking
Contributor Author

You can run it like this:
image

Contributor

@pgrete pgrete left a comment

Thanks for the fixes. I just confirmed that this works on my end as expected.

@BenWibking BenWibking merged commit 87429d2 into main Aug 26, 2024
4 checks passed
@BenWibking BenWibking deleted the BenWibking/add-devcontainer branch August 26, 2024 11:32