
need command line utility for testing device memory access #232

Open
carns opened this issue Jan 11, 2023 · 3 comments
Labels
enhancement New feature or request

Comments

@carns
Member

carns commented Jan 11, 2023

We recently added the margo-info utility to help diagnose network transport support in Margo builds. It would be helpful to have a similar utility that can diagnose the ability to access device (i.e., accelerator) memory as well.

CUDA support would be the first target, to confirm whether CUDA memory access works with various provider/build combinations. This can be validated without performing any network communication, so a single-process utility would be sufficient. It just needs to attempt to create a bulk handle for a CUDA memory region.

See mochi-hpc/mochi-thallium#7 for an example of the kind of scenario that we would like to diagnose.

It is TBD whether we can make a utility like this generic enough to be useful. One challenge is that you cannot allocate or reference CUDA memory without making CUDA calls, which means that this hypothetical utility would have a CUDA dependency, probably on both the runtime library and the compiler.

@mdorier
Contributor

mdorier commented Jan 11, 2023

Maybe as a tool outside of margo, which would depend on both margo and CUDA?

@carns
Member Author

carns commented Jan 11, 2023

Yeah, that probably makes sense, to avoid clutter in the Margo build.

@carns
Member Author

carns commented Feb 10, 2023

See ofiwg/libfabric#8444 for a possible failure mode to look for: Verbs+CUDA doesn't work unless you are using the MOFED software stack, but there is no clear error message indicating this.

@mdorier mdorier added the enhancement New feature or request label Feb 28, 2023