need command line utility for testing device memory access #232

carns · 2023-01-11T13:46:51Z

We recently added the margo-info utility to help diagnose network transport support in Margo builds. It would be helpful to have a similar utility that can diagnose the ability to access device (i.e., accelerator) memory as well.

CUDA support would be the first target, to confirm whether CUDA memory access works with various provider/build combinations. This can be validated without performing any network communication, so a single-process utility would be sufficient. It just needs to attempt to create a bulk handle for a CUDA memory region.

See mochi-hpc/mochi-thallium#7 for an example of the kind of scenario that we would like to diagnose.

TBD whether we can make a utility like this generic enough to be useful. One challenge is that you cannot allocate or reference CUDA memory without making CUDA calls, which means that this hypothetical utility would have a CUDA dependency, probably on both the runtime library and the compiler.

mdorier · 2023-01-11T13:59:21Z

Maybe as a tool outside of margo, which would depend on both margo and CUDA?

carns · 2023-01-11T14:01:04Z

Yeah that probably makes sense to avoid clutter in the Margo build.

carns · 2023-02-10T16:15:58Z

See ofiwg/libfabric#8444 for a possible failure mode to look for; Verbs+CUDA doesn't work unless you are using the MOFED software stack, but there is no clear error message indicating this.

mdorier assigned carns Feb 28, 2023

mdorier added the "enhancement" (New feature or request) label Feb 28, 2023
