btl/uct: add support for using another memory domain to form connections #12822
+475
−237
The UCT BTL looks for a connect-to-iface transport in each memory domain and uses it to form connections for connect-to-endpoint transports. For example, with ib the BTL picks the UD transport as the means to set up RC. Although dedicated connection transports are available (RDMACM), I chose to use UD (etc.) to support networks that do not necessarily provide a connection transport.
I am currently working on improving support for Open MPI on a RoCEv2 system that does not (yet) provide support for UD. This breaks the assumption that a connect-to-iface transport will always be available in every memory domain. To fix this issue, this change updates the detection logic to locate a suitable transport for making connections (tcp by default). If a memory domain does not have a suitable connection transport of its own, the alternate is used instead. This has been tested on our broken-UD system and works well.
If a connection-only transport is not needed, the extra transport module is destroyed and the memory domain's own connection transport is used.
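The selection described above can be sketched roughly as follows. This is a simplified model and not the code in this PR: `tl_desc_t`, `md_desc_t`, `find_conn_tl`, `select_conn_tl`, and `fallback_needed` are hypothetical names invented for illustration, and the `connect_to_iface` flag stands in for whatever capability check the BTL actually performs on a transport.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified descriptors. These are NOT the real UCT or
 * btl/uct types; they only model the decision described in this PR. */
typedef struct {
    const char *name;       /* transport name, e.g. "ud", "rc", "tcp" */
    bool connect_to_iface;  /* assumed flag: usable to bootstrap connections */
} tl_desc_t;

typedef struct {
    const char *name;       /* memory domain name */
    const tl_desc_t *tls;   /* transports found in this domain */
    size_t tl_count;
} md_desc_t;

/* Return the first connect-to-iface transport in the domain, or NULL. */
static const tl_desc_t *find_conn_tl(const md_desc_t *md)
{
    assert(NULL != md);
    for (size_t i = 0; i < md->tl_count; ++i) {
        if (md->tls[i].connect_to_iface) {
            return &md->tls[i];
        }
    }
    return NULL;
}

/* Prefer the domain's own connection transport; otherwise use the
 * alternate (e.g. tcp). *used_fallback reports which case applied. */
static const tl_desc_t *select_conn_tl(const md_desc_t *md,
                                       const tl_desc_t *fallback,
                                       bool *used_fallback)
{
    const tl_desc_t *local = find_conn_tl(md);
    *used_fallback = (NULL == local);
    return (NULL != local) ? local : fallback;
}

/* The alternate only needs to be kept if at least one domain lacks its
 * own connection transport; otherwise it can be destroyed. */
static bool fallback_needed(const md_desc_t *mds, size_t md_count)
{
    for (size_t i = 0; i < md_count; ++i) {
        if (NULL == find_conn_tl(&mds[i])) {
            return true;
        }
    }
    return false;
}
```

In this model, a domain offering both RC and UD keeps using UD, a RoCEv2 domain offering only RC falls back to the alternate, and `fallback_needed` returning false corresponds to the case where the extra transport module is destroyed.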