-
Notifications
You must be signed in to change notification settings - Fork 10
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Draft kokkos-comm initialization/finalization using MPI_Session
s
#68
base: develop
Are you sure you want to change the base?
Conversation
src/KokkosComm_communicator.hpp
Outdated
comm_kind = CommunicatorKind::User; | ||
} else { | ||
fprintf(stderr, "[KokkosComm] error: intercommunicators are not supported (yet).\n"); | ||
std::terminate(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Don't just terminate. Either throw an exception or call MPI_Abort
(I've seen applications hang where one process aborted without calling MPI_Abort
, it's nasty)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any specific error code you suggest?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really, anything non-zero would work. This ties in with #29 so anything you do here is probably temporary anyway :)
|
||
static auto dup(const Communicator &other) -> Communicator { return Communicator::dup_raw(other.as_raw()); } | ||
|
||
static auto split_raw(MPI_Comm raw, Color color, Key key) -> Communicator { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a reason why this isn't just an overload of split
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to make it explicit that we're splitting from a raw MPI_Comm
handle, not a wrapped KokkosComm::Communicator
, but I guess we can also simply overload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess that's a design decision that the Kokkos community should make (to be consistent with the other projects)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can drop the Raw. We typically are happy with overloads.
src/KokkosComm_communicator.hpp
Outdated
inline static auto self(void) -> Communicator { return Communicator::from_raw_unchecked(MPI_COMM_SELF); } | ||
|
||
inline static auto world(void) -> Communicator { return Communicator::from_raw_unchecked(MPI_COMM_WORLD); } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That doesn't belong into the communicator. Since we use sessions, there should be no use of the world process model. It could even be that no one has called MPI_Init
.
src/impl/KokkosComm_MPI_instance.hpp
Outdated
} | ||
|
||
template <KokkosExecutionSpace ExecSpace> | ||
auto initialize(int &argc, char *argv[]) -> Universe<ExecSpace> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not obvious that once version of initialize
uses the WPM and one uses the SPM. I'd rather stay away from the WPM tbh.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll remove the one using the WPM so that we only initialize using the SPM.
Thank you very much for the first review, @devreal! Regarding C++ destruction and the SPM finalization semantics: is |
Just a PSA: OMPI in combination with UCX won't support Sessions until the next major release of OMPI (6.0): open-mpi/ompi#12566 (comment) |
It looks like Linux's OpenMPI is too old to support @cwpearson Is there any way we can specifically install OpenMPI 5.x in the CI? The package from Ubuntu repos is simply not up-to-date. |
The standard says that the application is required to clean up its objects before
We had a similar issue in other projects and came down to a registry in which objects that needed destruction were referenced. There are two options they can get destroyed:
All of this must be thread-safe etc but communicators are heavy objects that are not created and destroyed regularly so I think the overhead involved in such a scheme is acceptable. |
Would you like to rebase this on top of current develop? |
Open MPI's session support has not been released yet. It will be included in the next major release (tbd when that comes out) |
This PR is a first attempt at initialization and finalization for
kokkos-comm
relying onMPI_Session
s.Some noteworthy additions brought by this PR:
Communicator
class that wraps theMPI_Comm
and a Kokkos execution spaceUniverse
class that holds the handle to the MPI sessions, as well as a session-associated communicator.I expect lots of changes in the API, which I find kind of clunky at the moment. Reviews and comments on how to improve are welcome.