
zero overhead low level api #50

Closed
wants to merge 41 commits into from


dutkalex
Collaborator

@dutkalex dutkalex commented May 8, 2024

This PR proposes a design that we've had some success with in our in-house solution. It does not have to be integrated as is; the goal is rather to initiate a discussion on the design aspects. I believe this design can support the current type-erased solution implemented in the Req class while also being compatible with other semantics/tradeoffs.

@dutkalex dutkalex marked this pull request as draft May 8, 2024 19:11
@dutkalex dutkalex marked this pull request as ready for review May 8, 2024 19:57
@dutkalex dutkalex changed the title Communicator class Design considerations May 9, 2024
@dutkalex
Collaborator Author

dutkalex commented May 9, 2024

Different problems require different communication strategies. Therefore, using KokkosComm should not force users into a communication mechanism that is suboptimal for the problem at hand. This PR proposes a first draft of a flexible enough design (IMO at least, please let me know if I'm missing something), which decouples the wrapping of the raw MPI functionalities from the higher-level communication scheme. To demonstrate my point, I have laid out the outline of 2 very different schemes, both relying on the same low-level wrappers: the current type-erased solution, and a more static solution (roughly mirroring what I have developed for our in-house CFD solver) that is better suited to recurrent communication patterns known ahead of time.
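A minimal sketch of the idea of two schemes sharing one low-level layer, assuming a hypothetical in-process stand-in (RawRequest, low_level_isend) instead of real MPI calls so it compiles anywhere. ErasedRequests and StaticExchange are illustrative names, not the PR's actual classes: the first stores arbitrary request types behind std::function, the second keeps a fixed number of requests in a std::array.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <vector>

// HYPOTHETICAL stand-in for a request handle returned by the low-level layer.
// In real code this would wrap an MPI_Request; here it only flips a flag.
struct RawRequest {
  bool *done = nullptr;
  void wait() const {
    assert(done != nullptr && "waiting on a request that was never posted");
    *done = true;
  }
};

// HYPOTHETICAL low-level wrapper: same semantics as the raw call, no allocation.
inline RawRequest low_level_isend(bool &completion_flag) {
  completion_flag = false;  // "in flight" until waited on
  return RawRequest{&completion_flag};
}

// Scheme 1: type-erased, in the spirit of the Req class. Any request type can
// be stored, at the price of a std::function per request (which may allocate).
class ErasedRequests {
 public:
  template <typename Req>
  void add(Req req) {
    waiters_.emplace_back([req] { req.wait(); });
  }
  void wait_all() {
    for (auto &w : waiters_) w();
    waiters_.clear();
  }

 private:
  std::vector<std::function<void()>> waiters_;
};

// Scheme 2: static, for a recurrent pattern whose message count N is known
// ahead of time. Requests sit in a fixed array: no erasure, no allocation.
template <std::size_t N>
class StaticExchange {
 public:
  void post(std::size_t i, RawRequest r) { reqs_[i] = r; }
  void wait_all() {
    for (const auto &r : reqs_) r.wait();
  }

 private:
  std::array<RawRequest, N> reqs_{};
};
```

Both schemes are built only from low_level_isend and RawRequest, which is the decoupling argued for above: the tradeoff (flexibility vs. no allocation) is chosen in the higher layer, not imposed by the wrappers.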

@devreal
Collaborator

devreal commented May 10, 2024

In general: what is the benefit of having two ways of doing the same thing? There are free-standing functions and this introduces member functions that (ideally) do the same. Can you explain a little more what the upside of member functions is?

@dutkalex
Collaborator Author

dutkalex commented May 10, 2024

@devreal What I advocate for is not having two ways of doing the same thing. Once again, this is not meant to be merged as is; the only reason I did not use the already implemented internals is their tight coupling with the type-erased Req class.

The first idea I advocate for is to decouple the wrapping of the raw MPI primitives from communication-scheme considerations. The current implementation forces users into the type-erased solution, which comes with some overhead and is not the best solution for every problem. I would even argue that this is a strong argument against using KokkosComm wherever memory allocations all over the place are not an option. Therefore, I think it makes sense to first introduce basic idiomatic C++ wrappers that ease manipulation of the raw MPI primitives, and then build on top of them to propose different communication strategies with different tradeoffs, or even let users implement a custom one using the lower-level wrappers.
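What a "basic idiomatic C++ wrapper" with zero overhead could look like, as a sketch under stated assumptions: raw_comm_t is a hypothetical stand-in for MPI_Comm (an int here; real MPI libraries use either an int or a pointer), and raw_barrier stands in for a primitive such as MPI_Barrier. The static_asserts check the "zero overhead" and "type safety" claims at compile time.

```cpp
#include <cassert>
#include <type_traits>

// HYPOTHETICAL stand-in for a raw C handle such as MPI_Comm.
using raw_comm_t = int;

// Thin, strongly typed wrapper around the raw handle.
class Comm {
 public:
  explicit constexpr Comm(raw_comm_t h) noexcept : h_(h) {}
  // Escape hatch, intended for use inside the wrapper layer only.
  constexpr raw_comm_t raw() const noexcept { return h_; }

 private:
  raw_comm_t h_;
};

// The wrapper costs nothing in size and is copied like the raw handle...
static_assert(sizeof(Comm) == sizeof(raw_comm_t));
static_assert(std::is_trivially_copyable_v<Comm>);
// ...but an int (rank, tag, count) can no longer be passed where a
// communicator is expected without an explicit conversion.
static_assert(!std::is_convertible_v<int, Comm>);

// HYPOTHETICAL raw primitive standing in for e.g. MPI_Barrier.
inline int raw_barrier(raw_comm_t) { return 0; }

// Thin wrapper: same semantics as the raw call, only the types change.
inline int barrier(Comm c) { return raw_barrier(c.raw()); }
```

Higher-level strategies (type-erased or static) would then be written against Comm and friends, never against raw_comm_t directly.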

The second idea I advocate for is that a member function API would be better than the current free function implementation for the following reasons:

  • No weakly typed MPI_Comm object leaks into the client code with this approach
  • All MPI calls require an MPI_Comm object, so it makes sense to implement them as member functions IMO
  • With this approach, no KokkosComm:: prefix is needed, which makes the syntax simpler
  • The choice of a communication mode is often not specific to a single MPI call. Therefore, although it is not demonstrated in the PR code, the Communicator abstraction can be used to store this kind of general parameters. This also contributes to having a simpler syntax for most MPI calls: auto req = comm.irecv( view, from_rank, tag ); is more elegant and user-friendly than the more verbose auto req = KokkosComm::irecv< RecvMode >( view, from_rank, tag, MPI_COMM_WORLD );, especially when doing many communication calls
  • Building on top of the previous argument, it is also easy to decouple construction from behavior with this approach. For example, I can create a communicator object with some choices of communication modes, and then hand this to my solver which does not have to care about these considerations which are encapsulated in the communicator object. With this approach it becomes really easy to enforce a coherent application-wide configuration
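The last two bullets can be illustrated with a sketch, assuming hypothetical mode tags (StandardMode, SynchronousMode) and a simulated isend that returns a description of the call instead of issuing MPI traffic. The communicator stores the raw handle and the mode once; the solver is generic over whatever communicator it is handed and never names a mode itself.

```cpp
#include <cassert>
#include <string>
#include <vector>

// HYPOTHETICAL communication-mode tags, for illustration only.
struct StandardMode { static constexpr const char *name = "standard"; };
struct SynchronousMode { static constexpr const char *name = "synchronous"; };

// HYPOTHETICAL free-function layer: every call repeats mode and communicator.
// Simulated: returns a description of the call it would make.
template <typename Mode>
std::string isend(const std::vector<double> &v, int dest, int tag, int comm) {
  return std::string(Mode::name) + " isend of " + std::to_string(v.size()) +
         " to " + std::to_string(dest) + " tag " + std::to_string(tag) +
         " on comm " + std::to_string(comm);
}

// Communicator object: stores the handle and the mode once, then forwards.
template <typename Mode>
class Communicator {
 public:
  explicit Communicator(int raw) : raw_(raw) {}
  std::string isend(const std::vector<double> &v, int dest, int tag) const {
    return ::isend<Mode>(v, dest, tag, raw_);
  }

 private:
  int raw_;
};

// The solver only sees a configured communicator; the application-wide
// choice of mode is made once, where the communicator is constructed.
template <typename Comm>
std::string solver_step(const Comm &comm, const std::vector<double> &halo) {
  return comm.isend(halo, /*dest=*/1, /*tag=*/0);
}
```

Switching the whole application to another mode then means changing only the line that constructs the communicator, e.g. Communicator<StandardMode> instead of Communicator<SynchronousMode>.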

@dutkalex
Collaborator Author

  • The choice of a communication mode is often not specific to a single MPI call. Therefore, although it is not demonstrated in the PR code, the Communicator abstraction can be used to store this kind of general parameters.

I have made the necessary changes to illustrate the potential of the approach. Also tagging @cwpearson for reference

@devreal
Collaborator

devreal commented May 10, 2024

This PR proposes a first draft of a flexible enough design (IMO at least, please let me know if I'm missing something), which decouples the wrapping of the raw MPI functionalities from the higher-level communication scheme

Do you mean "decoupling communication and resource management"? We could have a way to allow users to query the requirements for a given view to manage resources explicitly, both for datatypes and temporary buffers:

// query whether we need a temporary buffer or a derived datatype
std::variant<size_t, KokkosComm::Type> opt = KokkosComm::query(view);
if (std::holds_alternative<size_t>(opt)) {
  // provide a temporary buffer for packing, takes ownership until completion
  ensure_size(std::get<size_t>(opt), my_properly_sized_buf);
  KokkosComm::isend(view, comm, ..., my_properly_sized_buf);
} else {
  // provides a datatype we have cached (could be built-in for contiguous views)
  KokkosComm::isend(view, comm, ..., std::get<KokkosComm::Type>(opt));
}

// no cached information for another_view, let kokkos figure it out
KokkosComm::isend(another_view, comm, ...);
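The query idea in the snippet above can be made concrete with a sketch. Everything here is hypothetical: StridedView stands in for a rank-1 Kokkos view, Type for a cached datatype, and the policy (contiguous view gives a datatype, anything else gives the byte count of a packing buffer) is one assumption among several possible ones. The std::variant return type matches the one used in the snippet.

```cpp
#include <cassert>
#include <cstddef>
#include <variant>

// HYPOTHETICAL stand-in for a rank-1 strided view of doubles.
struct StridedView {
  const double *data;
  std::size_t extent;  // number of elements
  std::size_t stride;  // distance between elements, in elements
};

// HYPOTHETICAL stand-in for a (possibly cached) datatype description.
struct Type {
  std::size_t count;       // number of elements described
  std::size_t elem_bytes;  // size of one element
};

// Assumed policy: a contiguous view can be sent directly with a simple
// datatype; anything else needs a packing buffer of the returned byte size.
inline std::variant<std::size_t, Type> query(const StridedView &v) {
  if (v.stride == 1) return Type{v.extent, sizeof(double)};
  return v.extent * sizeof(double);
}

// Pack a strided view into a caller-provided contiguous buffer.
inline void pack(const StridedView &v, double *buf) {
  for (std::size_t i = 0; i < v.extent; ++i) buf[i] = v.data[i * v.stride];
}
```

A view over every other element of a 6-element array would thus report a 3 * sizeof(double) byte buffer requirement, while the full contiguous array would report a datatype.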

@dutkalex
Collaborator Author

dutkalex commented May 11, 2024

Do you mean "decoupling communication and resource management"?

Yes @devreal! We should have a zero-overhead abstraction layer on top of MPI that simply does the encapsulation with the same semantics but exposes a nicer, more Kokkos-friendly API, and then build all the additional fancy functionality on top of that, rather than having do-it-all wrappers as is currently the case...

We could have a way to allow users to query the requirements for a given view to manage resources explicitly, both for datatypes and temporary buffers

This seems reasonable to me, although I don't quite get where the my_properly_sized_buf object in your example comes from...
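One plausible answer to where my_properly_sized_buf comes from, sketched under assumptions: the buffer is owned by a long-lived object tied to the communication pattern (the hypothetical HaloExchange below), and ensure_size is a hypothetical helper, matching the name in devreal's snippet, that grows the buffer only when needed. In steady state the same storage is reused with no allocation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL helper: grow the caller-owned buffer only if it is too small.
// Returns true if a (re)allocation may have happened.
// Caveat: the buffer must not be resized while a nonblocking operation that
// was handed the buffer is still in flight.
inline bool ensure_size(std::size_t bytes, std::vector<std::byte> &buf) {
  if (buf.size() >= bytes) return false;  // reuse existing storage
  buf.resize(bytes);
  return true;
}

// HYPOTHETICAL owner: lives as long as the communication pattern and keeps
// its packing buffer between iterations.
struct HaloExchange {
  std::vector<std::byte> my_properly_sized_buf;
};
```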

@dutkalex dutkalex requested a review from devreal May 11, 2024 14:03
@dutkalex
Collaborator Author

dutkalex commented May 11, 2024

@devreal considering your example, an interesting approach could be to have the communicator.member_fcn() API keep the same semantics as the raw MPI call (basically just syntactic sugar), and have the free functions implement the more elaborate semantics (as in the last line of your example, copied below) using the member-function API. What do you think?

// no cached information for another_view, let kokkos figure it out
KokkosComm::isend(another_view, comm, ...);
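The proposed split (member functions with raw-call semantics, free functions with richer semantics built on them) might look like the following sketch. Communicator, StridedView and the staging buffer are hypothetical stand-ins; the member isend only records what would be sent. The free isend packs non-contiguous views, then delegates to the member function.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// HYPOTHETICAL stand-in for a rank-1 strided view of doubles.
struct StridedView {
  const double *data;
  std::size_t extent;
  std::size_t stride;
};

// Low level: same contract as the raw call (contiguous memory only).
// Simulated: records what would be transmitted instead of calling MPI.
class Communicator {
 public:
  void isend(const double *buf, std::size_t count, int dest, int tag) {
    assert(buf != nullptr || count == 0);
    sent.assign(buf, buf + count);
    last_dest = dest;
    last_tag = tag;
  }
  std::vector<double> sent;  // simulated wire contents
  int last_dest = -1;
  int last_tag = -1;
};

// High level: handles non-contiguous views by packing into a staging buffer,
// then delegates to the low-level member function. In real nonblocking code,
// `staging` would have to stay alive until the request completes.
inline void isend(Communicator &comm, const StridedView &v, int dest, int tag,
                  std::vector<double> &staging) {
  if (v.stride == 1) {
    comm.isend(v.data, v.extent, dest, tag);  // contiguous: no copy
    return;
  }
  staging.resize(v.extent);
  for (std::size_t i = 0; i < v.extent; ++i) staging[i] = v.data[i * v.stride];
  comm.isend(staging.data(), staging.size(), dest, tag);
}
```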

@dssgabriel
Collaborator

dssgabriel commented May 13, 2024

  • No weakly typed MPI_Comm object leaks into the client code with this approach

I like this approach. We had already considered adding a Communicator type that would wrap the underlying MPI_Comm and the Kokkos execution space for performance-related reasons. It would allow us to bypass some MPI assertions (e.g., checking pointer provenance in a GPU-only context is expensive and useless as Kokkos can already statically prove the view is in GPU memory).
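How a communicator parameterized on the execution space could skip a runtime check, as a sketch under assumptions: HostSpace and DeviceSpace are hypothetical tags with a device_only flag (not real Kokkos traits), and runtime_pointer_is_device stands in for an expensive pointer-provenance query. With if constexpr, the query is not even instantiated for a device-only space.

```cpp
#include <cassert>

// HYPOTHETICAL execution-space tags standing in for Kokkos spaces.
struct HostSpace { static constexpr bool device_only = false; };
struct DeviceSpace { static constexpr bool device_only = true; };

// Counts how often the (expensive) runtime provenance check runs.
inline int runtime_checks = 0;

// HYPOTHETICAL stand-in for a runtime pointer-attribute query.
inline bool runtime_pointer_is_device(const void *) {
  ++runtime_checks;
  return false;  // stand-in result; a real query would inspect the pointer
}

template <typename ExecSpace>
class Communicator {
 public:
  // For a device-only space the answer is known at compile time, so the
  // runtime query is compiled out entirely.
  bool buffer_on_device([[maybe_unused]] const void *p) const {
    if constexpr (ExecSpace::device_only) {
      return true;
    } else {
      return runtime_pointer_is_device(p);
    }
  }
};
```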

  • With this approach, no KokkosComm:: prefix is needed which makes the syntax simpler
  • All MPI calls require an MPI_Comm object, so it makes sense to implement these as member functions IMO

I think this is indeed an elegant solution that reduces the verbosity of the code, and it's also the approach taken in other languages implementing MPI bindings (e.g., see rsmpi).

This effort needs a few more iterations, but we should consider it.

@dutkalex dutkalex changed the title Design considerations zero overhead low level api May 14, 2024
@cwpearson
Collaborator

cwpearson commented May 14, 2024

some unit tests please!

@dutkalex
Collaborator Author

dutkalex commented May 14, 2024

some unit tests please!

Do you prefer to test by source file or by feature/overload set? (I would go for the latter)

@dutkalex dutkalex requested a review from cwpearson May 15, 2024 12:51
Collaborator


I don't see a good reason to consolidate a bunch of things into a single header here. Let's split it up into

  1. The special handling around #include <mpi.h>
  2. KokkosComm_config.h (version-related stuff, a future home for other config things)
  3. the mpi_type thing
  4. The communicator class

Collaborator Author


What do you mean by the version-related stuff?

Collaborator Author


I don't see a good reason to consolidate a bunch of things into a single header here

The idea was to keep the raw MPI_*** primitives contained in a single place, and have all the other files use the type-safe API.
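One possible shape for the "mpi_type thing" in that layout, as a sketch: RawDatatype is a hypothetical enum standing in for handles such as MPI_DOUBLE, which in the proposed design only this one header would name. Leaving the primary template undefined turns an unsupported element type into a compile-time error rather than a wrong datatype at run time. The names mpi_type and mpi_type_v are illustrative and not necessarily those of the actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <type_traits>

// HYPOTHETICAL stand-in for raw datatype handles (MPI_DOUBLE, MPI_INT, ...).
enum class RawDatatype { Double, Float, Int32, Int64 };

// Primary template intentionally left undefined.
template <typename T>
struct mpi_type;

template <>
struct mpi_type<double> {
  static constexpr RawDatatype value = RawDatatype::Double;
};
template <>
struct mpi_type<float> {
  static constexpr RawDatatype value = RawDatatype::Float;
};
template <>
struct mpi_type<std::int32_t> {
  static constexpr RawDatatype value = RawDatatype::Int32;
};
template <>
struct mpi_type<std::int64_t> {
  static constexpr RawDatatype value = RawDatatype::Int64;
};

// Convenience variable template; ignores const/volatile on the element type.
template <typename T>
inline constexpr RawDatatype mpi_type_v = mpi_type<std::remove_cv_t<T>>::value;
```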

unit_tests/test_barrier.cpp
template <KokkosView SendView>
void send(const SendView &sv, int dest, int tag, MPI_Comm comm) {
Collaborator


Okay to have the proposed interface (again, it should call the overload below with DefaultExecutionSpace() as the first argument), but keep this one as well as a very low-level wrapper.

Collaborator Author

@dutkalex dutkalex May 23, 2024


I was thinking about this the other way around: have the more complex APIs be implemented in terms of the low level ones

namespace KokkosComm::Impl {
template <KokkosExecutionSpace ExecSpace, KokkosView SendView, KokkosView RecvView>
void reduce(const ExecSpace &space, const SendView &sv, const RecvView &rv, MPI_Op op, int root, MPI_Comm comm) {
template <KokkosView SendView, KokkosView RecvView>
Collaborator


This should just call the below with DefaultExecutionSpace() as the first argument.
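The delegation pattern requested in this review can be sketched as follows. Serial, Threads and DefaultExecutionSpace are hypothetical stand-ins for the Kokkos types, the views are plain std::vector, and the MPI_Op/MPI_Comm parameters are omitted for brevity. The overload without an execution space contains no logic of its own: it only supplies the default space.

```cpp
#include <cassert>
#include <string>
#include <vector>

// HYPOTHETICAL stand-ins for Kokkos execution spaces.
struct Serial { static constexpr const char *name = "Serial"; };
struct Threads { static constexpr const char *name = "Threads"; };
using DefaultExecutionSpace = Serial;  // assumption for this sketch

// Records which space the full overload ran with.
inline std::string last_space;

// Full overload: explicit execution space as the first argument.
template <typename ExecSpace>
void reduce(const ExecSpace &, const std::vector<double> &sv,
            std::vector<double> &rv, int /*root*/) {
  last_space = ExecSpace::name;
  rv = sv;  // single-process stand-in: reducing one contribution is a copy
}

// Convenience overload: just calls the above with DefaultExecutionSpace().
inline void reduce(const std::vector<double> &sv, std::vector<double> &rv,
                   int root) {
  reduce(DefaultExecutionSpace(), sv, rv, root);
}
```

Keeping all logic in the full overload means there is only one code path to test and maintain.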

Collaborator

@cwpearson cwpearson left a comment


And at least a skeleton of documentation, and if the documentation is incomplete, open one or more issues against yourself 😄

@dutkalex
Collaborator Author

dutkalex commented May 23, 2024

And at least a skeleton of documentation, and if the documentation is incomplete, open one or more issues against yourself 😄

@cwpearson Noted. Maybe I'll wait to be sure we are all on the same page before writing the doc though 😉

@dutkalex dutkalex closed this Sep 9, 2024
4 participants