Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Provide KokkosComm::initialize/finalize #88

Closed
wants to merge 13 commits into from
Closed
Show file tree
Hide file tree
Changes from 12 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
68 changes: 66 additions & 2 deletions docs/api/core.rst
Original file line number Diff line number Diff line change
@@ -1,12 +1,15 @@
Core
====
****

MPI API Support
===============

.. list-table:: MPI API Support
:widths: 40 30 15
:header-rows: 1

* - MPI
- ``KokkosComm::``
- KokkosComm
- ``Kokkos::View``
* - ``MPI_Send``
- ``send`` or ``send<CommMode::Standard>``
Expand All @@ -33,6 +36,67 @@ Core
- ``reduce``
- ✓


Initialization and finalization
-------------------------------

KokkosComm provides a unified interface for initializing and finalizing both Kokkos and MPI.

.. Attention:: It is strongly recommended to use KokkosComm's initialization and finalization functions instead of their respective Kokkos and MPI counterparts. However, users have two options for using KokkosComm:

1. Initialize/finalize KokkosComm using ``KokkosComm::{initialize,finalize}``. In this case, it is **forbidden** to call ``MPI_{Init,Init_thread,Finalize}`` and ``Kokkos::{initialize,finalize}`` (otherwise, the application will abort).
2. Initialize/finalize MPI and Kokkos manually through their respective interfaces. In this case, it is **forbidden** to call ``KokkosComm::{initialize,finalize}`` (otherwise, the application will abort).

.. cpp:enum-class:: KokkosComm::ThreadSupportLevel

A scoped enum to wrap the MPI thread support levels.

.. cpp:enumerator:: KokkosComm::ThreadSupportLevel::Single

Only one thread will execute.

.. cpp:enumerator:: KokkosComm::ThreadSupportLevel::Funneled

The process may be multi-threaded, but only the main thread will make MPI calls (all MPI calls are funneled to the main thread).

.. cpp:enumerator:: KokkosComm::ThreadSupportLevel::Serialized

The process may be multi-threaded, and multiple threads may make MPI calls, but only one at a time: MPI calls are not made concurrently from two distinct threads (all MPI calls are serialized).

.. cpp:enumerator:: KokkosComm::ThreadSupportLevel::Multiple

Multiple threads may call MPI, with no restrictions.


.. cpp:function:: void KokkosComm::initialize(int &argc, char *argv[], KokkosComm::ThreadSupportLevel required)
.. cpp:function:: void KokkosComm::initialize(int &argc, char *argv[])

Initializes the MPI execution environment with the required MPI thread level support (``MPI_THREAD_MULTIPLE`` if unspecified), and then initializes the Kokkos execution environment. This function also strips ``--kokkos-help`` flags to prevent Kokkos from printing them on all MPI ranks.

:param argc: Non-negative value representing the number of command-line arguments passed to the program.
:param argv: Pointer to the first element of an array of ``argc + 1`` pointers, of which the last one is null and the previous, if any, point to null-terminated multi-byte strings that represent the arguments passed to the program.
:param required: Level of desired MPI thread support.

**Requirements:**

* ``KokkosComm::initialize`` has the same combined requirements as ``MPI_{Init,Init_thread}`` and ``Kokkos::initialize``.
* ``KokkosComm::initialize`` must be called in place of ``MPI_Init`` and ``Kokkos::initialize``.
* User-initiated MPI objects cannot be constructed, and MPI functions cannot be called until after ``KokkosComm::initialize`` is called.
* User-initiated Kokkos objects cannot be constructed until after ``KokkosComm::initialize`` is called.

.. cpp:function:: void KokkosComm::finalize()

Terminates the Kokkos and MPI execution environments.

Programs are ill-formed if they do not call this function *after* calling ``KokkosComm::initialize``.

**Requirements:**

* ``KokkosComm::finalize`` has the same combined requirements as ``MPI_Finalize`` and ``Kokkos::finalize``.
* ``KokkosComm::finalize`` must be called in place of ``MPI_Finalize`` and ``Kokkos::finalize``.
* ``KokkosComm::finalize`` must be called after user-initialized Kokkos objects are out of scope.


Point-to-point
--------------

Expand Down
15 changes: 6 additions & 9 deletions perf_tests/test_main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@
//@HEADER

#include "KokkosComm_include_mpi.hpp"
#include "KokkosComm.hpp"

#include <Kokkos_Core.hpp>
#include <benchmark/benchmark.h>
Expand All @@ -32,15 +33,12 @@ class NullReporter : public ::benchmark::BenchmarkReporter {
// The main is rewritten to allow for MPI initializing and for selecting a
// reporter according to the process rank
int main(int argc, char **argv) {
MPI_Init(&argc, &argv);

int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);

Kokkos::initialize();
KokkosComm::initialize(argc, argv);

::benchmark::Initialize(&argc, argv);

int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
// root process will use a reporter from the usual set provided by
// ::benchmark
Expand All @@ -51,7 +49,6 @@ int main(int argc, char **argv) {
::benchmark::RunSpecifiedBenchmarks(&null);
}

Kokkos::finalize();
MPI_Finalize();
KokkosComm::finalize();
return 0;
}
}
64 changes: 64 additions & 0 deletions src/KokkosComm.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -16,6 +16,7 @@

#pragma once

#include "KokkosComm_include_mpi.hpp"
#include "KokkosComm_collective.hpp"
#include "KokkosComm_version.hpp"
#include "KokkosComm_isend.hpp"
Expand All @@ -28,8 +29,71 @@

#include <Kokkos_Core.hpp>

#include <algorithm>
#include <cstdio>
#include <string_view>

namespace KokkosComm {

enum class ThreadSupportLevel {
Single = MPI_THREAD_SINGLE,
Funneled = MPI_THREAD_FUNNELED,
Serialized = MPI_THREAD_SERIALIZED,
Multiple = MPI_THREAD_MULTIPLE,
};

inline void initialize(int &argc, char *argv[], ThreadSupportLevel required) {
int flag;
MPI_Initialized(&flag);
// Forbid calling this function if MPI has already been initialized
if (0 != flag) {
MPI_Abort(MPI_COMM_WORLD, -1);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not allow MPI to be initialized already as long as the requested thread-level matches?

}

// Forbid calling this function if Kokkos has already been initialized
if (Kokkos::is_initialized()) {
std::abort();
}
Comment on lines +53 to +56
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need this, Kokkos::initialize() will do that for you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand. We want to abort if Kokkos::initialize has already been called. Why try to initialize MPI and Kokkos if we can eagerly prove we got illegally called?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because Kokkos will do that for you, I believe. I'm not sure MPI will.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I find it weird that a subproject subsumes the initialization of the main project. I should be able to initialize main Kokkos and KokkosComm from my application. The KokkosComm initializer should initialize main Kokkos if it's not already initialized and otherwise handle that gracefully. Same with MPI above.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't understand your words about "handle that gracefully". Kokkos initialization will abort if it has been initialized/finalized previously:
https://github.com/kokkos/kokkos/blob/892e13c8c49fa9d4ef9f8dcb69f90d525a6baa58/core/src/impl/Kokkos_Core.cpp#L1056-L1061.
The code inside KokkosComm::initialize() checking Kokkos initialization/finalization does not add anything, unless you decide to skip initializing Kokkos, which I argue is a wrong thing to do.

Copy link
Contributor

@aprokop aprokop Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kokkos::initialize does some special checks for MPI

Are you sure about this? I only see detection of the local MPI rank based on environment variables, but not calls to any MPI routines.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kokkos docs mention one should call MPI_Init before Kokkos::initialize: https://kokkos.org/kokkos-core-wiki/ProgrammingGuide/Initialization.html?highlight=initialize#initialization

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do not prevent users from doing things manually.

You are telling people not to initialize your project if they initialize its dependencies manually. That is flawed.

However, the herein proposed API provides a straightforward and foolproof way of initializing/finalizing their KokkosComm application.

I am not arguing to take that away. If users want to shift all initialization to KokkosComm they can do so.

Moreover, I don’t understand why we would let people write fundamentally wrong code and silence it.

Why is that fundamentally wrong? You can still bail if they did it wrong but there is no reason to always bail even if the configuration they chose is correct (e.g., MPI was initialized with the same thread-level.)

Finally, I'd like to point out that this is only a first step towards KokkosComm initialization/finalization. Once #68 gets merged, we will have an isolated environment (thanks to MPI Sessions) that lets us lift most of — if not all — the constraints required for now.

Then let's not tell them one thing now and another then. We can get close to the semantics we will have then and users will not have to change their code. Keep in mind that what you are proposing now forces users to change their code such that later when KokkosComm uses Sessions the WPM will be broken and no communication outside of KokkosComm will work anymore. Or KokkosComm will be broken because users stuck to manually initializing MPI to get the WPM. We can do better than that.

This PR does not have the state, only free functions. Introducing the state has its own challenges, I think.

The approach I would use is an inline accessor to avoid explicit global variables that need to go into a compilation unit. In some header file:

inline bool& get_state_ref() {
  static bool state = false;
  return state;
}

Copy link
Collaborator Author

@dssgabriel dssgabriel Jun 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This PR aimed at unifying two calls — MPI_{Init,Init_thread} and Kokkos::initialize — into a single one.

You are telling people not to initialize your project if they initialize its dependencies manually. That is flawed.

It means that instead of having two calls, users may now need a third one.

Why is that fundamentally wrong? You can still bail if they did it wrong but there is no reason to always bail even if the configuration they chose is correct (e.g., MPI was initialized with the same thread-level.)

OK, that's fair.
Once MPI has been initialized, is it possible to retrieve the current thread support level? I can't find anything in OpenMPI's docs, but this is needed to check that the users' requirements are met in KokkosComm::initialize and error out if needed, as you've suggested.

Keep in mind that what you are proposing now forces users to change their code such that later when KokkosComm uses Sessions the WPM will be broken and no communication outside of KokkosComm will work anymore.

Once #68 gets merged, existing code will most likely have to change anyway. E.g. all of our tests use MPI_COMM_WORLD, which won't be valid in the SPM. Besides, we will want to provide some object (KokkosComm::Context or Handle) that wraps the MPI_Comm, Kokkos::ExecutionSpace, and maybe the associated stream if comms happen on a device. Our API will then use this object instead of a raw MPI_Comm.
All of our current interfaces are still subject to change. IMO it's ok if we tweak the semantics for now, although I agree it's best to get this PR closer to what we're aiming for in the future. 👍

This PR does not have the state, only free functions. Introducing the state has its own challenges, I think.

The approach I would use is an inline accessor to avoid explicit global variables that need to go into a compilation unit.

I don't think we need to carry any state for the moment, we're only calling MPI_Init_thread and Kokkos::initialize. Adding KokkosComm::is_{initiliazed,finalized} should be enough for now I think.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once MPI has been initialized, is it possible to retrieve the current thread support level?

The function you're looking for is MPI_QUERY_THREAD :)

It means that instead of having two calls, users may now need a third one.

I don't see the problem, esp if parts of my application rely on the WPM anyway and I have to ensure that the WPM is initialized. I cannot rely on KokkosComm to do that if I know that at some point the behavior will change.

Anyway, maybe this PR isn't ready yet and needs to be tied to #68. Let's discuss this during the Monday call, I will try to join (the MPI Forum potentially conflicts with the call).


int provided;
MPI_Init_thread(&argc, &argv, static_cast<int>(required), &provided);
// Abort if MPI failed to provide the required thread support level
if (static_cast<int>(required) < provided) {
MPI_Abort(MPI_COMM_WORLD, -1);
}

int rank;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
// Strip "--help" and "--kokkos-help" from the flags passed to Kokkos if we are not on rank 0 to prevent Kokkos
// from printing the help message multiple times.
if (0 != rank) {
if (auto *help_it = std::find_if(argv, argv + argc, [](std::string_view const &x) { return x == "--kokkos-help"; });
help_it != argv + argc) {
std::swap(*help_it, *(argv + argc - 1));
--argc;
}
}
Kokkos::initialize(argc, argv);
}

inline void initialize(int &argc, char *argv[]) { initialize(argc, argv, ThreadSupportLevel::Multiple); }

inline void finalize() {
// Forbid calling this function if Kokkos has already been finalized or isn't yet initialized
if (Kokkos::is_finalized() || !Kokkos::is_initialized()) {
MPI_Abort(MPI_COMM_WORLD, -1);
}
Comment on lines +80 to +83
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't need it, Kokkos::finalize() will do that for you.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same but for finalization. Also, this lets us cleanly abort on all ranks.

Kokkos::finalize();

int flag;
MPI_Finalized(&flag);
// Forbid calling this function if MPI has already been finalized
if (0 != flag) {
std::abort();
}
MPI_Finalize();
}

template <CommMode SendMode = CommMode::Default, KokkosExecutionSpace ExecSpace, KokkosView SendView>
Req isend(const ExecSpace &space, const SendView &sv, int dest, int tag, MPI_Comm comm) {
return Impl::isend<SendMode>(space, sv, dest, tag, comm);
Expand Down
10 changes: 4 additions & 6 deletions unit_tests/test_main.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -81,7 +81,8 @@ int main(int argc, char *argv[]) {
// Intialize google test
::testing::InitGoogleTest(&argc, argv);

MPI_Init(&argc, &argv);
KokkosComm::initialize(argc, argv);

int rank, size;
MPI_Comm_rank(MPI_COMM_WORLD, &rank);
MPI_Comm_size(MPI_COMM_WORLD, &size);
Expand All @@ -90,8 +91,6 @@ int main(int argc, char *argv[]) {
<< KOKKOSCOMM_VERSION_PATCH << ")\n";
}

Kokkos::initialize();

// Intialize google test
::testing::InitGoogleTest(&argc, argv);

Expand All @@ -105,9 +104,8 @@ int main(int argc, char *argv[]) {
// run tests
auto exit_code = RUN_ALL_TESTS();

// Finalize MPI before exiting
Kokkos::finalize();
MPI_Finalize();
// Finalize KokkosComm before exiting
KokkosComm::finalize();

return exit_code;
}
Loading