
feature request: adopt Mercury event interface in Margo progress engine #288

Closed
4 of 7 tasks
carns opened this issue Sep 3, 2024 · 2 comments

carns (Member) commented Sep 3, 2024

Mercury 3.4.0 will include a new HG_event_ API that uses file descriptor notifications as an alternative to blocking within a traditional progress function when waiting for network activity. See https://github.com/mercury-hpc/mercury/blob/bfd17b6aa64df4edcd789a8cdbb50b1327400d25/src/mercury.h#L1096

Using this in Margo would have several benefits:

  • avoid busy spinning in cases where we need to multiplex activity
  • make it possible to drive multiple interfaces with one OS thread
  • lower CPU consumption
  • combine the following into a single, simple event path with a uniform blocking and handling mechanism:
    • network (Mercury) events
    • Argobots (ULT) events
    • timer events
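A minimal sketch of what a single event path could look like on Linux, assuming all three sources can be exposed as file descriptors and multiplexed with epoll. The names event_loop, ev_source, event_loop_init(), event_loop_wait() and event_loop_close() are hypothetical and are not part of Margo. The network descriptor here is a plain eventfd standing in for the fd that Mercury's HG_event_ API would provide. With a real Mercury fd, the handler would call into Mercury rather than read() the descriptor.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical source indices, stored in epoll_event.data.u32. */
enum ev_source { EV_NETWORK, EV_ULT, EV_TIMER, EV_COUNT };

struct event_loop {
    int epfd;
    int fds[EV_COUNT];
};

/* Create the epoll set and one fd per source. EV_NETWORK is only a stand-in
 * eventfd; in Margo it would be the fd obtained via Mercury's HG_event_ API. */
int event_loop_init(struct event_loop *l)
{
    l->epfd = epoll_create1(EPOLL_CLOEXEC);
    l->fds[EV_NETWORK] = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    l->fds[EV_ULT] = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    l->fds[EV_TIMER] = timerfd_create(CLOCK_MONOTONIC, TFD_NONBLOCK | TFD_CLOEXEC);
    if (l->epfd < 0)
        return -1;
    for (uint32_t src = 0; src < EV_COUNT; src++) {
        struct epoll_event ev = { .events = EPOLLIN, .data.u32 = src };
        if (l->fds[src] < 0 || epoll_ctl(l->epfd, EPOLL_CTL_ADD, l->fds[src], &ev) < 0)
            return -1;
    }
    return 0;
}

/* Block for up to timeout_ms (-1 = forever); return a bitmask of ready sources. */
uint32_t event_loop_wait(struct event_loop *l, int timeout_ms)
{
    struct epoll_event evs[EV_COUNT];
    uint32_t ready = 0;
    int n = epoll_wait(l->epfd, evs, EV_COUNT, timeout_ms);
    for (int i = 0; i < n; i++) { /* n < 0 (e.g. EINTR) reports nothing */
        uint32_t src = evs[i].data.u32;
        uint64_t counter;
        assert(src < EV_COUNT);
        /* eventfd and timerfd both yield an 8-byte counter; reading resets it. */
        if (read(l->fds[src], &counter, sizeof counter) == (ssize_t)sizeof counter)
            ready |= 1u << src;
    }
    return ready;
}

void event_loop_close(struct event_loop *l)
{
    for (int src = 0; src < EV_COUNT; src++)
        close(l->fds[src]);
    close(l->epfd);
}
```

In this sketch one blocking call covers network, ULT and timer activity, and the caller dispatches on the returned bits instead of spinning on each source in turn.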

The best way to do this would be with a custom Argobots pool implementation, so that we can also wake up the scheduler efficiently when new work units are created as a result of network activity. We can build a proof-of-concept implementation using the "PRIO_WAIT" pool, which is already in the Margo source tree and should have functional semantics similar to our usual default "FIFO_WAIT" pool, which resides in the Argobots source tree.
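To illustrate how a pool could wake a scheduler that is blocked on file descriptors, here is a minimal sketch of a FIFO that writes to an eventfd when work arrives. It deliberately does not use the Argobots pool API; fd_pool and its functions are hypothetical. It signals only when the queue goes from empty to non-empty. The scheduler is assumed to follow a wait, drain, then pop-until-empty order, which prevents lost wakeups: any push that finds the queue empty writes to the eventfd after the scheduler's last empty pop.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

#define POOL_CAP 64

/* Hypothetical bounded FIFO of opaque work units (not an Argobots pool). */
struct fd_pool {
    pthread_mutex_t lock;
    void *units[POOL_CAP];
    size_t head, count;
    int wake_fd; /* eventfd the scheduler adds to its epoll/poll set */
};

int fd_pool_init(struct fd_pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    p->head = p->count = 0;
    p->wake_fd = eventfd(0, EFD_NONBLOCK | EFD_CLOEXEC);
    return p->wake_fd < 0 ? -1 : 0;
}

/* Push a unit; write to the eventfd only on the empty -> non-empty transition. */
int fd_pool_push(struct fd_pool *p, void *unit)
{
    int was_empty;
    pthread_mutex_lock(&p->lock);
    assert(p->count <= POOL_CAP);
    if (p->count == POOL_CAP) {
        pthread_mutex_unlock(&p->lock);
        return -1;
    }
    p->units[(p->head + p->count) % POOL_CAP] = unit;
    was_empty = (p->count++ == 0);
    pthread_mutex_unlock(&p->lock);
    if (was_empty) {
        uint64_t one = 1;
        if (write(p->wake_fd, &one, sizeof one) != (ssize_t)sizeof one)
            return -1;
    }
    return 0;
}

/* Non-blocking pop; returns NULL when the pool is empty. */
void *fd_pool_pop(struct fd_pool *p)
{
    void *unit = NULL;
    pthread_mutex_lock(&p->lock);
    if (p->count > 0) {
        unit = p->units[p->head];
        p->head = (p->head + 1) % POOL_CAP;
        p->count--;
    }
    pthread_mutex_unlock(&p->lock);
    return unit;
}

/* Reset the eventfd counter after wake_fd is reported readable, before popping. */
void fd_pool_drain_wakeups(struct fd_pool *p)
{
    uint64_t v;
    if (read(p->wake_fd, &v, sizeof v) < 0) {
        /* EAGAIN: no wakeup was pending, which is harmless. */
    }
}
```

A spurious wakeup is still possible when the scheduler pops a unit before the pusher's write() lands, but that only costs one extra empty pass.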

The proof of concept tasks are:

  • switch to PRIO_WAIT as the default pool for now so that we can run the full make check test suite against all dev work
  • add an interface to expose an eventfd() file descriptor through the prio_wait pool configuration API, and make the pool use it in place of a pthread condition variable when activated (to avoid potentially duplicate signal paths)
  • switch the Margo timers to use timerfd (see timerfd_create())
  • adopt the Mercury event interface
  • refactor the event loop in Margo to leverage the above
  • evaluate performance using Quintain
  • review the results and consider how to incorporate them upstream
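For the timerfd task, one plausible approach is to keep a single timerfd armed for the earliest pending deadline and re-arm it whenever the set of timers changes. The sketch below assumes deadlines are absolute CLOCK_MONOTONIC times in nanoseconds. The plain array stands in for Margo's real timer list, and arm_earliest() and now_ns() are hypothetical helpers.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/timerfd.h>
#include <time.h>
#include <unistd.h>

/* Current CLOCK_MONOTONIC time in nanoseconds. */
uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}

/* Arm tfd (one-shot, absolute) for the earliest of n deadlines; n == 0 disarms.
 * deadlines_ns is a hypothetical stand-in for Margo's pending timer list. */
int arm_earliest(int tfd, const uint64_t *deadlines_ns, size_t n)
{
    struct itimerspec its = { 0 }; /* zero it_interval: one-shot; zero it_value: disarm */
    if (n > 0) {
        uint64_t earliest = deadlines_ns[0];
        for (size_t i = 1; i < n; i++)
            if (deadlines_ns[i] < earliest)
                earliest = deadlines_ns[i];
        its.it_value.tv_sec = (time_t)(earliest / 1000000000u);
        its.it_value.tv_nsec = (long)(earliest % 1000000000u);
        assert(its.it_value.tv_nsec < 1000000000L);
        /* An all-zero it_value would disarm; use 1 ns (in the past) to fire now. */
        if (its.it_value.tv_sec == 0 && its.it_value.tv_nsec == 0)
            its.it_value.tv_nsec = 1;
    }
    return timerfd_settime(tfd, TFD_TIMER_ABSTIME, &its, NULL);
}
```

Because TFD_TIMER_ABSTIME is used, a deadline that has already passed fires immediately, so the event loop never needs to compute a relative timeout for epoll_wait().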
carns self-assigned this Sep 3, 2024
carns (Member, Author) commented Sep 5, 2024

Additional possibilities that have come up as part of this investigation:

  • making it possible to run the full suite of regression tests against any pool/scheduler combination (right now there are implicit assumptions that the default scheduler and pool will be used)
  • cutting deprecated code from the in-tree pool implementations (this would be good practice even if the event model doesn't work out)
  • considering how to fall back to the old progress engine code if Mercury is using a transport that doesn't support file descriptor events (the internal psm/psm2 NAs are probably the most relevant examples), or on systems that lack eventfd or epoll
  • existing wait-capable pool implementations (both in our tree and in the Argobots tree) likely call pthread_cond_signal() far more often than necessary; a signal is only needed on the transition from idle to non-idle
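The last point can be sketched with a plain pthread queue that signals its condition variable only when a push finds the queue empty. This assumes a single consumer per queue. With several consumers, signaling only on the empty-to-non-empty transition can leave a second waiter asleep while work is queued. wait_queue and its functions are hypothetical, and the signals counter exists only to make the behavior observable.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define Q_CAP 64

/* Hypothetical single-consumer blocking FIFO. */
struct wait_queue {
    pthread_mutex_t lock;
    pthread_cond_t nonempty;
    void *items[Q_CAP];
    size_t head, count;
    unsigned long signals; /* instrumentation: number of cond signals sent */
};

void wq_init(struct wait_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    pthread_cond_init(&q->nonempty, NULL);
    q->head = q->count = 0;
    q->signals = 0;
}

int wq_push(struct wait_queue *q, void *item)
{
    pthread_mutex_lock(&q->lock);
    if (q->count == Q_CAP) {
        pthread_mutex_unlock(&q->lock);
        return -1;
    }
    q->items[(q->head + q->count) % Q_CAP] = item;
    if (q->count++ == 0) {
        /* The single consumer can only be blocked if it saw an empty queue,
         * so only the idle -> non-idle transition needs a signal. */
        pthread_cond_signal(&q->nonempty);
        q->signals++;
    }
    pthread_mutex_unlock(&q->lock);
    return 0;
}

/* Block until an item is available, then dequeue it. */
void *wq_pop_wait(struct wait_queue *q)
{
    void *item;
    pthread_mutex_lock(&q->lock);
    while (q->count == 0)
        pthread_cond_wait(&q->nonempty, &q->lock);
    assert(q->count > 0 && q->count <= Q_CAP);
    item = q->items[q->head];
    q->head = (q->head + 1) % Q_CAP;
    q->count--;
    pthread_mutex_unlock(&q->lock);
    return item;
}
```

Pushing a burst of N items into an idle queue costs one pthread_cond_signal() here rather than N.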

carns (Member, Author) commented Sep 17, 2024

See #290 for the experimental reference implementation. Closing this issue for now. We can incorporate the HG event interface at some point as a drop-in replacement for our current logic, but the attempt to use it as the basis for refactoring event management was unsuccessful: it was functionally correct but added too much performance overhead.

carns closed this as completed Sep 17, 2024
Labels: none · Projects: none · Development: no branches or pull requests · 1 participant