Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ch4/am: Caching buffer attribute in request and use typerep fast path for H2H #7082

Draft
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

yfguo
Copy link
Contributor

@yfguo yfguo commented Aug 1, 2024

Pull Request Description

This PR add three things

  1. A fast path in typerep copy/pack/unpack for H2H case. It bypasses pointer attribute query and related branches. This is enabled with new flag MPIR_TYPEREP_FLAG_H2H. SHM pipeline can benefit from this skipping a few memory access and branches.
  2. A cache for buffer attribute in request. AM recv now checks buffer attribute and save it in recv request. The recv datacopy will utilize this info and choose the typerep H2H path if possible.
  3. POSIX SHM iqueue also checks buffer attribute and utilize typerep H2H path.

Request for comments (@hzhou @raffenet ):

  1. It might be beneficial to generalize this and add H2D and D2H path, so we can save at least one buffer attribute check on typerep. For GPU pipeline it should be a good thing as different segment of the same buffer should always have the same attribute.
  2. Passing buffer info from POSIX to eager module is missing. I am adding new POSIX_AM_TYPE right now, which is not the best solution IMO.

Author Checklist

  • Provide Description
    Particularly focus on why, not what. Reference background, issues, test failures, xfail entries, etc.
  • Commits Follow Good Practice
    Commits are self-contained and do not do two things at once.
    Commit message is of the form: module: short description
    Commit message explains what's in the commit.
  • Passes All Tests
    Whitespace checker. Warnings test. Additional tests via comments.
  • Contribution Agreement
    For non-Argonne authors, check contribution agreement.
    If necessary, request an explicit comment from your companies PR approval manager.

yfguo added 4 commits August 1, 2024 13:04
Add new flag MPIR_TYPEREP_FLAG_H2H can be passed to choose this path.
If typerep user knows the buffers are all host, they can using this
to bypass buffer attribute checking and related branches.

This can be useful for SHM pipelined transfer where send/recv buffer
attribute can be cached in request.
The checking on whether a buffer is on host can be simplified to one
check because the enum of unregistered host and registered host can
be combined.
We check and cache the recv buffer attribute when posting recv request.
This info is used to use the H2H fast path in typerep.
@yfguo
Copy link
Contributor Author

yfguo commented Aug 1, 2024

test:mpich/ch4/ofi

@yfguo
Copy link
Contributor Author

yfguo commented Aug 1, 2024

Here is the context of this PR.

I was testing the performance of fbox and found that building with yaksa vs dataloop has a 0.02-0.03 us difference in latency for 1B message. Since no derived datatype is involved, the data copying should mostly go through the memcpy path in the typerep_copy/pack. The difference between typerep_yaksa and typerep_dataloop is the checks on buffer pointer attributes.

I have not check GPU build yet. But I think it would be a good thing that we cache these info instead of checking them every time.

@yfguo yfguo marked this pull request as draft August 1, 2024 19:28
Check if source buffer is on host and choose typerep fast path for H2H.
@yfguo yfguo force-pushed the exp-am-cache-ptr-attr branch from 2d50266 to c4d1e02 Compare August 1, 2024 19:29
@yfguo
Copy link
Contributor Author

yfguo commented Aug 1, 2024

test:mpich/ch4/ofi

@@ -209,6 +210,7 @@ typedef struct MPIDIG_req_t {
void *buffer;
MPI_Aint count;
MPI_Datatype datatype;
MPL_pointer_attr_t buf_attr;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This causes the compile-time assertion about the size of the extended request object to fail in Jenkins tests. Do we need the whole attribute struct or can we get by with just the type?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants