When is PMIx_Fence required? #511
I'm afraid you can't, which is why we recommend that you either (a) use a fence to ensure it gets there, or (b) include a
Not sure I understand your last use of "global", but the text is advising that you utilize a fence.
Not required, but advised. Let me try to provide a little more of an explanation. There are two ways you can retrieve data that was "put":

(a) You can use a fence operation to circulate the data. This is the most common method and is generally supported by all runtimes.

(b) You can avoid the fence operation and rely on the "direct modex" (DM) method for retrieving the data. In DM, the data is left on the local server where your app "put" it. When another process requests the data, the local server must determine the identity of the server that has the data (i.e., the server that is hosting the client process that "put" the data) and then request the data from that server. Since there is no syncing fence, that server must "hold" the request until the data has been committed to it before responding (or else you'd just get a "not found" right away).

DM may not be implemented by all runtimes. It is known to be available in PRRTE and OMPI's runtime, and on Slurm for

Being a point-to-point method, DM can scale poorly in certain applications, e.g., if every process needs the info from every other process. It tends to work well for sparsely connected applications where each proc only needs info from a small number of its peers, and also in apps that can largely execute asynchronously.

HTH
@rhc54 thanks for the detailed answer :-)
The last "global" refers to the global mode (as opposed to DM). My question here was to know if in this case the usage of a
Assuming that DM is supported, I still have a question about that mode:
Best answer is: it depends on the runtime you are operating under. If it doesn't support DM, then the fence is mandatory. If it does support DM, then it really is up to you to decide based on the needs and behavior of your app.
Yeah, I forgot for the moment when writing the DM description that you are using generic keys. DM won't work with those, as there is no way to know who is going to publish them. So if you use generic keys, you are kinda forced to do a fence to share them. Sorry about the confusion - we don't usually encounter generic keys any more.

I suppose someone could try to implement a broadcast-based DM method, but I don't know of anyone doing so - seems like it would be awfully inefficient, but I haven't given it a lot of thought. If your job isn't too large, you could do a
Of course, you have to be in a runtime that supports those functions - PRRTE and OMPI's mpirun do, but I don't know about others (might, just don't know about it).
Hi all, I am reaching out with a question about the requirements around `PMIx_Fence`.

Context

For our application, we are looking at the case where processes generate globally unique keys. They are posted to other processes using `PMIx_Put`. On the other side, the processes requesting the value associated with a globally unique key have no knowledge of where the key is located, so they rely on `PMIX_RANK_UNDEF`.

Clarification

For that specific case, the API documentation mentions a few things around the need for `PMIx_Fence`, but the final answer is not clear to me. Here is the list of sections I have identified as relating to this use case, with some associated questions:

`PMIx_Get` is intended to work:
It's clear that `PMIx_Get` will block until the key is available (or times out). But it's unclear to me how I can guarantee that the data appears on the server.

Here, I presume that "must be globally exchanged prior to retrieval" refers to the "global" method? If so, does it mean that, to have the data on the server, we are semantically required to call `PMIx_Fence` in our case?

Thanks for your help in clarifying the semantics :-)