
WIP: automatic bulk handle #287

Open · wants to merge 2 commits into main
Conversation

mdorier (Contributor) commented Sep 3, 2024

Most often, code that uses bulk operations looks as follows (the conventional server side is sketched in code right after this list):

  • The client wraps its local memory into an hg_bulk_t;
  • The client sends that hg_bulk_t to a server;
  • The server allocates a local buffer;
  • The server wraps the local buffer into its own hg_bulk_t;
  • The server issues margo_bulk_transfer operations between the two hg_bulk_t.
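
For reference, here is what that conventional pattern typically looks like on the server side, using the existing margo API (the RPC input type and handler name are made up for illustration; error checking and RPC registration boilerplate are omitted):

```c
#include <stdlib.h>
#include <margo.h>

/* Illustrative RPC input type; real code would typically generate this
 * with MERCURY_GEN_PROC. */
typedef struct {
    hg_bulk_t bulk;   /* client's exposed memory */
    hg_size_t size;   /* number of bytes to transfer */
} my_in_t;

/* Conventional server-side handler: the last three bullets of the list above. */
static void my_rpc_ult(hg_handle_t h)
{
    margo_instance_id     mid  = margo_hg_handle_get_instance(h);
    const struct hg_info* info = margo_get_info(h);

    my_in_t in;
    margo_get_input(h, &in);

    /* allocate a local buffer and wrap it in the server's own hg_bulk_t */
    void*     buffer     = malloc(in.size);
    hg_bulk_t local_bulk = HG_BULK_NULL;
    margo_bulk_create(mid, 1, &buffer, &in.size,
                      HG_BULK_READWRITE, &local_bulk);

    /* explicit transfer from the client's memory into the local buffer */
    margo_bulk_transfer(mid, HG_BULK_PULL, info->addr, in.bulk, 0,
                        local_bulk, 0, in.size);

    /* ... operate on buffer, optionally push it back, respond ... */

    margo_bulk_free(local_bulk);
    free(buffer);
    margo_free_input(h, &in);
    margo_destroy(h);
}
```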

The feature proposed by this PR, called margo_auto_bulk_t, is a way to simplify this. With this feature, the above pattern changes into the following:

  • The client wraps its local memory into a margo_auto_bulk_t;
  • The client sends that margo_auto_bulk_t to a server;
  • The server can immediately call margo_auto_bulk_access to get a pointer to a local buffer (allocated by Margo) mirroring the client's buffer. If the client has specified MARGO_PULL_ON_ACCESS as a flag, the first call to margo_auto_bulk_access will pull the content of the client's memory into the server's.
  • The server can use margo_auto_bulk_pull to pull data from the client's memory to the server's local buffer, and margo_auto_bulk_push to push data from the server's local buffer to the client's memory;
  • If the client has specified MARGO_PUSH_ON_DESTROY as a flag, margo_auto_bulk_push is implicitly called when the margo_auto_bulk_t is destroyed.

In most cases, margo_auto_bulk_push and margo_auto_bulk_pull are not needed, and transfers will happen automatically on margo_auto_bulk_access and/or when calling margo_free_input on an RPC input that contains a margo_auto_bulk_t.

Importantly, margo_auto_bulk_t optimizes the case where client and server live in the same process: margo_auto_bulk_access simply gives the server the address of the buffer provided by the client, rather than performing an extra allocation, and all copies are bypassed.
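
To make the proposed flow concrete, here is the earlier server-side handler rewritten with the API sketched in this PR. None of this exists yet: the type, function names, and flags come from the description above, but every signature is a guess rather than a final design:

```c
/* Hypothetical sketch only: margo_auto_bulk_t and margo_auto_bulk_access
 * are the names proposed above, but their signatures are guesses. */
typedef struct {
    margo_auto_bulk_t bulk;   /* auto-bulk sent by the client */
} my_in_t;

static void my_rpc_ult(hg_handle_t h)
{
    my_in_t in;
    margo_get_input(h, &in);   /* deserializes the margo_auto_bulk_t */

    void*  buffer = NULL;
    size_t size   = 0;
    /* If the client set MARGO_PULL_ON_ACCESS, this first access also pulls
     * the client's data; if client and server share a process, buffer points
     * directly at the client's memory and no copy happens. (guessed signature) */
    margo_auto_bulk_access(in.bulk, &buffer, &size);

    /* ... read or modify buffer in place ... */

    /* If the client set MARGO_PUSH_ON_DESTROY, releasing the input pushes
     * the buffer back to the client implicitly. */
    margo_free_input(h, &in);
    margo_destroy(h);
}
```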

@carns Before I implement this feature, do you have any comment?

codecov bot commented Sep 3, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 57.65%. Comparing base (e973cd6) to head (733ca89).

Additional details and impacted files
@@           Coverage Diff           @@
##             main     #287   +/-   ##
=======================================
  Coverage   57.65%   57.65%           
=======================================
  Files          70       70           
  Lines       10146    10146           
  Branches     1335     1335           
=======================================
  Hits         5850     5850           
  Misses       3462     3462           
  Partials      834      834           


carns (Member) commented Sep 24, 2024

This is a bit more complicated than I expected; I'm trying to think about what could make it simpler.

From the use case description, I think the biggest difference between the conventional pattern and the proposed pattern is on the server side.

Could we accomplish something similar without adding a new bulk type, but instead just adding two convenience functions on the server, something like margo_auto_bulk_create() and margo_auto_bulk_destroy()? The former would create a local hg_bulk_t, allocate memory for it, and, if a PULL flag is set, do the bulk_transfer, all in one call from an API point of view. The latter would (if a PUSH flag is set) do a bulk transfer, destroy the hg_bulk_t, and free the memory.

I could see that being helpful to hide the transfer and allocation steps. I'm not sure that we need a new type or new push or pull functions for it, though? The underlying functions are available for anyone doing more sophisticated patterns.

mdorier (Contributor, Author) commented Sep 24, 2024

In your scenario, you still need a new bulk type wrapping the local bulk + local buffer, don't you? Then the only difference between my API and yours is that (1) your solution doesn't involve creating an auto-bulk on the client side, and (2) my solution does the transfer when first calling margo_auto_bulk_access (because the auto_bulk will have already been created via deserialization), while in your API it's done when creating the auto-bulk?

carns (Member) commented Sep 24, 2024

margo_bulk_mirror_create() and margo_bulk_mirror_free() seem like better names after offline discussion, just recording here so we don't forget.

The create() can provide both an hg_bulk_t* and a void* output argument, which may refer to a new allocation or may reference the peer's memory if the peer is local.
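
For the record, one possible shape consistent with that description (hypothetical signatures only; nothing here is implemented or agreed upon):

```c
/* Hypothetical declarations recording the shape under discussion;
 * every parameter is a guess. */
hg_return_t margo_bulk_mirror_create(
        margo_instance_id mid,
        hg_addr_t         origin_addr,  /* peer exposing the memory          */
        hg_bulk_t         origin_bulk,  /* peer's bulk handle                */
        size_t            size,
        uint32_t          flags,        /* e.g. pull-on-create, push-on-free */
        hg_bulk_t*        mirror_bulk,  /* out: local bulk handle            */
        void**            mirror_ptr);  /* out: newly allocated buffer, or a
                                           direct pointer to the peer's
                                           memory when the peer is local     */

/* Pushes the mirror back to the origin if requested at creation, then
 * destroys the hg_bulk_t and frees the memory. */
hg_return_t margo_bulk_mirror_free(hg_bulk_t mirror_bulk);
```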

Still under discussion: how (and should we) handle different allocation modes, like being able to use existing memory, or being able to use poolsets, or being able to use a provider-specific allocator?
