Relax NumPy requirement in UCX #3731

jakirkham · 2020-04-20T20:18:13Z

This relaxes the NumPy requirement in UCX communication. We do this by using `struct.pack` and `struct.unpack` for the frame metadata, which gives us simple `bytes` objects to send over the wire. We also create a `host_array` function that uses NumPy when available, but falls back to `bytearray` if not.

While it works to have this be a single `int` (as it will be coerced to a `tuple`), go ahead and make it a `tuple` for clarity and to match more closely to the Numba case.

This is equivalent to using NumPy's `uint8`, but has the added benefit of not requiring NumPy be imported to work.

Matches the variable name in the `send` case to make things easier to follow.

As `struct.pack` and `struct.unpack` are able to build `bytes` objects containing the frame metadata needed by UCX easily, just use these functions instead of creating NumPy arrays each time. Helps soften the NumPy requirement a bit.
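As a rough illustration of the idea (a sketch only, not the code in `distributed/comm/ucx.py`), per-frame metadata can be turned into `bytes` and back with nothing but the standard library. The helper names below are hypothetical and exist only for this example.

```python
import struct


def pack_cuda_flags(cuda_frames):
    """Pack one bool per frame ("?" is a 1-byte bool) into a bytes object."""
    return struct.pack(len(cuda_frames) * "?", *cuda_frames)


def unpack_cuda_flags(data, nframes):
    """Inverse of pack_cuda_flags; the receiver must already know nframes."""
    return list(struct.unpack(nframes * "?", data))


def pack_sizes(sizes):
    """Pack one unsigned 64-bit integer ("Q") per frame size."""
    return struct.pack(len(sizes) * "Q", *sizes)


def unpack_sizes(data, nframes):
    """Inverse of pack_sizes."""
    return list(struct.unpack(nframes * "Q", data))


# Hypothetical metadata for three frames
cuda_frames = [False, True, False]
sizes = [128, 4096, 0]

flags_msg = pack_cuda_flags(cuda_frames)  # plain bytes, no NumPy involved
sizes_msg = pack_sizes(sizes)
```

Both results are ordinary `bytes` objects, so they can be handed to anything that accepts a buffer.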

Matches more closely to the name used by RMM and Numba.

To relax the NumPy requirement completely, add a function to allocate arrays on host. If NumPy is not present, this falls back to just allocating `bytearray` objects, which work just as well.
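A minimal sketch of such a fallback, assuming a byte-oriented buffer is all the receive path needs. This is written for illustration and may differ from the merged code; in particular the choice of `numpy.empty` is an assumption.

```python
try:
    import numpy

    def host_array(n):
        """Allocate an n-byte host buffer with NumPy (contents uninitialized)."""
        return numpy.empty((n,), dtype="u1")

except ImportError:

    def host_array(n):
        """Allocate an n-byte host buffer without NumPy (contents zeroed)."""
        return bytearray(n)


# Either variant exposes a writable buffer that data can be received into.
buf = host_array(16)
view = memoryview(buf)
```

Callers only rely on the buffer protocol, so they do not need to know which branch was taken.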

quasiben · 2020-04-20T20:39:16Z

Is there any expected performance improvement from using `struct` vs. NumPy?

cc @madsbk

jakirkham · 2020-04-20T21:21:46Z

Mostly hoping this simpler implementation avoids red herrings while debugging issues.

jakirkham · 2020-04-20T21:25:33Z

In terms of performance this does improve things a little bit, but it probably doesn't impact the overall send/receive process meaningfully (though I could be wrong about that). Again this isn't really the motivation for it.

In [1]: import struct                                                           

In [2]: import numpy                                                            

In [3]: l = 4 * [True, False]                                                   

In [4]: %timeit struct.pack(len(l) * "?", *l)                                   
389 ns ± 4.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

In [5]: %timeit numpy.array(l)                                                  
838 ns ± 2.67 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

madsbk

LGTM!

Avoids multiple calls to `len(frames)`, is a bit easier to read, and matches the receive code path more closely.

To send fewer and larger messages, pack both which frames are on device and how large each frame is into one message.

distributed/comm/ucx.py

quasiben · 2020-04-21T02:39:11Z

That's a good point.

…

On Mon, Apr 20, 2020, 10:14 PM jakirkham wrote:

> In distributed/comm/ucx.py:
>
> @@ -59,34 +60,42 @@ def init_once():
> ucp.init(options=ucx_config, env_takes_precedence=True)
> + # Find the function, `host_array()`, to use when allocating new host arrays
> + try:
>
> That may be so. However that would be like saying NumPy can rely on Pandas dependencies for code that Pandas uses from NumPy. In general we can't expect that to hold and it's a bit fragile. We should either make NumPy a requirement here or add a fallback. Seems safer to add this fallback here and it doesn't cost us much. Plus it should save us suffering later should things change in UCX-Py and we forget about this line here 🙂

jakirkham · 2020-04-21T03:47:54Z

Went ahead and grabbed the header consolidation commit from PR (#3732) and pushed it here. That doesn't really change how the frames are serialized, but it does allow us to send all of the per-frame metadata in one message.

quasiben · 2020-04-21T15:03:06Z

test_ucx.py passed locally with one exception:

distributed/comm/tests/test_ucx.py::test_ucx_deserialize FAILED

though this is probably due to hostname issues.

While I don't think this changes the overall scheme described in the notes, lines like:

struct.pack(nframes * "?" + nframes * "Q", *cuda_frames, *sizes)

will be hard to read tomorrow. Would you mind adding a note or two describing what the packing is doing?
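For readers unfamiliar with `struct` format strings, here is an annotated sketch of what a line like that does. The sample values are made up, and the receive side is an assumption about how the header would be decoded, not a copy of the PR's code.

```python
import struct

# Hypothetical metadata for three frames
cuda_frames = [False, True, True]  # is each frame in device memory?
sizes = [64, 1 << 20, 0]           # size of each frame in bytes
nframes = len(sizes)

# "?" is one bool (1 byte) and "Q" is one unsigned 64-bit integer (8 bytes),
# so the format is nframes bools followed by nframes sizes.
fmt = nframes * "?" + nframes * "Q"
header = struct.pack(fmt, *cuda_frames, *sizes)

# In native mode struct may insert padding between the bools and the first
# "Q", so ask calcsize for the header length instead of assuming 9 * nframes.
header_len = struct.calcsize(fmt)

# The receiver rebuilds the same format (it must already know nframes)
# and splits the unpacked tuple back into the two lists.
fields = struct.unpack(fmt, header)
recv_cuda_frames = list(fields[:nframes])
recv_sizes = list(fields[nframes:])
```

Because the format depends on `nframes`, the frame count has to reach the receiver before this header can be decoded.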

quasiben · 2020-04-21T20:26:19Z

Thank you @jakirkham for taking the time to add those comments

jakirkham · 2020-04-21T20:27:58Z

Thanks Ben! 😄

Yeah I've never gotten that test to run successfully. 😞 Probably we should figure out how the config issue should be fixed and talk to OPS so it can be integrated into their provisioning scripts.

Was just about to ask if those comments seemed reasonable. Happy to adjust if needed 🙂

TomAugspurger

LGTM. Feel free to merge when you're ready.

jakirkham · 2020-04-21T23:35:46Z

Thanks all! 😄

jakirkham added 7 commits April 20, 2020 13:22

Make device_array's shape a tuple

bc87075

While it works to have this be a single `int` (as it will be coerced to a `tuple`), go ahead and make it a `tuple` for clarity and to match more closely to the Numba case.

Use "u1" to specify uint8 typed arrays

8e1e1c0

This is equivalent to using NumPy's `uint8`, but has the added benefit of not requiring NumPy be imported to work.

Rename is_cudas to cuda_frames

4fc853f

Matches the variable name in the `send` case to make things easier to follow.

Use pack/unpack for UCX frame metadata

12f4d47

As `struct.pack` and `struct.unpack` are able to build `bytes` objects containing the frame metadata needed by UCX easily, just use these functions instead of creating NumPy arrays each time. Helps soften the NumPy requirement a bit.

Rename cuda_array to device_array

fe7018c

Matches more closely to the name used by RMM and Numba.

Create function to allocate arrays on host

78ba385

To relax the NumPy requirement completely, add a function to allocate arrays on host. If NumPy is not present, this falls back to just allocating `bytearray` objects, which work just as well.

Fix formatting with black

258bdca

jakirkham force-pushed the relax_numpy_req_ucx branch from 6e67585 to 258bdca April 20, 2020 20:23

jakirkham requested a review from quasiben April 20, 2020 20:27

madsbk approved these changes Apr 20, 2020


jakirkham added 7 commits April 20, 2020 17:01

Define cuda_frames with other frame definitions

6aad1ae

Store nframes for simplicity

5981100

Avoids multiple calls to `len(frames)`, is a bit easier to read, and matches the receive code path more closely.

Collect sizes along with other frame info

b950a86

Use sizes to pick out non-trivial frames to send

249c84a

Simply call sum on sizes for bytes sent

4f1f493

Use host_array to make buffers to receive into

7b3cecd

Pack per frame metadata into one message

98d82dd

To send fewer and larger messages, pack both which frames are on device and how large each frame is into one message.
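Taken together, the commits above suggest a send path along the following lines. This is a simplified, synchronous sketch under stated assumptions: `send_frames` and the `send` callable are hypothetical stand-ins for the real (asynchronous) UCX endpoint, and sending the frame count as its own message is assumed rather than taken from the diff.

```python
import struct


def send_frames(send, frames, cuda_frames):
    """Send frame metadata followed by the frames; return payload bytes sent."""
    nframes = len(frames)  # stored once instead of calling len(frames) repeatedly
    sizes = [memoryview(f).nbytes for f in frames]

    # Assumed first message: how many frames to expect.
    send(struct.pack("Q", nframes))

    # One header message carrying both the device flags and the sizes.
    send(struct.pack(nframes * "?" + nframes * "Q", *cuda_frames, *sizes))

    # Use the sizes to skip empty frames, which need no message of their own.
    for frame, size in zip(frames, sizes):
        if size:
            send(frame)

    return sum(sizes)


# A plain list stands in for the wire in this example.
wire = []
frames = [b"abc", b"", bytearray(5)]
nbytes = send_frames(wire.append, frames, [False, False, False])
```

Here three frames produce four messages: the count, the combined header, and the two non-empty frames.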

jakirkham mentioned this pull request Apr 21, 2020

Consolidate messages in UCX #3732

Open

quasiben reviewed Apr 21, 2020


distributed/comm/ucx.py

jakirkham force-pushed the relax_numpy_req_ucx branch 2 times, most recently from 19aabd7 to 4cc672e April 21, 2020 20:23

Note what struct lines are packing/unpacking

c59f95d

jakirkham force-pushed the relax_numpy_req_ucx branch from 4cc672e to c59f95d April 21, 2020 20:24

quasiben approved these changes Apr 21, 2020


TomAugspurger approved these changes Apr 21, 2020


jakirkham merged commit 6db09f3 into dask:master Apr 21, 2020

jakirkham deleted the relax_numpy_req_ucx branch April 21, 2020 23:35
