Replace default asyncio by uvloop when available #821

Draft: wants to merge 2 commits into base: branch-0.24

Conversation

pentschev
Member

Replace the default asyncio event loop with uvloop when available. Preliminary benchmarks show up to a 2x speedup.

asyncio
$ python benchmarks/send-recv.py --reuse-alloc -o numpy --n-iter 10 -n 100kiB
Server Running at 10.33.225.163:37981
Client connecting to server at 10.33.225.163:37981
Roundtrip benchmark
--------------------------
n_iter          | 10
n_bytes         | 100.00 kiB
object          | numpy
reuse alloc     | True
transfer API    | TAG
UCX_TLS         | all
UCX_NET_DEVICES | all
==========================
Device(s)       | CPU-only
Server CPU      | affinity not set
Client CPU      | affinity not set
Average         | 602.88 MiB/s
Median          | 761.13 MiB/s
--------------------------
Iterations
--------------------------
000         |231.92 MiB/s
001         |558.95 MiB/s
002         |791.95 MiB/s
003         |897.06 MiB/s
004         |908.38 MiB/s
005         |611.67 MiB/s
006         |619.92 MiB/s
007         |732.61 MiB/s
008         |845.71 MiB/s
009         |824.81 MiB/s
uvloop
$ python benchmarks/send-recv.py --reuse-alloc -o numpy --n-iter 10 -n 100kiB
Server Running at 10.33.225.163:51962
Client connecting to server at 10.33.225.163:51962
Roundtrip benchmark
--------------------------
n_iter          | 10
n_bytes         | 100.00 kiB
object          | numpy
reuse alloc     | True
transfer API    | TAG
UCX_TLS         | all
UCX_NET_DEVICES | all
==========================
Device(s)       | CPU-only
Server CPU      | affinity not set
Client CPU      | affinity not set
Average         | 1.00 GiB/s
Median          | 1.21 GiB/s
--------------------------
Iterations
--------------------------
000         |409.89 MiB/s
001         | 1.23 GiB/s
002         | 1.14 GiB/s
003         | 1.31 GiB/s
004         | 1.01 GiB/s
005         | 1.43 GiB/s
006         | 1.43 GiB/s
007         | 1.44 GiB/s
008         | 0.93 GiB/s
009         | 1.19 GiB/s
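
For reference, the summary lines of the two runs above can be compared directly. This is a minimal sketch, assuming the reported GiB/s figures convert to MiB/s with a factor of 1024; the variable and function names are illustrative only.

```python
# Summary throughput copied from the two benchmark runs above.
MIB_PER_GIB = 1024  # assumption: the benchmark reports binary units

asyncio_mib_s = {"average": 602.88, "median": 761.13}
uvloop_mib_s = {"average": 1.00 * MIB_PER_GIB, "median": 1.21 * MIB_PER_GIB}


def speedup(metric: str) -> float:
    """Ratio of uvloop throughput to asyncio throughput for one metric."""
    return uvloop_mib_s[metric] / asyncio_mib_s[metric]


for metric in ("average", "median"):
    print(f"{metric}: {speedup(metric):.2f}x")
```

On these ten iterations the summary ratios land between roughly 1.6x and 1.7x. Individual iterations differ by more, for example iteration 005 at 611.67 MiB/s versus 1.43 GiB/s.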

@pentschev pentschev changed the title Uvloop Replace default asyncio by uvloop when available Dec 1, 2021
@jakirkham
Member

Very cool! Thanks for sharing this Peter 😄

cc @jcrist (who may find this of interest)

@madsbk
Member

madsbk commented Dec 2, 2021

@pentschev Awesome!

@jakirkham
Member

Part of the reason this didn't have to touch much code is we are using event loops throughout and calling methods on them, which should just work with uvloop.

There was one other line that we might want to update somehow. Filed an issue (#822) for it in case we want to handle that separately (though it hopefully is a small change once we decide what we want to do there).
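
To illustrate why loop-agnostic code needs no changes, here is a minimal sketch (not taken from ucx-py) of a coroutine that only calls methods on whichever loop is running. The `delayed_result` name is hypothetical.

```python
import asyncio


async def delayed_result(value, delay=0.01):
    """Resolve a future using only methods of the running loop."""
    # No loop class is named here, so any asyncio-compatible loop can drive this.
    loop = asyncio.get_running_loop()
    fut = loop.create_future()
    loop.call_later(delay, fut.set_result, value)
    return await fut


if __name__ == "__main__":
    print(asyncio.run(delayed_result("pong")))
```

Because the coroutine never refers to a concrete loop implementation, the same code runs whether the application left the default loop in place or installed a different one.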

@jcrist

jcrist commented Dec 2, 2021

IMO a library like ucx-py shouldn't configure an event loop at all; that's something for an end-user application to do. It is similar to configuring logging handlers: libraries should only produce logs, and applications should configure handlers. So something like dask (an application) can configure a specific event loop to use (currently configurable with distributed.admin.event-loop).

uvloop isn't always faster, and there are sometimes good reasons for specifying which event loop to use. If y'all merge this, we'll have to ensure that ucx-py is imported before configuring the event loop in distributed; otherwise this will trample over distributed's configs.
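
The ordering problem can be sketched with the standard library alone. The two policy classes below are hypothetical stand-ins, not code from distributed or ucx-py, and the policy API used here is deprecated in recent Python versions, so treat this as an illustration of the global, last-call-wins behaviour rather than a recommended pattern.

```python
import asyncio


class AppPolicy(asyncio.DefaultEventLoopPolicy):
    """Stands in for the policy an application picked explicitly."""


class LibraryPolicy(asyncio.DefaultEventLoopPolicy):
    """Stands in for a policy a library installs at import time."""


def simulate_import_order():
    """Return the name of the policy left active after both calls."""
    previous = asyncio.get_event_loop_policy()
    try:
        asyncio.set_event_loop_policy(AppPolicy())      # application configures first
        asyncio.set_event_loop_policy(LibraryPolicy())  # library import runs later
        return type(asyncio.get_event_loop_policy()).__name__
    finally:
        asyncio.set_event_loop_policy(previous)  # restore the global state


print(simulate_import_order())
```

The policy is process-wide state, so whichever call runs last silently replaces the earlier choice.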

Comment on lines +44 to +49
try:
    import uvloop

    uvloop.install()
except ImportError:
    pass
Member

Per Jim's comment, perhaps this should be moved to the benchmarking script instead?
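
One way this could look in a script is sketched below. The `--event-loop` option, `configure_event_loop`, and `run_benchmark` are hypothetical and are not existing parts of benchmarks/send-recv.py; the sketch assumes uvloop is only imported when explicitly requested.

```python
import argparse
import asyncio


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Benchmark sketch")
    # Hypothetical option; not an existing flag of benchmarks/send-recv.py.
    parser.add_argument(
        "--event-loop", choices=["asyncio", "uvloop"], default="asyncio"
    )
    return parser.parse_args(argv)


def configure_event_loop(name):
    """Install uvloop only when the user asked for it explicitly."""
    if name == "uvloop":
        import uvloop  # raises ImportError if uvloop is not installed

        uvloop.install()


async def run_benchmark():
    # Placeholder for the real send/receive benchmark body.
    await asyncio.sleep(0)
    return type(asyncio.get_running_loop()).__module__


if __name__ == "__main__":
    args = parse_args()
    configure_event_loop(args.event_loop)
    print(asyncio.run(run_benchmark()))
```

Keeping the choice in the script's entry point means importing the library never changes the event loop on its own.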

@pentschev
Member Author

IMO a library like ucx-py shouldn't configure an event loop at all; that's something for an end-user application to do. It is similar to configuring logging handlers: libraries should only produce logs, and applications should configure handlers. So something like dask (an application) can configure a specific event loop to use (currently configurable with distributed.admin.event-loop).
...
If y'all merge this, we'll have to ensure that ucx-py is imported before configuring the event loop in distributed, otherwise this will trample over distributed's configs.

I agree. This is something I was only exploring, and I feared exactly that would be problematic.

uvloop isn't always faster, and there are sometimes good reasons for specifying which event loop to use.

Could you point to some examples of uvloop being slower (or at least not faster) and reasons why someone would not want to use it?

@jcrist

jcrist commented Dec 2, 2021

Could you point to some examples of uvloop being slower (or at least not faster) and reasons why someone would not want to use it?

Sure. One example is tornado, where uvloop is ~ the same performance as asyncio for the backing loop (and sometimes worse). The main reason is transparency though. I don't like it when libraries implicitly configure something as fundamental as the event loop implementation. Bokeh does this already to work around a no-longer-valid issue with tornado and it took me a couple hours to track down where the implicit change was coming from.

@pentschev
Member Author

Sure. One example is tornado, where uvloop is ~ the same performance as asyncio for the backing loop (and sometimes worse).

Interesting. Do you have reproducers or results for that somewhere? I would be interested in taking a closer look at that if that's available.

The main reason is transparency though. I don't like it when libraries implicitly configure something as fundamental as the event loop implementation.

Ah yes, I misunderstood your initial statement, and completely agree that implicitly configuring things like that is a terrible practice. Is it possible to globally configure what event loop to use with Dask today? Certainly some configuration like that would be a more useful approach for Dask+UCX-Py. For UCX-Py standalone usage, we could always introduce some configuration to choose between the default asyncio event loop and uvloop as well, or just instruct users to use uvloop when they want.

@jcrist

jcrist commented Dec 2, 2021

Do you have reproducers or results for that somewhere? I would be interested in taking a closer look at that if that's available.

It's definitely all application specific, but just try enabling uvloop for some tornado workflow and take a look. One example would be the benchmarks I recently did for the asyncio-comms PR in distributed: dask/distributed#5450 (comment). Uvloop was usually negligibly faster, and sometimes slower.

Is it possible to globally configure what event loop to use with Dask today?

Yes, set distributed.admin.event-loop to uvloop. If we find that for distributed's workflow uvloop is generally faster, we might change the default to something like "auto", signalling to try uvloop and fall back to asyncio if it is not installed. That way the user can still explicitly configure things without being trampled.
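
The "auto" idea can be sketched as a small resolution function. This is a hypothetical illustration, not distributed's implementation; `resolve_event_loop` and the accepted setting values are assumptions made for the example.

```python
import importlib.util


def resolve_event_loop(setting="auto"):
    """Map a configured value to the name of the loop implementation to use."""
    uvloop_available = importlib.util.find_spec("uvloop") is not None
    if setting == "auto":
        # Prefer uvloop when it is installed, otherwise fall back quietly.
        return "uvloop" if uvloop_available else "asyncio"
    if setting == "uvloop" and not uvloop_available:
        # An explicit request is never silently downgraded.
        raise ImportError("event loop set to 'uvloop' but uvloop is not installed")
    if setting in ("asyncio", "uvloop"):
        return setting
    raise ValueError(f"unknown event loop setting: {setting!r}")


print(resolve_event_loop("auto"))
```

An explicit value always wins over the automatic choice, which is the property that keeps user configuration from being overridden.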

or just instruct users to use uvloop when they want.

That'd be my recommendation.

@pentschev
Member Author

Thanks for the details @jcrist, really appreciate it! I will do some more testing with uvloop and try to come up with a nice, non-intrusive solution.

4 participants