Performance: register atomic waker lazily in AsyncRingBuf #28
For the Async interface, registering the atomic waker seems to affect performance quite a bit.

One way to reduce this cost is to register the waker lazily, i.e. only when the future is about to return `Pending`. In many cases the buffer is readily available, so no registration is needed at all.

There is one caveat, though: after `register_waker`, the caller must check again that the ring buffer's expected state has not changed:

- Producer: buffer full and not closed -> about to return `Pending` -> register waker -> check again that the buffer is still full and not closed -> `Pending`.
- Consumer: buffer empty and not closed -> about to return `Pending` -> register waker -> check again that the buffer is still empty and not closed -> `Pending`.

If the state changes before the waker is registered, we might miss a new notification of `close` or `write`/`read`.

As long as the polls that actually return `Pending` are only a small fraction of all `poll` calls, this should improve performance, since it avoids the synchronization cost of unnecessary waker registrations.

I've written a simple benchmark with the Tokio runtime. Here is one GitHub Actions run:
[Benchmark output: `master` vs. lazy register]

With lazy registration, this particular task runs about 30% faster.
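The lazy-registration pattern and its re-check can be sketched on the producer side as follows. This is a minimal, std-only sketch assuming a single-producer/single-consumer buffer; `Shared`, `try_push`, `poll_push`, `pop`, `close`, and `CountingWaker` are hypothetical names, not the crate's actual API, and a `Mutex<Option<Waker>>` stands in for the real atomic waker. The consumer side would mirror it: register only when the buffer is empty and not closed, then re-check.

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::{Arc, Mutex};
use std::task::{Context, Poll, Wake, Waker};

/// Hypothetical stand-in for the ring buffer's shared state (not the crate's type).
pub struct Shared {
    capacity: usize,
    written: AtomicUsize, // total pushes; only the producer advances it
    read: AtomicUsize,    // total pops; only the consumer advances it
    closed: AtomicBool,
    // Simplified stand-in for an atomic waker: a mutex-protected slot.
    producer_waker: Mutex<Option<Waker>>,
    // Counts how often the (comparatively costly) registration actually ran.
    pub registrations: AtomicUsize,
}

impl Shared {
    pub fn new(capacity: usize) -> Self {
        Shared {
            capacity,
            written: AtomicUsize::new(0),
            read: AtomicUsize::new(0),
            closed: AtomicBool::new(false),
            producer_waker: Mutex::new(None),
            registrations: AtomicUsize::new(0),
        }
    }

    fn register_producer(&self, waker: &Waker) {
        self.registrations.fetch_add(1, Ordering::Relaxed);
        *self.producer_waker.lock().unwrap() = Some(waker.clone());
    }

    fn wake_producer(&self) {
        // Take the waker first so the lock is released before waking.
        let waker = self.producer_waker.lock().unwrap().take();
        if let Some(w) = waker {
            w.wake();
        }
    }

    /// Consumer side (synchronous here): frees one slot and notifies the producer.
    pub fn pop(&self) -> bool {
        let read = self.read.load(Ordering::Relaxed);
        if read == self.written.load(Ordering::Acquire) {
            return false; // buffer empty
        }
        self.read.store(read + 1, Ordering::Release);
        self.wake_producer();
        true
    }

    /// Closes the buffer and notifies the producer.
    pub fn close(&self) {
        self.closed.store(true, Ordering::Release);
        self.wake_producer();
    }

    /// Non-blocking attempt; `None` means "full and not closed".
    fn try_push(&self) -> Option<Result<(), ()>> {
        if self.closed.load(Ordering::Acquire) {
            return Some(Err(()));
        }
        let written = self.written.load(Ordering::Relaxed);
        if written - self.read.load(Ordering::Acquire) < self.capacity {
            self.written.store(written + 1, Ordering::Release);
            return Some(Ok(()));
        }
        None
    }

    /// Producer poll with lazy waker registration.
    pub fn poll_push(&self, cx: &mut Context<'_>) -> Poll<Result<(), ()>> {
        // Fast path: a slot is free (or the buffer is closed), so no registration.
        if let Some(res) = self.try_push() {
            return Poll::Ready(res);
        }
        // About to return Pending: only now pay for registering the waker.
        self.register_producer(cx.waker());
        // Re-check: a pop or close may have run before the registration and
        // found no waker to wake, so its notification would otherwise be lost.
        if let Some(res) = self.try_push() {
            return Poll::Ready(res);
        }
        Poll::Pending
    }
}

/// Hypothetical helper waker for demonstration: counts how often it was woken.
pub struct CountingWaker(pub AtomicUsize);

impl Wake for CountingWaker {
    fn wake(self: Arc<Self>) {
        self.0.fetch_add(1, Ordering::SeqCst);
    }
}
```

Because `pop` and `close` take the waker under the same lock that `register_producer` uses, either the notifier finds the freshly registered waker, or the producer's second `try_push` observes the new state. Dropping that re-check reopens exactly the missed-notification window described in the caveat above.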