
fix(bitswap/client/msgq): prevent duplicate requests #691

Open · wants to merge 2 commits into main
Conversation

@Wondertan (Member) commented Oct 17, 2024:

Previously, in-progress requests could be re-requested during periodic rebroadcast. The queue sends requests, and while a response is still awaited, the rebroadcast event fires. The rebroadcast event moves previously sent WANTs back to pending and sends them again in a new message, duplicating some WANT requests.

The solution here is to ensure a WANT has been in sent status for long enough before bringing it back to pending. This utilizes the existing sentAt map, which tracks when every CID was sent: on every rebroadcast event, it checks whether each message has been outstanding for longer than rebroadcastInterval.

@Wondertan Wondertan requested a review from a team as a code owner October 17, 2024 18:42
```go
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return false
}
mq.rebroadcastIntervalLk.RLock()
rebroadcastInterval := mq.rebroadcastInterval
```
@Wondertan (Member, Author) commented Oct 17, 2024:
Alternatively, this could be a different new parameter/constant

@Wondertan (Member, Author):

I tested this on a k8s cluster and with a local node connected to it. It works as expected, but I believe this would benefit a lot from a proper test. Unfortunately, I can't allocate time to writing one. It's not that straightforward.

@Wondertan (Member, Author) commented Oct 17, 2024:

For context, I detect duplicates with a custom multihash that logs when the same data is hashed again. This essentially uncovered #690 as well as this issue.

@Wondertan Wondertan force-pushed the message-queue-duplicates branch 3 times, most recently from d193c2f to 9020b71 on October 19, 2024 23:20
codecov bot commented Oct 19, 2024:

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 60.41%. Comparing base (19bcc75) to head (a61d89f).
Report is 50 commits behind head on main.


```
@@            Coverage Diff             @@
##             main     #691      +/-   ##
==========================================
+ Coverage   60.36%   60.41%   +0.05%
==========================================
  Files         243      243
  Lines       31034    31030       -4
==========================================
+ Hits        18734    18748      +14
+ Misses      10634    10618      -16
+ Partials     1666     1664       -2
```
| Files with missing lines | Coverage Δ |
| --- | --- |
| ...tswap/client/internal/messagequeue/messagequeue.go | 84.49% <100.00%> (-0.43%) ⬇️ |
| bitswap/client/wantlist/wantlist.go | 90.90% <ø> (-0.88%) ⬇️ |

... and 16 files with indirect coverage changes



@lidel lidel added the need/triage Needs initial labeling and prioritization label Oct 22, 2024
@gammazero gammazero added need/analysis Needs further analysis before proceeding need/maintainers-input Needs input from the current maintainer(s) and removed need/triage Needs initial labeling and prioritization labels Oct 22, 2024
```go
		mq.peerWants.sent.Remove(want.Cid)
		toRebroadcast++
	}
}
```
Contributor:

This loop looks like a duplicate of the above.

@Wondertan (Member, Author):

It's peerWants vs broadcastWants, though.

@Wondertan (Member, Author):

What I mean is that it looks like a duplicate, but it works on a different map.

@gammazero (Contributor) commented Oct 29, 2024:

What I meant is that these look like they should call the same function defined for a WantList, like `Absorb` in boxo/bitswap/client/wantlist/wantlist.go.

```go
		mq.peerWants.sent.Remove(want.Cid)
		toRebroadcast++
	}
}
```
Contributor:

The previous call to mq.peerWants.pending.Absorb also did:

```go
// Invalidate the cache up-front to avoid doing any work trying to keep it up-to-date.
w.cached = nil
```

Even though not all the sent entries may be added to pending in the new code, the cache still needs to be cleared. Otherwise, calling mq.peerWants.pending.Entries() will not return the newly added entries. Alternatively, the new entries can be added to the wantlist's cached entries and sorted.

I suggest keeping the previous code:

```go
mq.bcstWants.pending.Absorb(mq.bcstWants.sent)
mq.peerWants.pending.Absorb(mq.peerWants.sent)
```

And instead modify the Absorb function in boxo/bitswap/client/wantlist/wantlist.go. WDYT?

@Wondertan (Member, Author):

This doesn't work because the pending *WantList doesn't have access to the sentAt map.

@Wondertan (Member, Author):

I can try to make a new method on the recallWantList struct, though.

@Wondertan (Member, Author):

Ok, check the new version

@Wondertan (Member, Author):

On the cache invalidation note: WantList.Add also invalidates the cache, so we don't have to do it here.
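To illustrate the invalidation pattern being discussed (a minimal sketch of a simplified wantlist; only the `cached` field mirrors the code quoted earlier, everything else is an assumption):

```go
package main

import (
	"fmt"
	"sort"
)

// minimal wantlist sketch: a set of CIDs plus a lazily built,
// sorted cache of entries
type wantlist struct {
	set    map[string]bool
	cached []string
}

// Add inserts a CID and invalidates the cache, so the next call to
// Entries rebuilds the view and sees the new entry.
func (w *wantlist) Add(cid string) {
	w.set[cid] = true
	w.cached = nil
}

// Entries returns a sorted view of the set, rebuilding it on demand.
func (w *wantlist) Entries() []string {
	if w.cached == nil {
		for cid := range w.set {
			w.cached = append(w.cached, cid)
		}
		sort.Strings(w.cached)
	}
	return w.cached
}

func main() {
	w := &wantlist{set: map[string]bool{}}
	w.Add("b")
	fmt.Println(w.Entries()) // prints [b]
	w.Add("a")               // invalidates the cache built above
	fmt.Println(w.Entries()) // prints [a b]
}
```

Because Add clears `cached`, callers never observe a stale Entries() view, which is the concern raised in the review above.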

Comment on lines -491 to -492:

```go
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return false
```
Contributor:

This is probably good to leave since it avoids Lock/Unlock of mq.rebroadcastIntervalLk and time.Now().

```go
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return 0
}
```

@Wondertan (Member, Author):

The lock exists only for testing. The interval is never changed outside of the unit test. Thus, I don't see any contention that the zero-length check could prevent.

Contributor:

I think the comment is not about contention but about saving unnecessary lock/unlock calls, but if this only happens every 30 seconds, then it's probably not very important.

@gammazero (Contributor):

triage note: This is a good candidate for testing in rainbow staging to observe performance differences.

@gammazero gammazero added status/blocked Unable to be worked further until needs are met need/author-input Needs input from the original author and removed need/maintainers-input Needs input from the current maintainer(s) labels Oct 29, 2024
Previously, in-progress requests could be re-requested during periodic rebroadcast.
The queue sends requests, and while awaiting a response, the rebroadcast event happens.
The rebroadcast event changes previously sent WANTs to pending and sends them again in a new message.

The solution here is to ensure a WANT was in sent status for long enough before bringing it back to pending.
This utilizes the existing `sentAt` map, which tracks when every CID was sent.
Comment on lines -134 to -135
```go
// Absorb all the entries in other into this want list
func (w *Wantlist) Absorb(other *Wantlist) {
```
@Wondertan (Member, Author):

Deleted as dead code.

@lidel lidel requested a review from hsanjuan November 12, 2024 17:35
@hsanjuan (Contributor) left a comment:

The main thing to consider here is that:

  • before, a "want" would be re-broadcast at most 30 seconds after it was sent (could be 0.1s)
  • after, a "want" would be re-broadcast only once at least 30 seconds have passed since it was sent (could be 59.9s).

In that respect the code looks good.

I am not sure how much of an improvement this is in practice (perhaps clients were lucky to hit a short rebroadcast period sometimes), but it makes clients more respectful at least and perf should not be based on "luck".

I think we can test on staging and discuss in the next triage if we accept the change.
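Assuming the old code moved every sent want back to pending on each periodic tick, the bounds described in the review above can be reproduced with a small model (a standalone sketch, not the actual queue code; times are in tenths of a second to keep the arithmetic exact):

```go
package main

import "fmt"

// interval in tenths of a second (30s), matching the default
// rebroadcast period discussed above
const interval = 300

// timeUntilRebroadcast returns how many tenths of a second after being
// sent a want is rebroadcast, given the send time's offset into the
// current tick cycle. requireMinAge models the new rule: the want must
// have been in sent status for at least one full interval.
func timeUntilRebroadcast(offset int, requireMinAge bool) int {
	// ticks fire at interval, 2*interval, ...
	for tick := interval; ; tick += interval {
		age := tick - offset
		if !requireMinAge || age >= interval {
			return age
		}
	}
}

func main() {
	// old rule: sent 29.9s into a cycle, rebroadcast 0.1s later
	fmt.Println(timeUntilRebroadcast(299, false)) // prints 1
	// new rule: sent 0.1s into a cycle, skipped at the 30s tick
	// (only 29.9s old), rebroadcast at the 60s tick, 59.9s after send
	fmt.Println(timeUntilRebroadcast(1, true)) // prints 599
}
```

So the worst-case wait roughly doubles, but a want can never be rebroadcast while it has been outstanding for less than a full interval.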


@gammazero gammazero added need/maintainers-input Needs input from the current maintainer(s) and removed need/analysis Needs further analysis before proceeding need/author-input Needs input from the original author status/blocked Unable to be worked further until needs are met labels Nov 19, 2024
@gammazero (Contributor):

Need to test on staging before merge.
