fix(bitswap/client/msgq): prevent duplicate requests #691
base: main
Conversation
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return false
}
mq.rebroadcastIntervalLk.RLock()
rebroadcastInterval := mq.rebroadcastInterval
Alternatively, this could be a different new parameter/constant
I tested this on a k8s cluster and with a local node connected to it. It works as expected, but I believe this would benefit a lot from a proper test. Unfortunately, I can't allocate time to writing one. It's not that straightforward.
For context, I detect duplicates with a custom multihash that logs out when the same data is hashed again. This essentially uncovered #690, and this issue.
Force-pushed from d193c2f to 9020b71.
Codecov Report: All modified and coverable lines are covered by tests ✅
@@ Coverage Diff @@
## main #691 +/- ##
==========================================
+ Coverage 60.36% 60.41% +0.05%
==========================================
Files 243 243
Lines 31034 31030 -4
==========================================
+ Hits 18734 18748 +14
+ Misses 10634 10618 -16
+ Partials 1666 1664 -2
... and 16 files with indirect coverage changes
			mq.peerWants.sent.Remove(want.Cid)
			toRebroadcast++
		}
	}
This loop looks like a duplicate of the above.
It's peerWants vs broadcastWants tho
What I mean here is that it looks like a duplicate, but it works on a different map
What I meant is these look like they should call the same function defined for a WantList, like Absorb in boxo/bitswap/client/wantlist/wantlist.go.
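For illustration, a minimal sketch of the kind of shared helper this suggestion points at, assuming boxo's wantlist package API; the helper name moveSentToPending is hypothetical and not part of the PR:

```go
package main

import (
	"fmt"

	"github.com/ipfs/boxo/bitswap/client/wantlist"
)

// moveSentToPending drains every entry from sent back into pending.
// Both the broadcast and peer want lists could call the same helper,
// removing the duplicated loops flagged in this review thread.
func moveSentToPending(sent, pending *wantlist.Wantlist) int {
	moved := 0
	for _, e := range sent.Entries() {
		sent.Remove(e.Cid)
		pending.Add(e.Cid, e.Priority, e.WantType)
		moved++
	}
	return moved
}

func main() {
	sent := wantlist.New()
	pending := wantlist.New()
	fmt.Println("moved:", moveSentToPending(sent, pending))
}
```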
			mq.peerWants.sent.Remove(want.Cid)
			toRebroadcast++
		}
	}
The previous call to mq.peerWants.pending.Absorb also did:

// Invalidate the cache up-front to avoid doing any work trying to keep it up-to-date.
w.cached = nil

Even though all the sent entries may not be added to pending in the new code, the cache still needs to be cleared. Otherwise, calling mq.peerWants.pending.Entries() will not return the newly added entries. Alternatively, the new entries can be added to the wantlist's cached entries and sorted.
I suggest keeping the previous code:

mq.bcstWants.pending.Absorb(mq.bcstWants.sent)
mq.peerWants.pending.Absorb(mq.peerWants.sent)

And instead modifying the Absorb function in boxo/bitswap/client/wantlist/wantlist.go. WDYT?
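As an aside, one possible shape for such an Absorb-style change, sketched as a standalone function outside the wantlist package; the name absorbWhere and the filter parameter are hypothetical, not boxo's API. It leans on Wantlist.Add, which, as noted later in this thread, already invalidates the cached entries:

```go
package sketch

import "github.com/ipfs/boxo/bitswap/client/wantlist"

// absorbWhere copies into dst only the entries of src accepted by keep.
// dst's cached entry slice stays consistent because Wantlist.Add
// invalidates it on every insertion (per the discussion below).
func absorbWhere(dst, src *wantlist.Wantlist, keep func(wantlist.Entry) bool) {
	for _, e := range src.Entries() {
		if keep(e) {
			dst.Add(e.Cid, e.Priority, e.WantType)
		}
	}
}
```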
This doesn't work because pending *WantList doesn't have access to the sentAt map.
I can try to make a new method on the recallWantList struct tho
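A rough sketch of what such a method could look like, using a simplified stand-in for the unexported recallWantList type; the struct layout and the method name refreshStale are assumptions, not the PR's exact code:

```go
package sketch

import (
	"time"

	"github.com/ipfs/boxo/bitswap/client/wantlist"
	cid "github.com/ipfs/go-cid"
)

// recallWants is a simplified stand-in for the message queue's recall list:
// pending wants, sent wants, and the time each CID was sent.
type recallWants struct {
	pending *wantlist.Wantlist
	sent    *wantlist.Wantlist
	sentAt  map[cid.Cid]time.Time
}

// refreshStale moves back to pending only those sent wants that have been
// outstanding for at least interval, and reports how many were moved.
func (r *recallWants) refreshStale(now time.Time, interval time.Duration) int {
	refreshed := 0
	for _, e := range r.sent.Entries() {
		at, ok := r.sentAt[e.Cid]
		if !ok || now.Sub(at) < interval {
			continue // sent too recently (or no timestamp); leave it in flight
		}
		r.sent.Remove(e.Cid)
		r.pending.Add(e.Cid, e.Priority, e.WantType)
		refreshed++
	}
	return refreshed
}
```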
Ok, check the new version
On the cache invalidation note: WantList.Add also invalidates the cache, so we don't have to do it.
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return false
This is probably good to leave since it avoids Lock/Unlock of mq.rebroadcastIntervalLk and time.Now().
if mq.bcstWants.sent.Len() == 0 && mq.peerWants.sent.Len() == 0 {
	return 0
}
The lock exists only for testing. The interval is never changed outside of the unit test. Thus, I don't see any contention the zero-length check could prevent.
I think the comment is not about contention but about saving unnecessary lock/unlock calls, but if this only happens every 30 seconds, then it's probably not very important.
triage note: This is a good candidate for testing in rainbow staging to observe performance differences.
Force-pushed from 9020b71 to 5dc309b.
Previously, in-progress requests could be re-requested during periodic rebroadcast. The queue sends requests, and while a response is awaited, the rebroadcast event happens. The rebroadcast event changes previously sent WANTs to pending and sends them again in a new message. The solution here is to ensure a WANT has been in sent status for long enough before bringing it back to pending. This utilizes the existing `sentAt` map, which tracks when every CID was sent.
Force-pushed from 5dc309b to 993c48c.
// Absorb all the entries in other into this want list
func (w *Wantlist) Absorb(other *Wantlist) {
Deleted as dead code
The main thing to consider here is that:
- before, a "want" would be re-broadcast at most 30 seconds after it was sent (could be 0.1s)
- after, a "want" would be re-broadcast only at least 30 seconds after it was sent (could be 59.9s, since a want sent just after a rebroadcast tick is skipped at the next tick and only re-sent at the one after that).
In that respect the code looks good.
I am not sure how much of an improvement this is in practice (perhaps clients were lucky to hit a short rebroadcast period sometimes), but it makes clients more respectful at least, and perf should not be based on "luck".
I think we can test on staging and discuss in the next triage whether we accept the change.
Need to test on staging before merge.
Previously, in-progress requests could be re-requested during periodic rebroadcast: the queue sends requests, and while a response is awaited, the rebroadcast event happens. The rebroadcast event changes previously sent WANTs to pending and sends them again in a new message, duplicating some WANT requests.
The solution here is to ensure a WANT has been in sent status for long enough before bringing it back to pending. This utilizes the existing `sentAt` map, which tracks when every CID was sent. Then, on every rebroadcast event, it checks whether the WANT has been outstanding for longer than `rebroadcastInterval`.
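A minimal sketch of that check, assuming a sentAt map keyed by CID; the helper name and signature are illustrative, not the PR's exact code:

```go
package sketch

import (
	"time"

	cid "github.com/ipfs/go-cid"
)

// staleWants returns the CIDs whose WANT has been outstanding for at least
// rebroadcastInterval; only these are moved back to pending and re-sent.
func staleWants(sentAt map[cid.Cid]time.Time, rebroadcastInterval time.Duration, now time.Time) []cid.Cid {
	var stale []cid.Cid
	for c, at := range sentAt {
		if now.Sub(at) >= rebroadcastInterval {
			stale = append(stale, c)
		}
	}
	return stale
}
```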