Use find providers async context #172
Codecov Report
@@             Coverage Diff              @@
##             main     #172       +/-   ##
===========================================
- Coverage   65.58%   26.01%   -39.58%
===========================================
  Files         207      100      -107
  Lines       25540    11061    -14479
===========================================
- Hits        16750     2877    -13873
- Misses       7323     7842      +519
+ Partials     1467      342     -1125
Assuming this is low priority for now; if not, reach out.
e9a7c12 to add7775
@hannahhoward I would like to merge your pull request because it fixes an issue in bitswap where OpenTelemetry traces between bitswap inputs are not linked with the content router. However, your fix brings another issue: the providerquerymanager deduplicates FindProvidersAsync calls. In other words, if this sequence happens:
This is captured by the
I'm not sure why bitswap needs to deduplicate content routing queries like that. I would suggest we remove all of this if possible; how likely is it that two sessions request the same blocks anyway? Is there something else I am missing? Can you think of anything bad that will happen if I remove the providerquerymanager?
Closes: #172 See #172 (comment) for rationale.
Closes: #172 See #172 (comment) too.
providerQueryManager took care of:
- Deduping multiple sessions doing find providers for the same CID.
- Limiting global find providers.
We care about neither:
- This is rare; if it happens, it's fine to run the same query twice. If we care, then we should make a deduping content router so we can inject it anywhere a content router is needed.
- It's fine to allow one concurrent find peer per session. No need to limit this at 6 globally after that; it's a great way to stall nodes doing many queries.
Goals
In Bitswap, when I call FindProvidersAsync on the ProviderQueryManager, the ultimate call to the DHT should retain the context I passed to the ProviderQueryManager.
Background
Since I imagine the reviewers here may never have worked with the ProviderQueryManager in Bitswap, here's some background on it.
Essentially, I wrote this piece of code almost three years ago, and it has two purposes:
It is a lot of code to manage all that, and looking at it now, I'm like, "wow, new-Go-programmer me, you were really go-routine happy back then." It all works (because, you know, this runs in every bitswap) but could be simplified a lot.
But, today is not that day.
Implementation
I was surprised to find that the context I passed to initialize a bitswap session in Lassie was not being retained here. Instead, I was getting the global context. Digging into this, I found what I can only assume is a bug in the code I wrote. While the session context is passed across the go-routine barrier, it isn't even used. Instead, the global context is. Perhaps that is because of the wonderfully similar variable names npqm and pqm, which represent the structs that hold the local and global context respectively.
I am pretty sure this is a bug, because otherwise why pass the session context across the go-routine barrier at all? So I corrected it.
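The pattern described above can be reduced to a minimal sketch. Everything here is hypothetical and simplified for illustration (the `manager`, `request`, `lookup`, and `traceKey` names are not the real Bitswap types): a request carries the caller's context across a channel, and the worker either ignores it in favor of the manager's own context (the behavior before this change) or uses it (the behavior after).

```go
package main

import (
	"context"
	"fmt"
)

// traceKey is a hypothetical context key standing in for tracing data.
type traceKey struct{}

// manager stands in for the long-lived query manager; ctx is its global context.
type manager struct {
	ctx      context.Context
	requests chan request
}

// request carries the caller's (session) context across the goroutine boundary.
type request struct {
	ctx   context.Context
	reply chan string
}

// lookup stands in for the content routing call. It reports which trace
// value, if any, is visible in the context it was given.
func lookup(ctx context.Context) string {
	if v, ok := ctx.Value(traceKey{}).(string); ok {
		return v
	}
	return "<none>"
}

// run serves requests. With useSessionCtx false, the request context is
// received but never used; with true, it is passed on to lookup.
func (m *manager) run(useSessionCtx bool) {
	for req := range m.requests {
		ctx := m.ctx
		if useSessionCtx {
			ctx = req.ctx
		}
		req.reply <- lookup(ctx)
	}
}

// query sends one request with a value-carrying session context and returns
// what lookup could see.
func query(useSessionCtx bool) string {
	m := &manager{ctx: context.Background(), requests: make(chan request)}
	go m.run(useSessionCtx)
	defer close(m.requests)

	sessionCtx := context.WithValue(context.Background(), traceKey{}, "session-trace")
	reply := make(chan string, 1)
	m.requests <- request{ctx: sessionCtx, reply: reply}
	return <-reply
}

func main() {
	fmt.Println("global context: ", query(false))
	fmt.Println("session context:", query(true))
}
```

With the global context, the value attached by the session never reaches the lookup, which matches the symptom seen from Lassie.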
For Discussion
Looking at my code... well, I'm not actually sure of the intent here. There are implications to using the session context: if two queries are made for the same CID in two different sessions, the first session's context is used, so if that session gets cancelled, then the other session's query is also cancelled. Maybe that's why I used the global context? But then why pass the context across the go-routine at all (alternate solution: don't do so, for clarity)?
The use case we're dealing with is trying to extract values from the context. Maybe that's too much of an edge case to optimize for.
But this PR aims to identify the bug (there's a context passed over a go-routine but not used) and let you all decide the right outcome.
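The cancellation coupling can be shown with a small sketch. The `inflight` registry below is a hypothetical stand-in for query deduplication, not the real ProviderQueryManager: the first caller's context becomes the context of the shared query, and later callers for the same key join it.

```go
package main

import (
	"context"
	"fmt"
)

// inflight is a hypothetical registry of deduplicated queries keyed by CID
// string. Each entry holds the context the shared query runs under.
type inflight map[string]context.Context

// join returns the context of the running query for key, registering a new
// query under ctx if none exists yet.
func (q inflight) join(ctx context.Context, key string) context.Context {
	if running, ok := q[key]; ok {
		return running
	}
	q[key] = ctx
	return ctx
}

// demo has two sessions ask for the same key, cancels the first session, and
// reports whether the shared query and the second session are cancelled.
func demo() (queryCancelled, secondSessionCancelled bool) {
	queries := inflight{}
	first, cancelFirst := context.WithCancel(context.Background())
	second, cancelSecond := context.WithCancel(context.Background())
	defer cancelSecond()

	queries.join(first, "cid")
	shared := queries.join(second, "cid") // second session joins the first's query

	cancelFirst()
	return shared.Err() != nil, second.Err() != nil
}

func main() {
	queryCancelled, secondCancelled := demo()
	fmt.Println("shared query cancelled:  ", queryCancelled)
	fmt.Println("second session cancelled:", secondCancelled)
}
```

The second session is still live, yet the query it depends on has been cancelled along with the first session.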
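One possible middle ground, offered only as a sketch and not as what this PR does: keep the session context's values (so tracing data can be extracted) while detaching its cancellation. This assumes Go 1.21 or later for `context.WithoutCancel`; the `sessionKey` and `queryContext` names are hypothetical.

```go
package main

import (
	"context"
	"fmt"
)

// sessionKey is a hypothetical context key standing in for per-session data.
type sessionKey struct{}

// queryContext derives a context for a shared query that keeps the session's
// values but is not cancelled when the session is. Requires Go 1.21+.
func queryContext(session context.Context) context.Context {
	return context.WithoutCancel(session)
}

// demo cancels the session and reports what the query context still sees.
func demo() (value string, sessionCancelled, queryCancelled bool) {
	base := context.WithValue(context.Background(), sessionKey{}, "session-1")
	session, cancel := context.WithCancel(base)
	qctx := queryContext(session)

	cancel()

	value, _ = qctx.Value(sessionKey{}).(string)
	return value, session.Err() != nil, qctx.Err() != nil
}

func main() {
	value, sessionCancelled, queryCancelled := demo()
	fmt.Println("value seen by query:", value)
	fmt.Println("session cancelled:  ", sessionCancelled)
	fmt.Println("query cancelled:    ", queryCancelled)
}
```

A detached context never cancels on its own, so a real implementation would still need to tie the query to some lifetime, for example the manager's global context or a timeout. With deduplication, the values would also still be those of whichever session started the query first.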