Keep data in fails cases in sync service #2361
base: master
Conversation
I don't understand the import task well enough to approve right now. I need clarification on the following points:
- How do we ensure this cache doesn't grow forever? Is the `Import` task short-lived? While the import task launches short-lived streams, it seems like a long-living task to me.
- How can we be sure we'll query exactly the same ranges as we have cached? Where is that invariant maintained?
Let me know if you want to jump on a call to chat about this, or just write if I'm missing something obvious here.
crates/services/sync/src/import.rs
Outdated
@@ -98,6 +99,26 @@ pub struct Import<P, E, C> {
    executor: Arc<E>,
    /// Consensus port.
    consensus: Arc<C>,
    /// A cache of already validated headers or blocks.
    cache: SharedMutex<HashMap<Range<u32>, CachedData>>,
Alternatively, since we're storing ranges, a `BTreeMap` could be useful. I'm a bit hesitant towards using `Range<u32>` as the key here. It's not obvious to me that we'll process exactly the same ranges if the stream is restarted. It would be more robust to instead maintain a map from block height to the cached data at that height.
Or can we be sure that we're
- not storing overlapping ranges, and
- will query the same ranges that we have cached?
I can use `DashMap`, didn't know about it, seems interesting.
About storing ranges, I answered here: #2361 (comment). However, I'm trying to implement it now, and the problem is that p2p only works with ranges, so if we have cached data in the middle of a range we would have to bisect the range in two.
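To make that concern concrete, here is a minimal sketch, not the actual sync-service code: `missing_sub_ranges` and the `BTreeSet` of cached heights are made up for illustration. It shows how a single requested range has to be bisected around cached heights, with each resulting gap becoming its own p2p request.

```rust
use std::collections::BTreeSet;
use std::ops::RangeInclusive;

/// Split a requested height range into the sub-ranges that still need to be
/// fetched from p2p, skipping heights that are already cached.
fn missing_sub_ranges(
    requested: RangeInclusive<u32>,
    cached: &BTreeSet<u32>,
) -> Vec<RangeInclusive<u32>> {
    let end = *requested.end();
    let mut gaps = Vec::new();
    let mut gap_start: Option<u32> = None;

    for height in requested {
        if cached.contains(&height) {
            // A cached height closes the current gap; that gap becomes one p2p request.
            if let Some(start) = gap_start.take() {
                gaps.push(start..=height.saturating_sub(1));
            }
        } else if gap_start.is_none() {
            gap_start = Some(height);
        }
    }
    if let Some(start) = gap_start {
        gaps.push(start..=end);
    }
    gaps
}

fn main() {
    let cached: BTreeSet<u32> = [5, 6, 7].into_iter().collect();
    // A cached run in the middle bisects one request into two p2p calls.
    assert_eq!(missing_sub_ranges(1..=10, &cached), vec![1..=4, 8..=10]);
    println!("ok");
}
```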
Yeah, that's why I think a `BTreeMap` is pretty nice, since it allows for querying ranges of keys directly.
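As a rough illustration of that point (placeholder types only, not the service's actual cache), a height-keyed `BTreeMap` can be queried with any requested range via `BTreeMap::range`, regardless of which ranges the entries were originally fetched under:

```rust
use std::collections::BTreeMap;

// Placeholder for whatever the sync service actually caches per height.
#[derive(Debug, Clone)]
struct CachedData(&'static str);

fn main() {
    // Cache keyed by block height rather than by the originally requested range.
    let mut cache: BTreeMap<u32, CachedData> = BTreeMap::new();
    cache.insert(3, CachedData("header 3"));
    cache.insert(4, CachedData("header 4"));
    cache.insert(9, CachedData("header 9"));

    // `BTreeMap::range` returns whatever falls inside the requested range,
    // even if those entries were cached under different original requests.
    for (height, data) in cache.range(2..6) {
        println!("already cached at height {height}: {data:?}");
    }
}
```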
Yeah, but this doesn't solve the problem that we will have to split the ranges asked of p2p and multiply the number of requests.
`DashMap` isn't an option anymore as we are using a `BTreeMap`.
header_stream
    let ranges = range_chunks(range, params.header_batch_size);
    futures::stream::iter(ranges)
        .map({
While the pattern was established before this PR, I think it would be nice to use `then` instead of `map` here and skip the `.await`s. We'd be able to return just a `Stream<Item = SealedBlockBatch>` instead of having the nested futures in the returned stream.
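A minimal sketch of the difference, using toy stand-ins for `SealedBlockBatch` and the fetch call (and assuming the `futures` and `tokio` crates), not the service's real stream code:

```rust
use futures::stream::{self, StreamExt};

// Toy stand-ins for the real batch type and p2p fetch call.
#[derive(Debug, PartialEq)]
struct SealedBlockBatch(Vec<u32>);

async fn fetch_batch(range: std::ops::Range<u32>) -> SealedBlockBatch {
    SealedBlockBatch(range.collect())
}

#[tokio::main]
async fn main() {
    let ranges = vec![0..3u32, 3..6];

    // With `map`, the stream yields futures, so the consumer still has to
    // await each item (here via `buffered`).
    let nested: Vec<_> = stream::iter(ranges.clone())
        .map(fetch_batch)
        .buffered(2)
        .collect()
        .await;

    // With `then`, the awaiting happens inside the combinator and the stream
    // yields `SealedBlockBatch` items directly.
    let flat: Vec<_> = stream::iter(ranges).then(fetch_batch).collect().await;

    assert_eq!(nested, flat);
    println!("{flat:?}");
}
```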
I agree, and there are a lot more things to improve in this service. I don't want to make this PR even bigger, so I created an issue for that: #2370
@netrome Thanks for taking the time to review this. Regarding your questions:
So far looks good, I need to have a deeper look at the tests though.
Converting to draft because of the big refactor.
…rs and blocks mixed)
…r and added a bunch of tests
Co-authored-by: Rafał Chabowski <[email protected]>
Now everything is cached one by one, but there is an issue that I'm having a hard time finding a solution for. When we successfully fetched the header but never got the transactions, we need the peer_id to ask for the transactions again. However, if I cache the peer_id that gave us the header but failed to give us the transactions, we will ask that peer again, and I don't think we want to re-ask someone that returned a failure. But I don't have any way to find a peer that I know has the transactions. On top of that, the range that I build from cached data could have been fetched from multiple peers.
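For illustration only (these names are hypothetical, not the PR's actual types), the per-height cache entry being described distinguishes a validated header that is still missing its transactions from a fully validated block:

```rust
/// Hypothetical per-height cache entry; `Header` and `Block` stand in for the
/// real sealed header and block types.
#[derive(Debug)]
#[allow(dead_code)] // only one variant is constructed in this tiny demo
enum CachedEntry<Header, Block> {
    /// The header was fetched and validated, but the transactions never
    /// arrived, so some peer still has to be asked for them.
    HeaderOnly(Header),
    /// Header and transactions are both present and validated.
    FullBlock(Block),
}

fn main() {
    let entry: CachedEntry<&str, &str> = CachedEntry::HeaderOnly("header at height 42");
    println!("{entry:?}");
}
```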
Had a chat about this. @xgreenx proposed we change the p2p interface to not require any peer ID when requesting transactions, but instead leave it up to the p2p implementation to decide which peer to request them from and return that peer ID in the response.
## Linked Issues/PRs

This is a requirement for #2361

## Description

This PR adds a way to fetch transactions with p2p without giving a specific peer, letting p2p choose the one it prefers. This will be used in #2361.

## Checklist
- [x] Breaking changes are clearly marked as such in the PR description and changelog
- [x] New behavior is reflected in tests
- [x] [The specification](https://github.com/FuelLabs/fuel-specs/) matches the implemented behavior (link update PR if changes are needed)

### Before requesting review
- [x] I have reviewed the code myself
- [x] I have created follow-up issues caused by this PR and linked them here

---------

Co-authored-by: Green Baneling <[email protected]>
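A rough sketch of that interface direction, with purely illustrative names and synchronous signatures (the real port is async and uses the node's own peer-id and transaction types): the caller asks for a height range without naming a peer, and the response reports which peer the p2p layer picked.

```rust
use std::ops::Range;

// Illustrative placeholder types only.
type PeerId = String;
type Transactions = Vec<u8>;

/// The response reports which peer the p2p layer chose, so the caller can
/// still penalise or retry that peer without having selected it up front.
struct SourcedResponse<T> {
    peer: PeerId,
    data: Option<T>,
}

/// Sketch of a port where the caller no longer supplies a peer id.
trait PeerToPeerPort {
    fn get_transactions(&self, block_heights: Range<u32>) -> SourcedResponse<Vec<Transactions>>;
}

struct DummyP2p;

impl PeerToPeerPort for DummyP2p {
    fn get_transactions(&self, block_heights: Range<u32>) -> SourcedResponse<Vec<Transactions>> {
        // A real implementation would pick the best-connected peer itself.
        SourcedResponse {
            peer: "peer-chosen-by-p2p".to_string(),
            data: Some(block_heights.map(|h| vec![h as u8]).collect()),
        }
    }
}

fn main() {
    let response = DummyP2p.get_transactions(0..3);
    println!(
        "got {:?} transaction batches from {}",
        response.data.map(|d| d.len()),
        response.peer
    );
}
```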
Copilot reviewed 6 out of 10 changed files in this pull request and generated no suggestions.
Files not reviewed (4)
- crates/services/sync/src/import/test_helpers/pressure_peer_to_peer.rs: Evaluated as low risk
- crates/services/sync/src/import/tests.rs: Evaluated as low risk
- crates/services/sync/src/ports.rs: Evaluated as low risk
- CHANGELOG.md: Evaluated as low risk
Linked Issues/PRs
Closes #2357
Description
This pull request introduces a caching mechanism to the sync service to avoid redundant data fetching from the network. The most important changes include adding a cache module, modifying the `Import` struct to include a cache, and updating related methods to utilize this cache.

Caching Mechanism:
- `crates/services/sync/src/import.rs`: Added a new `cache` module and integrated it into the `Import` struct. Updated methods to use the cache for fetching and storing headers and blocks.

Test Updates:
- Roughly 50% of the changes in this PR are in the tests, including the addition of tests for the cache.
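As a simplified illustration of the "keep data in fail cases" idea (hypothetical names, not the PR's actual code): validate a fetched batch in order, cache everything that validated before the first failure, and report the height where fetching must resume.

```rust
use std::collections::BTreeMap;

// Stand-ins for the real height, sealed-block, and cache types.
type Height = u32;
type Block = String;

/// Validate blocks in order, cache every block that validated before the
/// first failure, and return the height at which fetching must resume.
fn keep_valid_prefix(
    batch: Vec<(Height, Block)>,
    validate: impl Fn(&Block) -> bool,
    cache: &mut BTreeMap<Height, Block>,
) -> Option<Height> {
    for (height, block) in batch {
        if !validate(&block) {
            // Fail case: everything before this height is already cached,
            // so only `height..` has to be fetched again.
            return Some(height);
        }
        cache.insert(height, block);
    }
    None
}

fn main() {
    let mut cache = BTreeMap::new();
    let batch = vec![(1, "ok".into()), (2, "ok".into()), (3, "bad".into())];
    let resume_from = keep_valid_prefix(batch, |b: &Block| b != "bad", &mut cache);
    assert_eq!(resume_from, Some(3));
    assert_eq!(cache.len(), 2); // heights 1 and 2 survive the failed attempt
}
```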
Checklist
Before requesting review