
Keep data in failure cases in sync service #2361

Open

AurelienFT wants to merge 25 commits into base: master
Conversation

Contributor
@AurelienFT AurelienFT commented Oct 15, 2024

Linked Issues/PRs

Closes #2357

Description

This pull request introduces a caching mechanism to the sync service to avoid redundant data fetching from the network. The most important changes include adding a cache module, modifying the Import struct to include a cache, and updating related methods to utilize this cache.

Caching Mechanism:

  • crates/services/sync/src/import.rs: Added a new cache module and integrated it into the Import struct. Updated methods to use the cache for fetching and storing headers and blocks.
  • The cache mechanism allows the importer to retrieve a stream of batches, each containing either cached headers, cached full blocks, or a range of heights that still needs to be fetched (see the sketch below).
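
A rough sketch of that idea (the type and variant names below are assumptions for illustration, not necessarily the ones used in this PR):

```rust
use std::ops::Range;

// Placeholder types standing in for the real SealedBlockHeader / SealedBlock.
pub struct SealedBlockHeader;
pub struct SealedBlock;

/// What the cache-aware stream yields for each chunk of the requested range:
/// data that was already fetched and validated, or a sub-range that still has
/// to be requested from the network.
pub enum CachedDataBatch {
    Headers(Vec<SealedBlockHeader>),
    Blocks(Vec<SealedBlock>),
    /// Nothing cached for this sub-range; the importer must ask p2p for it.
    None(Range<u32>),
}
```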

Test Updates:

  • Updated the P2P port mocks to be async so they can simulate the more complex scenarios needed for this feature.

Roughly 50% of the changes in this PR are updates to existing tests plus new tests for the cache.

Checklist

  • Breaking changes are clearly marked as such in the PR description and changelog
  • New behavior is reflected in tests
  • The specification matches the implemented behavior (link update PR if changes are needed)

Before requesting review

  • I have reviewed the code myself
  • I have created follow-up issues caused by this PR and linked them here

@AurelienFT AurelienFT marked this pull request as ready for review October 16, 2024 16:44
@AurelienFT AurelienFT requested a review from a team October 16, 2024 16:44
Contributor

@netrome netrome left a comment

I don't understand the import task well enough to approve right now. I need clarification on the following points:

  1. How do we ensure this cache doesn't grow forever? Is the Import task short-lived? While the import task launches short-lived streams, it seems like a long-lived task to me.
  2. How can we be sure we'll query exactly the same ranges as we have cached? Where is that invariant maintained?

Let me know if you want to jump on a call to chat about this, or just write if I'm missing something obvious here.

@@ -98,6 +99,26 @@ pub struct Import<P, E, C> {
executor: Arc<E>,
/// Consensus port.
consensus: Arc<C>,
/// A cache of already validated headers or blocks.
cache: SharedMutex<HashMap<Range<u32>, CachedData>>,
Contributor

suggestion: Perhaps we should consider using a DashMap here instead like in #2352
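
For context, a tiny sketch of what the DashMap alternative could look like (the `dashmap` crate would be a new dependency and `CachedData` is a placeholder type, not the PR's actual type):

```rust
use dashmap::DashMap;
use std::ops::Range;

struct CachedData;

fn sketch() {
    // DashMap shards its locks internally, so the outer SharedMutex is no longer needed.
    let cache: DashMap<Range<u32>, CachedData> = DashMap::new();
    cache.insert(0..10, CachedData);
    if let Some(entry) = cache.get(&(0..10)) {
        let _data: &CachedData = entry.value();
    }
}
```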

Contributor

Alternatively, since we're storing ranges, a BTreeMap could be useful. I'm a bit hesitant about using Range<u32> as the key here. It's not obvious to me that we'll process exactly the same ranges if the stream is restarted. It would be more robust to instead maintain a map from block height to the cached data at that height.

Or can we be sure that we're

  1. not storing overlapping ranges, and
  2. going to query the same ranges that we have cached?
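
For illustration, a minimal sketch of the height-keyed alternative described above (all names are placeholders, not fuel-core types):

```rust
use std::collections::BTreeMap;
use std::ops::Range;

enum CachedData {
    Header(/* SealedBlockHeader */ ()),
    Block(/* SealedBlock */ ()),
}

#[derive(Default)]
struct HeightKeyedCache {
    inner: BTreeMap<u32, CachedData>,
}

impl HeightKeyedCache {
    // Each height has exactly one entry, so overlapping ranges cannot exist by construction.
    fn insert(&mut self, height: u32, data: CachedData) {
        self.inner.insert(height, data);
    }

    // Whatever range the importer asks for, the cached heights inside it can be looked up
    // directly, even if a restarted stream uses different range boundaries.
    fn cached_in<'a>(&'a self, range: Range<u32>) -> impl Iterator<Item = (&'a u32, &'a CachedData)> {
        self.inner.range(range)
    }
}
```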

Contributor Author

@AurelienFT AurelienFT Oct 16, 2024

I can use DashMap, I didn't know about it; seems interesting.

About storing ranges, I answered here: #2361 (comment). However, I'm trying to implement it now, and the problem is that p2p only works with ranges, so if we have cached data in the middle of a range we would have to bisect that range in two.

Contributor

Yeah, that's why I think a BTreeMap is pretty nice, since it allows querying ranges of keys directly.

Contributor Author

Yeah, but this doesn't solve the problem that we will have to split the ranges asked of p2p and multiply the number of requests.
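
To illustrate the concern with a toy example (a hypothetical helper, not code from this PR): a single requested range gets split around the cached heights, turning one p2p request into several.

```rust
use std::collections::BTreeMap;
use std::ops::Range;

// Compute which sub-ranges of `requested` are NOT covered by a cache keyed by
// individual block height.
fn missing_sub_ranges<V>(cache: &BTreeMap<u32, V>, requested: Range<u32>) -> Vec<Range<u32>> {
    let mut gaps = Vec::new();
    let mut next_uncached = requested.start;
    for (&height, _) in cache.range(requested.clone()) {
        if height > next_uncached {
            gaps.push(next_uncached..height);
        }
        next_uncached = height + 1;
    }
    if next_uncached < requested.end {
        gaps.push(next_uncached..requested.end);
    }
    gaps
}

fn main() {
    let mut cache = BTreeMap::new();
    for h in 30..=50u32 {
        cache.insert(h, ());
    }
    // One original p2p request (0..100) becomes two: 0..30 and 51..100.
    assert_eq!(missing_sub_ranges(&cache, 0..100), vec![0..30u32, 51..100]);
}
```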

Contributor Author

DashMap isn't an option anymore, as we are using a BTreeMap.

header_stream
let ranges = range_chunks(range, params.header_batch_size);
futures::stream::iter(ranges)
.map({
Contributor

While the pattern was established before this PR, I think it would be nice to use then instead of map here, and skip the .awaits. We'd be able to return just a Stream<Item = SealedBlockBatch> instead of having the nested futures in the returned stream.
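
For illustration, a toy comparison of the two combinators using the `futures` crate with simplified stand-in types (not the actual import code):

```rust
use futures::stream::{self, StreamExt};
use std::ops::Range;

async fn fetch_batch(range: Range<u32>) -> Vec<u32> {
    range.collect()
}

async fn demo() {
    let ranges = vec![0..2u32, 2..4];

    // `map` keeps the futures as items: the stream type is
    // Stream<Item = impl Future<Output = Vec<u32>>>, so callers must await twice
    // (or drive the inner futures with a combinator such as `buffered`).
    let nested = stream::iter(ranges.clone()).map(fetch_batch);
    let _batches: Vec<Vec<u32>> = nested.buffered(2).collect().await;

    // `then` awaits inside the stream, which directly yields Stream<Item = Vec<u32>>.
    let flat = stream::iter(ranges).then(fetch_batch);
    let _batches: Vec<Vec<u32>> = flat.collect().await;
}
```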

Contributor Author

I agree, and there are a lot more things to improve in this service. I don't want to make this PR even bigger, so I created an issue for that: #2370

crates/services/sync/src/import.rs (outdated comment, resolved)
Contributor Author

AurelienFT commented Oct 16, 2024

@netrome Thanks for taking the time to review this. Regarding your questions:
1 - Yes, for me it will live a long time, but all requested data should eventually succeed and therefore be cleared; otherwise we will only ever have batch_size elements in the cache. But I'm not very sure about this, which is why I placed a note about it under "Interrogation" in the PR. Maybe we need pruning management.
2 - I was thinking that we re-request exactly the same ranges because the batch_size doesn't change, but the starting point can move to the last synced block, and so the ranges can change. I think you are right that the ranges can change; I will ask a few questions to @xgreenx.
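
For point 1, a minimal illustration of what pruning could look like if the cache ends up keyed by block height (`prune_up_to` is a hypothetical helper, not something this PR implements):

```rust
use std::collections::BTreeMap;

// Once blocks up to `committed_height` are executed and committed, everything
// at or below that height can be dropped from the cache.
fn prune_up_to<V>(cache: &mut BTreeMap<u32, V>, committed_height: u32) {
    // `split_off` returns the part of the map with keys >= the given key,
    // so keeping only that part drops every entry at or below `committed_height`.
    *cache = cache.split_off(&(committed_height.saturating_add(1)));
}
```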

@AurelienFT AurelienFT changed the base branch from release/v0.40.0 to master October 16, 2024 21:21
Contributor

@rafal-ch rafal-ch left a comment

So far looks good, I need to have a deeper look at the tests though.

CHANGELOG.md (outdated comment, resolved)
crates/services/sync/src/import.rs (outdated comment, resolved)
crates/services/sync/src/import/back_pressure_tests.rs (outdated comment, resolved)
@AurelienFT AurelienFT marked this pull request as draft October 17, 2024 09:17
@AurelienFT
Contributor Author

Converted to draft because of a big refactor.

@AurelienFT AurelienFT marked this pull request as ready for review October 18, 2024 10:33
@AurelienFT
Contributor Author

Now everything is cached one by one, but there is an issue for which I'm having a hard time finding a solution. When we successfully got the header but never got the transactions, we need the peer_id to ask for the transactions again. However, if I cache the peer_id that gave us the header but failed to give us the transactions, we will ask it again, and I don't think we want to re-ask a peer that returned a failure. But I don't have any way to find a peer that I know has the transactions.

On top of that, the range that I build from cached data could have been fetched from multiple peers.
The only solution I see that simplifies everything, but caches less, is to cache only full blocks.
Any ideas @netrome @xgreenx @rafal-ch ?

@AurelienFT AurelienFT marked this pull request as draft October 21, 2024 11:40
Contributor

netrome commented Oct 21, 2024

Had a chat about this. @xgreenx proposed we change the p2p interface to not require any peer ID when requesting transactions, but instead leave it up to the p2p implementation to decide which peer to request them from and return that peer ID in the response.
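
For reference, a rough sketch of what such a port might look like; every name below is a placeholder rather than the actual fuel-core-sync interface, and it assumes a toolchain with return-position impl Trait in traits:

```rust
use std::future::Future;
use std::ops::Range;

pub struct PeerId(pub Vec<u8>);
pub struct Transactions(pub Vec<Vec<u8>>); // stand-in for the real transactions type

/// Data together with the peer that actually served it.
pub struct SourcePeer<T> {
    pub peer_id: PeerId,
    pub data: T,
}

pub trait PeerToPeerPort {
    /// Fetch the transactions for a block-height range from a peer chosen by the
    /// p2p layer itself; the response reports which peer served the request, so
    /// follow-up calls can reuse (or avoid) that peer. `None` means no peer could serve it.
    fn get_transactions_for_range(
        &self,
        block_heights: Range<u32>,
    ) -> impl Future<Output = Option<SourcePeer<Vec<Transactions>>>> + Send;
}
```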

@AurelienFT AurelienFT changed the base branch from master to add_p2p_fetch_txs_no_peer_specified October 21, 2024 12:44
@AurelienFT AurelienFT marked this pull request as ready for review October 21, 2024 13:22
@AurelienFT AurelienFT self-assigned this Oct 24, 2024
AurelienFT added a commit that referenced this pull request Oct 31, 2024
## Linked Issues/PRs
This is a requirement for
#2361

## Description

This PR adds a way to fetch transactions via p2p without specifying a
particular peer, letting p2p choose the one it prefers.
This will be used in #2361

## Checklist
- [x] Breaking changes are clearly marked as such in the PR description
and changelog
- [x] New behavior is reflected in tests
- [x] [The specification](https://github.com/FuelLabs/fuel-specs/)
matches the implemented behavior (link update PR if changes are needed)

### Before requesting review
- [x] I have reviewed the code myself
- [x] I have created follow-up issues caused by this PR and linked them
here

---------

Co-authored-by: Green Baneling <[email protected]>
Base automatically changed from add_p2p_fetch_txs_no_peer_specified to master October 31, 2024 08:47
@rymnc rymnc requested a review from Copilot November 21, 2024 10:18

Copilot reviewed 6 out of 10 changed files in this pull request and generated no suggestions.

Files not reviewed (4)
  • crates/services/sync/src/import/test_helpers/pressure_peer_to_peer.rs: Evaluated as low risk
  • crates/services/sync/src/import/tests.rs: Evaluated as low risk
  • crates/services/sync/src/ports.rs: Evaluated as low risk
  • CHANGELOG.md: Evaluated as low risk
Development

Successfully merging this pull request may close these issues.

fuel-core-sync should cache the result of responses instead of throwing them away
3 participants