feat(resharding) - congestion info computation #12581

Status: Merged (28 commits, merged into master via the merge queue on Jan 7, 2025)

Conversation

@wacban (Contributor) commented Dec 9, 2024:

Implementing the congestion info computation based on the parent congestion info and the receipt groups info from the parent trie. The seed used for the allowed shard is a bit hacky - please let me know if this makes sense.

I added an assertion checking that the buffered gas in congestion info is zero iff the buffers are empty. With this assertion in place, the test_resharding_v3_buffered_receipts_towards_splitted_shard tests fail without the updated congestion info and pass with it.

@wacban requested a review from a team as a code owner on December 9, 2024 16:45
@@ -213,11 +222,24 @@ impl ReshardingManager {
};
let mem_changes = trie_changes.mem_trie_changes.as_ref().unwrap();
let new_state_root = mem_tries.apply_memtrie_changes(block_height, mem_changes);
drop(mem_tries);
@wacban (Contributor Author):

This is annoying but needed to prevent a deadlock.

Reviewer (Contributor):

Sounds a bit scary 👀, but I assume it has to be this way

@wacban (Contributor Author):

It's a bit ugly but I don't think it's scary. Rust is smart enough to know not to reuse this variable anymore. Unless you had some other risk in mind?
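
For readers outside the thread, a minimal sketch of the pattern being discussed (the mutex and names below are illustrative assumptions, not the actual memtrie API): the guard has to be dropped before the code goes on to lock the same structure again, and once dropped, the compiler rejects any further use of it.

```rust
use std::sync::Mutex;

// Illustrative only: `state` stands in for the shared memtrie structure.
fn apply_and_reload(state: &Mutex<Vec<u64>>) {
    let guard = state.lock().unwrap();
    let _new_state_root = guard.len() as u64; // work that needs the guard
    drop(guard); // release the lock before locking the same mutex again below
    // Without the drop above, this second lock would deadlock (or panic).
    let _trie = state.lock().unwrap();
}

fn main() {
    apply_and_reload(&Mutex::new(vec![1, 2, 3]));
}
```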

let epoch_id = block.header().epoch_id();
let protocol_version = self.epoch_manager.get_epoch_protocol_version(epoch_id)?;

let trie = tries.get_trie_for_shard(new_shard_uid, new_state_root);
@wacban (Contributor Author):

bootstrap_congestion_info requires TrieAccess - can I get it somehow from the memtries above directly?

// TODO(resharding): set all fields of `ChunkExtra`. Consider stronger
// typing. Clarify where it should happen when `State` and
// `FlatState` update is implemented.
let mut child_chunk_extra = ChunkExtra::clone(&chunk_extra);
*child_chunk_extra.state_root_mut() = new_state_root;
// TODO(resharding) - Implement proper congestion info for
// resharding. The current implementation is very expensive.
@jancionear (Contributor), Dec 10, 2024:

Re: "While we're waiting for the bandwidth scheduler to give us an efficient way to do the same"

OutgoingMetadatas are already fully functional and merged into master. I didn't plan to add anything else. What sort of functionality are you looking for?

@wacban (Contributor Author):

Ah nice! I think all I need is the total size of the buffered receipts. I'll check and come back.

@wacban marked this pull request as draft on December 10, 2024 09:30
@marcelo-gonzalez (Contributor) left a comment:

Looks pretty sane to me, but I'm not super familiar with congestion control, so I'll let others approve.

@wacban (Contributor Author) commented Dec 11, 2024:

I will do the proper version anyway, so please don't mind it for now.

@wacban force-pushed the waclaw-resharding branch 2 times, most recently from 9529c2f to e55dae2, on December 12, 2024 11:52
@wacban (Contributor Author) commented Dec 13, 2024:

Nayduck run with extra assertions:
https://nayduck.nearone.org/#/run/917

codecov bot commented Dec 18, 2024:

Codecov Report

Attention: Patch coverage is 88.88889% with 22 lines in your changes missing coverage. Please review.

Project coverage is 70.59%. Comparing base (21b5109) to head (be19d21).
Report is 13 commits behind head on master.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| chain/chain/src/resharding/manager.rs | 92.45% | 0 Missing and 8 partials ⚠️ |
| ...chain/src/stateless_validation/chunk_validation.rs | 72.00% | 0 Missing and 7 partials ⚠️ |
| chain/client/src/client.rs | 33.33% | 0 Missing and 2 partials ⚠️ |
| core/primitives/src/types.rs | 71.42% | 2 Missing ⚠️ |
| chain/chain/src/runtime/mod.rs | 50.00% | 0 Missing and 1 partial ⚠️ |
| core/primitives/src/congestion_info.rs | 83.33% | 0 Missing and 1 partial ⚠️ |
| runtime/runtime/src/congestion_control.rs | 85.71% | 0 Missing and 1 partial ⚠️ |
Additional details and impacted files
@@            Coverage Diff             @@
##           master   #12581      +/-   ##
==========================================
+ Coverage   70.56%   70.59%   +0.02%     
==========================================
  Files         847      847              
  Lines      172735   173367     +632     
  Branches   172735   173367     +632     
==========================================
+ Hits       121897   122380     +483     
- Misses      45737    45861     +124     
- Partials     5101     5126      +25     
| Flag | Coverage Δ |
| --- | --- |
| backward-compatibility | 0.16% <0.00%> (-0.01%) ⬇️ |
| db-migration | 0.16% <0.00%> (?) |
| genesis-check | 1.36% <0.60%> (-0.01%) ⬇️ |
| linux | 69.18% <31.31%> (-0.08%) ⬇️ |
| linux-nightly | 70.19% <88.88%> (+0.02%) ⬆️ |
| pytests | 1.66% <0.59%> (-0.01%) ⬇️ |
| sanity-checks | 1.47% <0.59%> (-0.01%) ⬇️ |
| unittests | 70.42% <88.88%> (+0.02%) ⬆️ |
| upgradability | 0.20% <0.00%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.


@marcelo-gonzalez (Contributor) left a comment:

I think it looks okay to me, but I will let @jancionear approve, since I'm not as familiar with the bandwidth scheduler/congestion code. But ping me if you need my stamp to unblock it.

let gas = receipt_groups.total_gas();
let gas = gas.try_into().expect("ReceiptGroup gas must fit in u64");

congestion_info
Reviewer (Contributor):

do we not have to set total_receipts_num as well?

@wacban (Contributor Author):

Good sanity check! The total_receipts_num field is part of ReceiptGroupsQueueData, which is resharded by copying it to the left child only. It is handled by the memtrie and flat storage split operations. The memtrie part is already implemented here and this PR adds the flat storage equivalent.

@jancionear (Contributor) left a comment:

lgtm, apart from one potential overflow

I'm not very familiar with the resharding logic, but the congestion control changes make sense 👍


let bytes = receipt_groups.total_size();
let gas = receipt_groups.total_gas();
let gas = gas.try_into().expect("ReceiptGroup gas must fit in u64");
@jancionear (Contributor):

There is no guarantee that total_gas will fit in u64.
70_000 receipts with 300 TGas each would exceed the capacity of u64. It's unlikely that we'll ever have that many receipts, but in theory it could happen.
IMO it would be better to subtract the u128.
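
For reference, the arithmetic behind this comment, using the numbers from the comment itself (an illustration, not code from the PR):

```rust
fn main() {
    let per_receipt_gas: u128 = 300_000_000_000_000; // 300 TGas = 3e14 gas
    let total: u128 = 70_000 * per_receipt_gas;      // 2.1e19 gas in total
    assert!(total > u64::MAX as u128);               // u64::MAX is roughly 1.84e19
}
```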


.get_shard_index(own_shard)?
.try_into()
.expect("ShardIndex must fit in u64");
let congestion_seed = own_shard_index;
Reviewer (Contributor):

It would be nice to use the same seed as usual (derived from the block height), but I guess it doesn't matter, as long as it's deterministic and only used on resharding boundaries.

@wacban (Contributor Author):

Yeah, the issue was that I couldn't easily get hold of the block height on the stateless validation path. Let me add a TODO, but for now, as you said, as long as it's deterministic and one-off it's fine.

Reviewer (Contributor):

Quick question: doesn't this mess up the coordination of the allowed shard with respect to other shards?
For example, suppose we had 5 shards and a round-robin method to assign allowed_shard. Say right before resharding we had the mapping:
Shard 0 -> Shard 3
Shard 1 -> Shard 4
Shard 2 -> Shard 0
Shard 3 -> Shard 1
Shard 4 -> Shard 2

After resharding we split Shard 4 into Shard 5 and Shard 6 (indices 4 and 5). If we break the coordination and just assign the allowed shard based on congestion_seed = own_shard_index, wouldn't that potentially lead to two shards having the same mapped allowed_shard?

Or is this issue just for the one block after resharding and life returns to normal after this block?

@wacban (Contributor Author):

There shouldn't be any duplicated allowed shards but please don't ask about missing chunks.

If I'm reading the code right, using just the shard index means that each shard will have itself as the allowed shard.

@wacban (Contributor Author):

And yeah it's only for one block during resharding.
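
A toy sketch of the behaviour described in this thread (the modulo-based round robin below is an assumption for illustration, not the exact nearcore formula): if the allowed shard is picked as shards[seed % shards.len()], then seeding each shard with its own index makes it pick itself, so no two shards end up with the same allowed shard for that one block.

```rust
fn allowed_shard(seed: u64, shards: &[u64]) -> u64 {
    shards[(seed % shards.len() as u64) as usize]
}

fn main() {
    // Shard ids after the split in the example above: shard 4 is replaced by 5 and 6.
    let shards: Vec<u64> = vec![0, 1, 2, 3, 5, 6];
    for (index, shard_id) in shards.iter().enumerate() {
        // Seeding with the shard's own index selects the shard itself,
        // so there are no duplicated allowed shards.
        assert_eq!(allowed_shard(index as u64, &shards), *shard_id);
    }
}
```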

@@ -189,7 +189,7 @@ impl TestReshardingParametersBuilder {
limit_outgoing_gas: self.limit_outgoing_gas.unwrap_or(false),
delay_flat_state_resharding: self.delay_flat_state_resharding.unwrap_or(0),
short_yield_timeout: self.short_yield_timeout.unwrap_or(false),
allow_negative_refcount: self.allow_negative_refcount.unwrap_or(false),
allow_negative_refcount: self.allow_negative_refcount.unwrap_or(true),
Reviewer (Contributor):

Is it ok to allow negative refcount in resharding tests by default? This looks like something that could be important to catch somewhere 👀

@wacban (Contributor Author):

Yeah, it's annoying, please see this comment with the latest thoughts on this:
#12581 (comment)

// the parent's buffered receipts from the parent's congestion info.
let mut congestion_info = parent_congestion_info;
for shard_id in parent_shard_layout.shard_ids() {
let receipt_groups = ReceiptGroupsQueue::load(parent_trie, shard_id)?;
@jancionear (Contributor), Jan 2, 2025:

I wonder if there's some way to protect against a scenario where the ReceiptGroupsQueue isn't fully initialized. The old receipts will most likely be forwarded within one epoch, but in theory there could still be some receipts not included in the metadata when the resharding happens.
Could we detect that and postpone the protocol upgrade or something? Idk, sounds complicated.

@wacban (Contributor Author):

I think it's borderline impossible that we wouldn't go through the queue for a full epoch. We do have some assertions in place to check that congestion info is zeroed in the right places iff the queue is empty, so at least we'll detect the issue some time later.

I don't think delaying is feasible as resharding is scheduled an epoch in advance and we only know about this condition at the last moment.

Reviewer (Contributor):

Maybe it'd be possible to bootstrap congestion info? Idk, not ideal either

@wacban (Contributor Author):

Yeah, cool idea, it could work but if we're swamped with receipts that could blow up the state witness. I still don't think it's necessary to worry about this case.
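
A rough sketch of the computation started in the snippet at the top of this thread (the struct fields and the loader closure are assumptions for illustration, not the PR's exact types): start from the parent's congestion info and remove the totals of the parent's buffered receipts recorded in each per-shard receipt-group queue.

```rust
struct CongestionSketch {
    buffered_receipts_gas: u128,
    receipt_bytes: u64,
}

struct GroupTotals {
    total_gas: u128,
    total_size: u64,
}

fn child_congestion_info(
    parent: CongestionSketch,
    parent_shard_ids: &[u64],
    load_receipt_groups: impl Fn(u64) -> Option<GroupTotals>,
) -> CongestionSketch {
    let mut congestion_info = parent;
    for shard_id in parent_shard_ids {
        // Skip shards that never had a receipt-groups metadata entry.
        let Some(groups) = load_receipt_groups(*shard_id) else { continue };
        congestion_info.buffered_receipts_gas =
            congestion_info.buffered_receipts_gas.saturating_sub(groups.total_gas);
        congestion_info.receipt_bytes =
            congestion_info.receipt_bytes.saturating_sub(groups.total_size);
    }
    congestion_info
}

fn main() {
    let parent = CongestionSketch { buffered_receipts_gas: 500, receipt_bytes: 300 };
    let child = child_congestion_info(parent, &[0, 1], |_| {
        Some(GroupTotals { total_gas: 100, total_size: 50 })
    });
    assert_eq!(child.buffered_receipts_gas, 300);
    assert_eq!(child.receipt_bytes, 200);
}
```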

@wacban (Contributor Author) commented Jan 2, 2025:

JFYI I was looking at why the tests need to allow negative refcount: it's the BANDWIDTH_SCHEDULER_STATE that is now double deleted. I think the issue only surfaced now because I moved the bandwidth scheduler protocol feature before resharding. I'm still not too happy about disabling this check by default, but at least it makes sense. The logic is the same as for delayed receipts: both are copied to both children and so are double deleted.

cc @staffik, who's looking into a proper solution for getting rid of allow_negative_refcount.
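
A toy illustration of the double delete described above (not nearcore code): if a single value is copied into both children and each child later removes it, a naive refcount goes negative unless the check is relaxed.

```rust
fn main() {
    // One reference from the parent state.
    let mut refcount: i64 = 1;
    // The split copies the value into both children without bumping the count,
    // and each child eventually deletes its copy.
    for _child in 0..2 {
        refcount -= 1;
    }
    assert_eq!(refcount, -1); // negative unless allow_negative_refcount is set
}
```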

@wacban enabled auto-merge on January 3, 2025 12:25
@shreyan-gupta self-requested a review on January 6, 2025 18:41
@shreyan-gupta (Contributor) left a comment:

Looks great! Thanks for handling this! :)

@@ -300,11 +300,11 @@ impl CongestionInfo {
Ok(())
}

pub fn remove_buffered_receipt_gas(&mut self, gas: Gas) -> Result<(), RuntimeError> {
pub fn remove_buffered_receipt_gas(&mut self, gas: u128) -> Result<(), RuntimeError> {
Reviewer (Contributor):

Is there a specific reason why we had to change this function signature to u128?

@wacban (Contributor Author):

It's possible for the sum of gas from many receipts to exceed u64 - or so @jancionear claimed :)
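
A minimal sketch of the widened signature (the counter, error type and message below are assumptions for illustration): taking u128 lets the caller pass the full receipt-group total without a lossy narrowing to u64, while the checked subtraction still catches underflow.

```rust
struct CongestionInfoSketch {
    buffered_receipts_gas: u128,
}

impl CongestionInfoSketch {
    fn remove_buffered_receipt_gas(&mut self, gas: u128) -> Result<(), String> {
        self.buffered_receipts_gas = self
            .buffered_receipts_gas
            .checked_sub(gas)
            .ok_or_else(|| "removing more buffered gas than was added".to_string())?;
        Ok(())
    }
}

fn main() {
    // A total well above u64::MAX still round-trips without truncation.
    let mut info = CongestionInfoSketch { buffered_receipts_gas: 30_000_000_000_000_000_000_000 };
    info.remove_buffered_receipt_gas(10_000_000_000_000_000_000_000).unwrap();
    assert_eq!(info.buffered_receipts_gas, 20_000_000_000_000_000_000_000);
}
```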

}

// Assert that empty buffers match zero buffered gas.
assert_eq!(all_buffers_empty, self.own_congestion_info.buffered_receipts_gas() == 0);
Reviewer (Contributor):

Nice assert to have!

// Update the congestion info based on the parent shard. It's
// important to do this step before the `retain_split_shard`
// because only the parent has the needed information.
if let Some(congestion_info) = chunk_extra.congestion_info_mut() {
Reviewer (Contributor):

sanity check: with resharding v3 feature enabled after congestion control, shouldn't we always have congestion_info in the chunk header (and chunk extra)?

@wacban (Contributor Author):

Yeah, if you like I can replace the if let with an expect

Reviewer (Contributor):

Yep, thanks! Just wanted to make sure
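
A tiny sketch of the suggested change (the types and expect message are illustrative stand-ins, not the real ChunkExtra API): replacing the if let with an expect makes a missing congestion info fail loudly instead of silently skipping the update.

```rust
struct ChunkExtraSketch {
    congestion_info: Option<u64>, // stand-in for the real CongestionInfo type
}

impl ChunkExtraSketch {
    fn congestion_info_mut(&mut self) -> Option<&mut u64> {
        self.congestion_info.as_mut()
    }
}

fn main() {
    let mut chunk_extra = ChunkExtraSketch { congestion_info: Some(0) };
    let congestion_info = chunk_extra
        .congestion_info_mut()
        .expect("congestion info must exist once congestion control is enabled");
    *congestion_info += 1;
}
```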


// Set the allowed shard based on the child shard.
let next_epoch_id = epoch_manager.get_next_epoch_id(&block_hash)?;
let next_shard_layout = epoch_manager.get_shard_layout(&next_epoch_id)?;
ReshardingManager::finalize_allowed_shard(
Reviewer (Contributor):

This whole bit from ReshardingManager::get_child_congestion_info_not_finalized to ReshardingManager::finalize_allowed_shard is duplicated across here and resharding manager. I think parent_trie.retain_split_shard may be included in the mix. Sounds like a helper function to me!

@wacban added this pull request to the merge queue (Jan 7, 2025)
@wacban removed this pull request from the merge queue due to a manual request (Jan 7, 2025)
@wacban enabled auto-merge (January 7, 2025 13:33)
@wacban added this pull request to the merge queue (Jan 7, 2025)
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks (Jan 7, 2025)
@wacban added this pull request to the merge queue (Jan 7, 2025)
@github-merge-queue bot removed this pull request from the merge queue due to failed status checks (Jan 7, 2025)
@wacban added this pull request to the merge queue (Jan 7, 2025)
Merged via the queue into master with commit 19601cb Jan 7, 2025
28 checks passed
@wacban deleted the waclaw-resharding branch on January 7, 2025 15:34