-
Notifications
You must be signed in to change notification settings - Fork 54
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Dynamically adjust proposal block ranges #1139
base: master
Are you sure you want to change the base?
Conversation
This commit introduces dynamic adjustment of the block ranges that the proposer will propose. The intention of this change is to make the proposer more robust to incomplete or inconsistent data being supplied by one or more RPC providers. Dynamic adjustment is enacted in the event that any of the following conditions are detected: - An invalid fill traces to a missing deposit (no deposit found for the specified depositId). - A deposit is known to be filled, without any associated FilledRelay event. In the first case, this results in narrowing of the block range on both the origin and destination chains. For the latter, only the destination chain is narrowed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding some context.
Update - resolved here: ccb5df0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nb. In general, this new feature probably warrants some additional test cases to stress out the implementation. This is especially important as it's not possible to backtest the change against previous bundles.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left some comments on first read through
// - Narrow the destination block range to exclude the invalid fill. | ||
allInvalidFills | ||
.filter(({ code }) => code === InvalidFill.DepositIdNotFound) | ||
.forEach(({ fill: { depositId, originChainId, destinationChainId, blockNumber } }) => { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we sort these by block number by chain to allow us to exit the following map earlier?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not totally confident on this, but I'm not sure that we can actually simply sort and exit earlier. This is because we need to avoid making assumptions about the ordering of fills vs. deposits.
For example, the first invalid fill on a destination chain might correspond to the last missing deposit on the origin chain. Then, some later invalid fill on the destination chain might actually fill an earlier missing deposit on the origin chain. This also seems more likely to occur in the case that the RPCs are serving inconsistent data.
If we group/sort by destinationChainId and destination block number, we might not narrow the originChainId block range correctly. If we group/sort originChainId and origin block number, we might not narrow the destinationChainid block range correctly.
In general, even when it's really bad, the number of missing deposits we typically see is about 20 - 30. In order to identify both the earliest block for both origin and destination chains we might end up looping multiple times over all of the invalid fills. In the case where we only have tens missing events, it seems like it might be cheaper and simpler to just process them one by one.
Full disclosure: I might have overlooked something really obvious here. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is a good point, that the latest deposit might match with the earliest fill and vice versra
const previousDeposit = originSpokePoolClient | ||
.getDepositsForDestinationChain(destinationChainId) | ||
.filter((deposit: DepositWithBlock) => deposit.blockNumber < blockNumber) | ||
.at(-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What if the deposit is from a gap between spoke pool deployments or something? So there actually is no deposit for this id? Will this always return undefined in that case, which would mean no change?
If there are edge cases like that that are handled in a subtle way here, we may want to add a brief comment explaining them.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The SpokePoolClient only has the ability to handle a single SpokePool deployment, so in the case that we don't find any preceding deposit then previousDeposit
resolves to undefined
. In this case, the updateEndBlock()
helper detects that the updated endBlock
is undefined and implements a soft-pause of the chain by proposing over the previous endBlock
(as we do for Boba).
I'll update the comment on line 509 to clarify this.
There's arguably a scenario where the first deposit on a new chain goes missing. This would implicitly result in proposing over the range [0,0] for that chain because the HubPoolClient supplies previousEndBlock 0 in that scenario. I think this is an extremely remote probability, and activating a new chain requires multiple responsible adults to test the deployment and monitor the proposal, so I'm probably not inclined to handle it explicitly in the code. wdyt?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what would a bundle block range look like for a new spoke pool address? whether it be a new chain or an updated spoke pool address for an existing chain?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The bundlEndBlock is sourced from HubPoolClient.getLatestBundleEndBlockForChain(), so we inherit its behaviour. For a new deployment on an existing chain, we're bound to continue from the previous bundleEndBlock. This should work as expected.
For a new chain, the chainId index isn't found in the previous bundle, so we default to proposing from 0. This is also as was the case for the activation of zkSync and Base - so we inherit the existing behaviour.
The only change that I can foresee here is that if that initial proposal contains invalid fills due to missing deposits on the new chain, or fills for missing deposits on another chain, then we'd revert to proposing over [0,0]. I'm not sure whether that's a problem in itself, but it's a pretty extreme edge case because the number of deposits and fills for the new chain in the initial proposal are likely to be very low. and we'd detect it immediately because activating a new chain implies manual review of the proposal block ranges.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the 0,0 case is remote and doesn't cause an obvious problem, so I wouldn't worry about handling it explicitly.
This all makes sense. I think a comment that says something like:
This can return undefined if there is no known preceding deposit in range.
That will result in the chain being soft-paused by setting the endBlock equal to the previous endBlock, making this bundle cover no blocks on that chain.
Would be really helpful to the reader.
Suggested by Matt. Co-authored-by: Matt Rice <[email protected]>
src/dataworker/Dataworker.ts
Outdated
if (blockNumber > updatedBlockRanges[originChainId][1]) { | ||
return false; // Fill event is already out of scope due to previous narrowing. | ||
} | ||
return allValidFills.find( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we use a spoke pool function or map lookup to get the fill for a deposit rather than looping through all fills on each deposit?
We store by a special hash in the client to avoid this n^2 loop (which has caused speed issues in the past).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have backed that entire section out (for now) per this thread with Nick: #1139 (comment).
Here's the change: 093d80b
I did consider accessing these via a map, but tentatively decided against it because I don't think this will actually use much CPU time in practice. Arriving here is conditional on at least one deposit is detected as filled, despite no known FilledRelay event. Then, we iterate over the àllValidFills` array once per instance of "filled-but-missing" event.
So in the normal case it costs nothing, but it does cost CPU in the bad case where a chain/RPC is misbehaving. However, because we've already filtered on the missing deposits and have narrowed the block range accordingly, we should skip over most of those missing fill events anyway. So in practice I think the impact of this would be fairly limited, even though it is technically inefficient to search the allValidFills
array in this way. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does, it sounds like you're saying this loop will be run rarely so optimization isn't a concern. I wasn't aware of that when I initially read the code, but will take another pass.
const [originalStartBlock] = blockRanges[idx]; | ||
return ( | ||
(endBlock > startBlock && startBlock === originalStartBlock) || | ||
(startBlock === endBlock && startBlock === originalStartBlock - 1) // soft-pause |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why would the new start block be one behind the original start block in this case? And is it okay to leave it that way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
originalStartBlock
is normally previousEndBlock + 1
, so if startBlock === endBlock
is true then we're implementing a soft-pause on the chain, and will propose over the range of [previousEndBlock, previousEndBlock]
. This is consistent with the way that chains are "hard-paused" (i.e. disabled via the ConfigStore), so this check is simply verifying the invariant.
To be more correct, rather than assuming that previousEndBlock === originalStartBlock - 1
, this check should resolve the previous ending block via HubPoolClient.getLatestBundleEndBlockForChain()
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Understood. That makes sense.
Proceeding with the initial "invalid fills" change alone. The subsequent change can be re-introduced later.
const destSpokePoolClient = spokePoolClients[destinationChainId]; | ||
[startBlock, endBlock] = updatedBlockRanges[destinationChainId]; | ||
|
||
if (blockNumber <= endBlock) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: wouldn't blockNumber
s for invalid fills always be within the bundle block range by definition when calling loadData
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not necessarily, because we source endBlock
from updatedBlockRanges
, so it is subject to being iteratively updated within this loop. So if we are examining a fill where the blockNumber
has already been excluded by narrowing then we should just skip over that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Clarified here: 7559948
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the new implementation is a lot more clear and therefore I left more targeted questions
Proposed by Nick. Co-authored-by: nicholaspai <[email protected]>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm on board with merging this, closely monitoring the data worker proposals for a few days
OR
adding some more unit tests before we activate this.
We are probably safe to launch this and in the worst case, we'll self-dispute since the disputer does not have this narrowBlockRanges
logic. And AFAICT the disputer's logic to construct the block ranges from getWidestPossibleBlockRanges
hasn't changed
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few more comments! Sorry, this logic is pretty nuanced
src/clients/BundleDataClient.ts
Outdated
@@ -289,7 +295,8 @@ export class BundleDataClient { | |||
if (historicalDeposit.found) { | |||
addRefundForValidFill(fill, historicalDeposit.deposit, blockRangeForChain); | |||
} else { | |||
allInvalidFills.push(fill); | |||
assert(historicalDeposit.found === false); // Help tsc to narrow the discriminated union type. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is annoying -- ts really should not require runtime code to force it to get that it must be this type. It won't compile without this line?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OOC @pxrl what is the error that is thrown?
|
||
// Find the previous known deposit. This may resolve a deposit before the immediately preceding depositId. | ||
const previousDeposit = originSpokePoolClient | ||
.getDepositsForDestinationChain(destinationChainId) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think these are guaranteed to be sorted, which might mean this deposit would be too early, right?
const previousDeposit = originSpokePoolClient | ||
.getDepositsForDestinationChain(destinationChainId) | ||
.filter((deposit: DepositWithBlock) => deposit.blockNumber < blockNumber) | ||
.at(-1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the 0,0 case is remote and doesn't cause an obvious problem, so I wouldn't worry about handling it explicitly.
This all makes sense. I think a comment that says something like:
This can return undefined if there is no known preceding deposit in range.
That will result in the chain being soft-paused by setting the endBlock equal to the previous endBlock, making this bundle cover no blocks on that chain.
Would be really helpful to the reader.
assert( | ||
endBlock < currentEndBlock, | ||
`Invalid block range update for chain ${chainId}: block ${endBlock} >= ${currentEndBlock}` | ||
); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since these updates can come in out of order, I would expect this case to happen, no?
I find a missing deposit at block 7, then later I see a missing deposit at block 12.
Maybe I'm missing the logic that prevents this.
src/dataworker/Dataworker.ts
Outdated
@@ -263,10 +266,10 @@ export class Dataworker { | |||
* of log level | |||
* @returns Array of blocks ranges to propose for next bundle. | |||
*/ | |||
_getNextProposalBlockRanges( | |||
async _getNextProposalBlockRanges( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like this function only has one await and it's at the very last line. Thoughts on removing this async
and returning the promise generated by the call to this.narrowProposalBlockRanges(blockRangesForProposal, spokePoolClients)
src/dataworker/Dataworker.ts
Outdated
blockRanges: number[][], | ||
spokePoolClients: SpokePoolClientsByChain, | ||
logData = true, | ||
isUBA = false |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will this need to be changed for the UBA removal changes?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few comments
This commit introduces dynamic adjustment of the block ranges that the proposer will propose. The intention of this change is to make the proposer more robust to incomplete or inconsistent data being supplied by one or more RPC providers.
Dynamic adjustment is enacted in the event that any of the following conditions are detected:
In the first case, this results in narrowing of the block range on both the origin and destination chains. For the latter, only the destination chain is narrowed.
Some additional comments on why this approach was chosen:
Closes ACX-1767