Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SIMD 0072: Feature Gate Threshold Automation #72
base: main
Are you sure you want to change the base?
SIMD 0072: Feature Gate Threshold Automation #72
Changes from 22 commits
013d0e7
9b601c1
24bc2c5
fa1f218
4a65303
a7d6a49
843c83e
e513e82
d705e54
a63abbb
7a6c3fc
23f1b24
f830338
2fb901b
ab95740
1a8f21a
ea129ba
bb1c797
b85755d
66e6371
58cb436
da132f7
4de5abd
072344f
9296061
458b3ac
d9f399d
b87cadb
6e74a91
4591f55
File filter
Filter by extension
Conversations
Jump to
There are no files selected for viewing
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this also need to include the epoch? As the list of staged features changes epoch-to-epoch, we need a way of ensuring that the bit mask in each validator support signal transaction is interpreted for the correct epoch. If the transaction is delayed or is executed in a different epoch than the one it was sent, for whatever reason, then the support signal will be mis-interpreted.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess the
Staged Features PDA
supplied should indicate the intended epoch?There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's in the PDA for the validator's support signal. When they send a bitmask, it gets recorded in their signal PDA, and if they send again, it's overwritten. The current epoch is checked via
Clock
sysvar when the program is invoked with this instruction.There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need any deadline/restriction? I see the program cache being alluded to, but I don't see any specification that changes to the Staged Features PDA must begin or conclude at some point earlier than the epoch boundary. It's not obvious why the Feature Gate program can't continue to modify the Staged Features PDA up to the epoch boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you suggesting to explicitly explain the reasoning for this constraint in the SIMD, or are you asking why we actually need such a constraint? In either case, I agree it should be more explicit in the proposal.
It seems to me that @Lichtso intends to increase the cache preparation phase's initiation to be sooner in the epoch (comment), so we probably need a more concrete explanation as to why the Feature Gate program should have this constraint.
In this proposal, it was initially set to 128 slots remaining in the epoch because the cache preparation stage begins at a maximum of 128 slots before the end of the epoch, as computed here.
It's worth calling out that if we configure this constraint in the Feature Gate program and then later need to adjust it for an earlier cache preparation start, we'll be bound to using a feature gate for that change to the program.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm asking why we need a constraint at all. The cache being referenced is supposedly an agave-specific implementation, not part of the runtime protocol. If feature activation needs to work around any aspect of cache behavior, that seems like (a) a problem; and (b) a thing that definitely needs to be specified for all clients.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's really nothing stopping the cache from being able to anticipate the upcoming feature environment by querying the PDA under the Feature Gate program that would tell it what the current stake support is for each feature. It's actually arguably equally or more deterministic than the way it's done now.
The cache could run the calculation at its current 128 slots before epoch boundary and probably get a pretty good understanding of which features have enough stake support. Increasing that slot buffer beyond 128 could just mean increasing the likelihood that not enough stake has been signaled for a feature. Perhaps this is a compromise the cache implementation should consider? After all, I believe recompilation is an optimization.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added a commit removing the "slots remaining" constraint, since I think the above is a sensible direction to head in. If there's explicit opposition, it can be added back in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my imprecise formulation, I meant scheduled for activation.
If a feature is scheduled for activation in some slot close to the boundary, then it is possible that the TX is not yet rooted. Thus, some validator nodes will not have it scheduled, and will not activate it on the boundary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please explain how this is different from the cluster coming to consensus on any other tx/block on which other logic depends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For features which only affect their own fork this is irrelevant. But for features which change validator global behavior this would break. And I am not sure if there is much to be gained from making exceptions or harving different feature types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just adding an additional point of clarity here, if it helps.
Right now, in Agave, the cache is calling
compute_active_feature_set
, which will compute the feature set based on the on-chain accounts that have been created underFeature11111...
(ie. "activated"). The cache does this ~128 slots before the epoch boundary to prepare the cache entries for the anticipated upcoming environment.With the proposed system in place, the cache can still call
compute_active_feature_set
the same way it does now. The function will just compute the anticipated feature set from the proposed PDA (with stake support) rather than just the presence of on-chain accounts.In almost every way it's the same as what we have now. The subtle difference is that someone can no longer activate a feature after the cache has done its feature-set computation, but nodes could still cast votes (signals) on a feature, bringing it to "eligible" status. This edge case exists today and just causes cache preparation/recompilation to be moot.
What @CriesofCarrots is suggesting is that we don't explicitly architect this mechanism around the cache, which is sensible, especially considering the cache preparation may change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@Lichtso do you still have concerns here? I'd like to move this SIMD along.