incentives: cache top online accounts and use when building AbsentParticipationAccounts #6085
LGTM.
Just a thought: why not initialize the top online accounts on startup and then maintain the list in acctonline while processing incoming blocks?
My first approach was to make it a field in the
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
Coverage Diff: feature/heartbeats vs. #6085
              feature/heartbeats   #6085    +/-
  Coverage    56.22%               56.27%   +0.05%
  Files       494                  494
  Lines       69954                70040    +86
  Hits        39330                39416    +86
  Misses      27947                27944    -3
  Partials    2677                 2680     +3
…break TestAbsenteeChecks
Co-authored-by: John Jannotti <[email protected]>
@@ -458,7 +458,7 @@ func TestOnlineAcctModelSimple(t *testing.T) {
})
// test same scenario on double ledger
t.Run("DoubleLedger", func(t *testing.T) {
-m := newDoubleLedgerAcctModel(t, protocol.ConsensusFuture, true)
+m := newDoubleLedgerAcctModel(t, protocol.ConsensusV39, true) // TODO simulate heartbeats
Why not keep this on ConsensusFuture?
It fails because heartbeats aren't implemented yet and proposers aren't being set, so the big accounts are challenged and knocked offline, and none of the stake numbers match the test expectations. I could have tried to fix this by ensuring that all the test accounts show up as proposers often enough to avoid suspension, but I thought it might be better to wait until heartbeats are implemented and see whether the tests then pass without as much modification.
@@ -47,6 +47,7 @@ type roundCowParent interface {
// lookup retrieves agreement data about an address, querying the ledger if necessary.
lookupAgreement(basics.Address) (basics.OnlineAccountData, error)
onlineStake() (basics.MicroAlgos, error)
+knockOfflineCandidates() (map[basics.Address]basics.OnlineAccountData, error)
Maybe a nit: should we call this "top online accounts" or something similar? The comments make it very clear that's what we are requesting; it's more a question of whether the name should reflect where the data is sourced from or the use case we currently have for it.
It is a potentially stale list of top online accounts: if new accounts came online in the last 256 rounds (since the last state proof), they wouldn't appear. So the word "candidates" was meant to make it sound less definitive that this is the complete list of top online accounts for the round... but I'm happy to pick any other name; I wasn't particularly happy with this one.
The "knockOffline" part came from generateKnockOfflineAccountsList, the method JJ added in #5757, where the term is already in use.
@@ -810,6 +810,9 @@ func TestTotalWeightChanges(t *testing.T) {
a := require.New(fixtures.SynchronizedTest(t))

consensusParams := getDefaultStateProofConsensusParams()
consensusParams.Payouts = config.ProposerPayoutRules{} // TODO re-enable payouts when nodes aren't suspended
How are we tracking these?
I guess I should open an issue to address the "update this test once heartbeats are implemented" TODOs in this PR.
I think we will not have the top online cache before the first state proof, right? Maybe it would make sense to seed it at genesis (since the online accounts are listed in the genesis file, I think). That could avoid special cases in the tests.
-func (eval *BlockEvaluator) endOfBlock() error {
+// When generating a block, participating addresses are passed to prevent a
+// proposer from suspending itself.
+func (eval *BlockEvaluator) endOfBlock(participating ...basics.Address) error {
Why ...basics.Address instead of []basics.Address? I assume callers always have a slice, as opposed to call sites with, say, five explicit arguments.
It's true; this is just me optimizing for a smaller diff so the other endOfBlock callers don't have to change, but the idea is to pass a slice. Can change.
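To make the trade-off in this exchange concrete, here is a minimal Go sketch. The Address type and both functions are hypothetical stand-ins, not the go-algorand code: the variadic form lets existing endOfBlock() call sites compile unchanged, while a caller that holds a slice spreads it with ...; the slice form forces every caller to pass something, even nil.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for basics.Address.
type Address [32]byte

// endOfBlockVariadic mirrors the variadic signature in the diff: older call
// sites that pass no arguments keep compiling.
func endOfBlockVariadic(participating ...Address) int {
	return len(participating)
}

// endOfBlockSlice mirrors the suggested slice signature: every caller must
// pass an argument, even if it is nil.
func endOfBlockSlice(participating []Address) int {
	return len(participating)
}

func main() {
	addrs := []Address{{1}, {2}}

	fmt.Println(endOfBlockVariadic())         // existing call site, unchanged
	fmt.Println(endOfBlockVariadic(addrs...)) // slice spread into the variadic parameter
	fmt.Println(endOfBlockSlice(nil))         // existing call site must now pass nil
	fmt.Println(endOfBlockSlice(addrs))       // new call site passes the slice directly
}
```

Inside the callee both forms are an ordinary slice. Spreading with addrs... passes the caller's slice itself rather than a copy, so the choice mainly affects call-site churn.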
IncentiveEligible bool // currently unused below, but may be needed in the future
}
candidates := make(map[basics.Address]candidateData)
partAddrs := util.MakeSet(participating...)
Do we do anything else with this slice? Maybe we should push the Set type up through the callers, so that it is built as a Set when it is first created to pass to endOfBlock?
It's used in GenerateBlock while making a map of end-of-block account state for participating addresses to include in the UnfinishedBlock. If we pushed it up to GenerateBlock, it would also protect against looking up the same participating address twice if duplicate addresses were passed to GenerateBlock.
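A minimal sketch of the reviewer's suggestion, assuming a generic map-backed set similar in spirit to the util.MakeSet call in the diff. The Set type, MakeSet and Contains below are local, hypothetical stand-ins, not the util package's actual API. Building the set once, where the participating addresses are first gathered, collapses duplicates before any per-address lookup happens.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for basics.Address.
type Address [32]byte

// Set is a hypothetical map-backed set; only membership matters.
type Set[T comparable] map[T]struct{}

// MakeSet builds a set from its arguments; duplicate arguments collapse
// into a single entry.
func MakeSet[T comparable](elems ...T) Set[T] {
	s := make(Set[T], len(elems))
	for _, e := range elems {
		s[e] = struct{}{}
	}
	return s
}

// Contains reports whether e is a member of the set.
func (s Set[T]) Contains(e T) bool {
	_, ok := s[e]
	return ok
}

func main() {
	a, b := Address{1}, Address{2}

	// Built once at the top of the call chain: the duplicate a is dropped,
	// so downstream code would perform one lookup per distinct address.
	part := MakeSet(a, b, a)

	fmt.Println(len(part), part.Contains(a), part.Contains(Address{3}))
}
```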
if maxSuspensions > 0 {
knockOfflineCandidates, err := eval.state.knockOfflineCandidates()
if err != nil {
// Log an error and keep going; generating lists of absent and expired
So this implies some nodes can "choose" not to search for absent/expired accounts.
Yes. When generating a block, proposers are not required to put any accounts in the {Absent,Expired}ParticipationAccounts block headers, but if accounts are in those lists, validation rules require that they actually are absent or expired.
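The asymmetry described here can be sketched as a validation check. validateAbsentList and the isAbsent callback are hypothetical names, not the ledger's actual functions: an empty or partial list always passes, but every address that is listed must be absent according to the validator's own view of state.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for basics.Address.
type Address [32]byte

// validateAbsentList accepts any subset of the truly absent accounts,
// including none at all, but rejects a list containing an account that
// the validator does not consider absent.
func validateAbsentList(listed []Address, isAbsent func(Address) bool) error {
	for _, addr := range listed {
		if !isAbsent(addr) {
			return fmt.Errorf("address %x... listed but not absent", addr[:4])
		}
	}
	return nil // an empty list is always acceptable
}

func main() {
	absent := map[Address]bool{{1}: true}
	isAbsent := func(a Address) bool { return absent[a] }

	fmt.Println(validateAbsentList(nil, isAbsent))                   // proposer listed nothing
	fmt.Println(validateAbsentList([]Address{{1}}, isAbsent))         // listed a truly absent account
	fmt.Println(validateAbsentList([]Address{{2}}, isAbsent) != nil) // listed an active account
}
```

Because omission is always valid, a proposer that skips the search (or whose cached candidate list is stale) still produces acceptable blocks; only false positives are rejected.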
// Now, check these candidate accounts to see if they are expired or absent.
for accountAddr, acctData := range candidates {
if acctData.MicroAlgosWithRewards.IsZero() {
Does a zero balance imply the account was closed 100% of the time?
Yes, that's correct. My understanding is that currently the only way to have a zero balance at the end of the round is if your account has been closed.
//
// This function is passed a list of participating addresses so a node will not
// propose a block that suspends or expires itself.
func (eval *BlockEvaluator) generateKnockOfflineAccountsList(participating []basics.Address) {
participating is really "participating accounts excluding any I host"
Here, the "participating" argument is the list of accounts that the node hosts.
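One plausible way to realize the self-protection described in this thread, sketched under the assumption that candidates and hosted addresses arrive as plain slices (filterCandidates and Address are hypothetical, not the BlockEvaluator's code): the proposer drops any candidate it hosts before building its lists, while other nodes remain free to list that account.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for basics.Address.
type Address [32]byte

// filterCandidates returns the candidates the local node may list for
// suspension or expiration: everything except the accounts it hosts.
// Candidate order is preserved.
func filterCandidates(candidates, participating []Address) []Address {
	hosted := make(map[Address]struct{}, len(participating))
	for _, a := range participating {
		hosted[a] = struct{}{}
	}
	var out []Address
	for _, c := range candidates {
		if _, isHosted := hosted[c]; !isHosted {
			out = append(out, c)
		}
	}
	return out
}

func main() {
	candidates := []Address{{1}, {2}, {3}}
	hostedHere := []Address{{2}} // this node participates on behalf of {2}

	fmt.Println(len(filterCandidates(candidates, hostedHere)))
}
```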
I'm good in general; just a few small comments.
blkEval = l.nextBlock(t)
//require.Empty(t, vb.Block().ExpiredParticipationAccounts)
Why was this added commented out?
The test sets up a bunch of participating accounts that are separate from the ones I'm interested in, and they do expire (previously they didn't, because they weren't noticed). I was working on updating this in a separate branch.
challenge := byte(0)
for i := uint64(0); i < uint64(1210); i++ { // A bit past one grace period (200) past challenge at 1000.
vb := l.endBlock(t, blkEval)
for i := uint64(0); i < uint64(1200); i++ { // Just before first suspension at 1171
Wouldn't this go past the first suspension? Why 1200?
It's based on the values set for certain accounts when LastHeartbeat/LastProposed are initialized earlier in the test.
}

st := txn.Sign(keys[0])
err = eval.Transaction(st, transactions.ApplyData{})
Why remove all of these eval.Transaction calls?
You no longer need to send transactions to make GenerateBlock/BlockEvaluator "notice" that an account is expired or not participating.
}

// fetch fresh data up to this round from online account cache. These accounts should all
// be in cache, as long as proto.StateProofTopVoters < onlineAccountsCacheMaxSize.
This feels like a condition to call out in the consensus file.
Added TestOnlineAccountsCacheSizeBiggerThanStateProofTopVoters.
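The invariant that test guards can be sketched as follows. This is not the body of TestOnlineAccountsCacheSizeBiggerThanStateProofTopVoters: checkCacheCoversTopVoters is hypothetical, and the protocol names and sizes in main are made-up illustration values, not real consensus parameters. The check enforces the strict inequality from the code comment, StateProofTopVoters < onlineAccountsCacheMaxSize, for every protocol version.

```go
package main

import "fmt"

// checkCacheCoversTopVoters returns an error if any protocol's top-voter
// count would not fit strictly inside the online accounts cache, in which
// case some top voters could fall out of cache and require DB lookups.
func checkCacheCoversTopVoters(cacheMaxSize int, topVotersByProto map[string]uint64) error {
	for proto, topVoters := range topVotersByProto {
		if topVoters >= uint64(cacheMaxSize) {
			return fmt.Errorf("%s: StateProofTopVoters=%d must be < cache size %d",
				proto, topVoters, cacheMaxSize)
		}
	}
	return nil
}

func main() {
	// Hypothetical values for illustration only.
	protos := map[string]uint64{"protoA": 1024, "protoB": 0}

	fmt.Println(checkCacheCoversTopVoters(2500, protos))
	fmt.Println(checkCacheCoversTopVoters(1000, protos) != nil)
}
```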
Merging this into
@@ -1607,25 +1619,94 @@ type challenge struct {
// deltas and testing if any of them needs to be reset/suspended. Expiration
// takes precedence - if an account is expired, it should be knocked offline and
// key material deleted. If it is only suspended, the key material will remain.
func (eval *BlockEvaluator) generateKnockOfflineAccountsList() {
//
// Different ndoes may propose different list of addresses based on node state.
ndoes -> nodes
candidates[accountAddr] = candidateData{
VoteLastValid: acctData.VoteLastValid,
VoteID: acctData.VoteID,
Status: basics.Online, // from lookupOnlineAccountData, which only returns online accounts
I guess we need a test to enforce the knockOfflineCandidates -> lookupOnlineAccountData control flow.
for addr := range voters.AddrToPos {
data, err := l.acctsOnline.lookupOnlineAccountData(rnd, addr)
if err != nil {
continue // skip missing / not online accounts |
Why would voters ever return a non-online account?
The voters tracker only calculates the top N accounts every 256 rounds, so if the current-round lookup for an address cached from the last state proof interval finds that the account has since been closed or deleted, you could hit an error here.
I should add a comment and write a test exercising this case; I realize it's kind of complicated now that I've written it out.
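A minimal sketch of the control flow under discussion, with the ledger replaced by an injected lookup function. knockOfflineCandidatesSketch, the trimmed OnlineAccountData fields and errNotFound are hypothetical; the diff itself ranges over voters.AddrToPos and calls l.acctsOnline.lookupOnlineAccountData(rnd, addr). An address cached at the last state proof interval that has since been closed fails the current-round lookup and is skipped, which is the case proposed for a new test.

```go
package main

import (
	"errors"
	"fmt"
)

// Address is a hypothetical stand-in for basics.Address.
type Address [32]byte

// OnlineAccountData is a trimmed, hypothetical stand-in for the real type.
type OnlineAccountData struct {
	MicroAlgos    uint64
	VoteLastValid uint64
}

var errNotFound = errors.New("account not found or not online")

// knockOfflineCandidatesSketch re-reads each cached top voter at the current
// round. The cached list may be up to one state proof interval stale, so a
// failed lookup skips that address instead of aborting the whole list.
func knockOfflineCandidatesSketch(cachedTopVoters []Address,
	lookup func(Address) (OnlineAccountData, error)) map[Address]OnlineAccountData {
	out := make(map[Address]OnlineAccountData, len(cachedTopVoters))
	for _, addr := range cachedTopVoters {
		data, err := lookup(addr)
		if err != nil {
			continue // skip missing / no-longer-online accounts
		}
		out[addr] = data
	}
	return out
}

func main() {
	// Current-round state: {1} is still online, {2} has closed since the
	// top voters were last computed.
	current := map[Address]OnlineAccountData{{1}: {MicroAlgos: 5_000_000, VoteLastValid: 1000}}
	lookup := func(a Address) (OnlineAccountData, error) {
		d, ok := current[a]
		if !ok {
			return OnlineAccountData{}, errNotFound
		}
		return d, nil
	}

	got := knockOfflineCandidatesSketch([]Address{{1}, {2}}, lookup)
	fmt.Println(len(got))
}
```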
Summary
In #5757, a mechanism was introduced to suspend "absentee" accounts that don't participate (by making a proposal, or by sending a heartbeat as in #5799), by adding a block header, AbsentParticipationAccounts, similar to ExpiredParticipationAccounts.
Currently, the list is generated by considering any account touched by a transaction in the current block, since this data is readily available at endOfBlock(). This PR adds a periodically updated cache of top online accounts to the ledger, to find additional online accounts not mentioned in the current block.
All of these tracked addresses will now be checked for absentee or expired status each round. To get a recent list of top online accounts, this PR reuses work already done by the votersTracker and the state proof worker. (Every 256 rounds, the state proof system performs a TopOnlineAccounts query.) It adds access to the votersTracker to fetch the most recent list of top online addresses, and for each address it looks up the latest round's data from the online account cache.
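A minimal sketch of how the candidate set changes, assuming both sources are available as address slices. gatherCandidates is a hypothetical helper, not the BlockEvaluator's code: previously only touched accounts were examined, and the cached top online accounts are now unioned in, so a large idle account gets checked even if no transaction in the block mentions it.

```go
package main

import "fmt"

// Address is a hypothetical stand-in for basics.Address.
type Address [32]byte

// gatherCandidates returns the union of accounts touched in the current
// block and the cached top online accounts; each address appears once.
func gatherCandidates(touched, topOnline []Address) map[Address]struct{} {
	c := make(map[Address]struct{}, len(touched)+len(topOnline))
	for _, a := range touched {
		c[a] = struct{}{}
	}
	for _, a := range topOnline {
		c[a] = struct{}{}
	}
	return c
}

func main() {
	touched := []Address{{1}}             // only {1} sent a transaction this round
	topOnline := []Address{{1}, {2}, {3}} // from the most recent top-voters query

	fmt.Println(len(gatherCandidates(touched, nil)))       // touched accounts only
	fmt.Println(len(gatherCandidates(touched, topOnline))) // with the top online cache
}
```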
LastProposed and LastHeartbeat are added to the online accounts table's DB representation in this PR. This also fixes an issue introduced in #5965 where uses of ledgercore.OnlineAccountData (which didn't have LastHeartbeat/LastProposed fields) were replaced by basics.OnlineAccountData (which did) and ended up with those fields not being set in a couple of conversions from AccountData.
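The conversion bug described above follows a familiar Go pattern, sketched here with trimmed-down, hypothetical stand-ins for the account types (the real basics types carry many more fields). A keyed composite literal written when the target type lacked LastProposed/LastHeartbeat keeps compiling after the code switches to a type that has them, and the omitted fields are silently zero-filled.

```go
package main

import "fmt"

// AccountData is a hypothetical, trimmed stand-in for the full account record.
type AccountData struct {
	MicroAlgos    uint64
	VoteLastValid uint64
	LastProposed  uint64
	LastHeartbeat uint64
}

// OnlineAccountData is a hypothetical stand-in for the wider online type
// that includes the incentive fields.
type OnlineAccountData struct {
	MicroAlgos    uint64
	VoteLastValid uint64
	LastProposed  uint64
	LastHeartbeat uint64
}

// buggyOnlineData copies only the fields the older, narrower type had; it
// still compiles, but LastProposed and LastHeartbeat stay zero.
func buggyOnlineData(ad AccountData) OnlineAccountData {
	return OnlineAccountData{MicroAlgos: ad.MicroAlgos, VoteLastValid: ad.VoteLastValid}
}

// fixedOnlineData also carries the incentive fields across.
func fixedOnlineData(ad AccountData) OnlineAccountData {
	return OnlineAccountData{
		MicroAlgos:    ad.MicroAlgos,
		VoteLastValid: ad.VoteLastValid,
		LastProposed:  ad.LastProposed,
		LastHeartbeat: ad.LastHeartbeat,
	}
}

func main() {
	ad := AccountData{MicroAlgos: 100, VoteLastValid: 5000, LastProposed: 1200, LastHeartbeat: 1300}

	fmt.Println(buggyOnlineData(ad).LastProposed, fixedOnlineData(ad).LastProposed)
}
```

Since the compiler cannot flag omitted fields in keyed literals, a round-trip test that sets every field to a non-zero value is a cheap way to catch this class of regression.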
Test Plan
Update test/e2e-go/features/incentives/suspension_test.go (TODO: return to this after heartbeats are implemented).