Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIMD-0165: Async Vote Execution #165

Open
wants to merge 24 commits into
base: main
Choose a base branch
from

Conversation

wen-coding
Copy link

@wen-coding wen-coding commented Aug 12, 2024

Async Vote Execution SIMD:

  • Add a quick replay step which only replays votes without considering fee payers and generates ephemeral hash
    This step happens right after a block is received, the votes are used to update fork choices, and then a vote with the ephemeral hash is immediately sent out so others can update their fork choices as well.
  • The full replay as we know it still happens, just at a later time

@aeyakovenko
Copy link

High level seems right

Comment on lines 91 to 92
- `VoteTowerAccount`: tracks tower state and vote authority, it will be
in `VED`, it is updated by vote transactions and tracks vote credits.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A flow diagram/description of how users will pay for vote fees would be useful here. A step by step from vote account creation, to funding the account, to which domains it's created/moved into, etc.

@wen-coding wen-coding changed the title Initial draft of Async Vote Execution SIMD. Async Vote Execution SIMD. Sep 21, 2024
@wen-coding wen-coding changed the title Async Vote Execution SIMD. SIMD-165: Async Vote Execution Sep 26, 2024
@wen-coding wen-coding changed the title SIMD-165: Async Vote Execution SIMD-0165: Async Vote Execution Sep 26, 2024
failed.

A side benefit of this design is that the sysvars have all been calculated and
determined in the optimistical voting stage, also the fork of blocks are known,

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: s/optimistical/optimistic

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

only the hash result of the vote transactions.

This is safe because the list of validators and their corresponding stake
has already been determined at the beginning of the Epoch. The stake we used

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The leader schedule for the current epoch was computed at the beginning of the previous epoch. Not this epoch.

The fork choice stake weights should match the LS computation, otherwise there will be a stall waiting for async to finish at the start of the epoch.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@aeyakovenko
Copy link

Async Program Execution: APE 🦧🦧🦧

@wen-coding
Copy link
Author

Async Program Execution: APE 🦧🦧🦧

The intention is limit the scope of this SIMD to just the vote part, because we have several SIMDs for the non-vote part.

Copy link
Contributor

@AshwinSekar AshwinSekar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high level looks good, just needs some more specification


Currently the vote transactions and non-vote transactions are mixed together in
a block, the vote transactions are only processed in consensus when the whole
block has been frozen and all transactions in the block have been verified and
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: votes are processed per entry batch, doesn't have to wait for the bank to be frozen. as a result we still partially process votes even if the block later ends up dead.
e.g. agave's impl https://github.com/anza-xyz/agave/blob/1f06fbdbe3b72f330f50ad93a15c1116f5021392/ledger/src/blockstore_processor.rs#L177

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, now I think about it, it's less about processing votes as fast as possible, more about a lot of things don't happen until the block is frozen, changes this part.

The `hash` and `slot` in the `TowerSync` transaction will be updated to
the vote only hash. The vote only hash is calculated as follows:

1. Sort all vote accounts with non-zero stake in the current epoch by
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

non-zero in the previous epoch right? since stakes are offset by 1 epoch?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I meant to say: has power to vote in the current or previous epoch. I think I need previous epoch as well for the epoch handoff?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what I meant is that consensus uses the stakes from the previous epoch, so if we're only interested in processing consensus votes we should match the logic here. Basically only consider vote accounts with non-zero stake in the previous epoch

If we ever implement the handoff then we should consider the previous previous epoch as well

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should invent a name for this thing, this is so confusing. I think you and me both mean "that epoch stakes we use for voting in this Epoch", right? Then let's say in the future we decide to use epoch stakes of Epoch X for voting in Epoch X+2 we don't need to chase down all documents again and change it from "previous epoch" to "previous previous epoch". We just stick to one name.

I'm confused: why don't we need stakes for the previous epoch (previous previous epoch in your description) if we don't implement handoff?

1. Sort all vote accounts with non-zero stake in the current epoch by
vote account pubkey.

2. Calculate vote account hash by hashing (vote account pubkey,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: specify the hash fn

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specified sha256.

vote account pubkey.

2. Calculate vote account hash by hashing (vote account pubkey,
serialized vote state) in the order given.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we have to hash the whole vote state? only the tower / root will change

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A bit worried the vote authority isn't copied correctly in some implementation, not a lot else other than the tower/root I guess.

4. Invalid transaction

For the first two, the same check can be performed computing ephemeral hash.
We will set root on a bank only when it has full hash computed later, so the
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "set root" mean in this context?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is SetRootError in BlockstoreProcessorError. I think we will keep this check in the new world.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it doesn't mark the block dead though, just kills the replay thread.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm that's true, removed this check

To guarantee the blockchain will halt when full replay is having problems, we
propose:

1. If full replay is behind vote only replay by more than 1/2 epoch or vice
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re: "vice versa" can full replay run before vote only replay? I think we should ensure that full replay cannot run on a block that has not been vote only replayed

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is no reason a full replay can't start before the bank is vote only frozen, they write into completely different set of states.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, we should outline the interaction here:

Once a validator has determined the fork it will vote on, it can prioritize
replaying blocks on the selected fork

maybe point out that if we haven't completed vote replay we replay all available forks equally. also outline what happens in the case that we're full replaying a fork but end up vote replaying and voting on a different fork.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think what you described is reasonable, but I prefer not to specify in what order the full replay should happen in the protocol. Anza and Jump may decide to use different fork weight algorithm to pick which block to send to full replay next, I think the protocol would still work. And from what I heard, playing multiple forks at the same time or playing into the distant future is also a possibility.

3. The blocks on the selected forks are scheduled to be replayed. When
a block is replayed, all transactions are executed with fee payers checked.
This is the same as the replay we use today.
4. Optimisticly confirmed or finalized on `Vote only bankhash` and `Replay tip
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might want to specify the listener changes here, we expect clients to track commitment statuses on both hashes? What does it mean to be OC/finalized on the replay tip bankhash, will invalid fee payer votes count towards this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also specify changes to the commitment service, do we still aggregate when we vote on the block / root a block? or is there a separate pathway taken only when we full replay the block.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to User visible changes.

6. Add alerts if `Replay tip bankhash` differs when the `Vote only bankhash` is
the same. This is potentially an event worthy of cluster restart. If more than
1/3 of the validators claim a different `Replay tip bankhash`, halt and exit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify how the feature program will look with APE:

  • Normal pathway features will work same as before
  • APE features will require 2 epochs of advance activation

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added to user visible changes.

6. Add alerts if `Replay tip bankhash` differs when the `Vote only bankhash` is
the same. This is potentially an event worthy of cluster restart. If more than
1/3 of the validators claim a different `Replay tip bankhash`, halt and exit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify the modifications to fork choice. Fork choice will now only read vote only replayed votes. Needs to be keyed by block-id or vote only hash instead.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's covered in "The result is immediately applied in consensus to select a fork." I think whether we key our internal data structure by block-id or hash is implementation details we don't need in SIMD. I've added that to point 2 in "Enable Async Vote Executions".

6. Add alerts if `Replay tip bankhash` differs when the `Vote only bankhash` is
the same. This is potentially an event worthy of cluster restart. If more than
1/3 of the validators claim a different `Replay tip bankhash`, halt and exit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Specify the modifications to repair, we will now ingest vote only replayed votes in repair weighting. Also changes to repair peer selection, is EpochSlots updated after vote only replay now?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

EpochSlots just specified which slots I have to serve repair right? I don't see why we can't update it after vote only replay.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think repair weighting is also internal choice Anza and Jump may have different choices on, we surely should do what you suggested but we don't need it in the SIMD if it's not visible on the line.
Added that we will add banks to EpochSlots after vote only replay.

behavior will be the same as now.
For the first three, the same check can be performed computing vote only
hash. We will add a new check that the new root must be vote only replayed
and fully replayed, this may mean the tower has more than 32 slots
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you explain this part? We avoid setting the root if the block has not been fully replayed, so the tower can contain thousands of slots?
Why can't we set the root like we do today and modify pruning to keep up to the latest fully replayed root? Essentially moving the smr to prune by to be the replayed-smr

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to save the slots which should be kicked out of tower but haven't been fully replayed somewhere. We can only save snapshot on a fully replayed slot, and we need to keep ancestor relationships between blocks. It could be somewhere not on the tower, are you mostly worried about the space cost if we keep it on the tower?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we have all that in bank forks right? the tower is just a list of slot #s

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We do. I think the main question is whether we want it in vote transactions. We do want replay tip in there, but maybe not the full list of slots.

@wen-coding wen-coding self-assigned this Dec 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants