SIMD-0204: Slashable event verification #204

Open
wants to merge 11 commits into
base: main
from
297 changes: 297 additions & 0 deletions proposals/0204-slashable-event-verification.md
@@ -0,0 +1,297 @@
---
simd: '0204'
title: Slashable event verification
authors:
- Ashwin Sekar
category: Standard
type: Core
status: Review
created: 2024-11-26
feature: (fill in with feature tracking issues once accepted)
---

## Summary

This proposal describes an enshrined on-chain program to verify proofs that a
validator committed a slashable infraction. This program creates reports on chain
for use in future SIMDs.

**This proposal does not modify any stakes or rewards; the program will
only verify and log infractions.**

## Motivation

There exists a class of protocol violations that are difficult to detect synchronously,
but are simple to detect after the fact. In order to penalize violators we provide
a means to record these violations on chain.

This also serves as a starting point for observability and discussions around the
economics of penalizing these violators. This is a necessary step to implement
slashing in the Solana Protocol.

## New Terminology

None

### Feature flags

`create_slashing_program`:

- `sProgVaNWkYdP2eTRAy1CPrgb3b9p8yXCASrPEqo6VJ`

## Detailed Design

On the epoch boundary where the `create_slashing_program` feature flag is first
activated the following behavior will be executed in the first block for the new
epoch:

1. Create a new program account at `S1ashing11111111111111111111111111111111111`
with an upgrade authority set to the system program
`11111111111111111111111111111111`

2. Verify that the program account
`8sT74BE7sanh4iT84EyVUL8b77cVruLHXGjvTyJ4GwCe` has a verified build hash of
`<FILL IN AFTER IMPLEMENTATION>` [\[1\]](#notes)
Comment on lines +52 to +54
Contributor

We don't typically do this when upgrading a program via feature gate, but I like it a lot! We should definitely start


3. Copy the contents of `8sT74BE7sanh4iT84EyVUL8b77cVruLHXGjvTyJ4GwCe` into
`S1ashing11111111111111111111111111111111111`
Comment on lines +56 to +57
Contributor

Do you intend for S1ashing11111111111111111111111111111111111 to be owned by the upgradeable loader or loader-v4? If it's the former, we should also move the program-data account from the old address to the PDA for S1ashing11111111111111111111111111111111111, to ensure that the program behaves like any other BPF program. cc @buffalojoec

Contributor Author

Not familiar with the loaders so i'll defer to your wisdom :), just let me know which one makes the most sense and I can include it here.

Contributor (@joncinque, Dec 4, 2024)

Let's go with the default loader (bpf-loader-upgradeable) for now. We can amend it if loader-v4 becomes the default sooner than this.

Contributor Author

Specified the default loader and your comment about moving the program data account, let me know if it looks correct f4f8ab7


This program (hereafter referred to as the slashing program) supports two
instructions: `DuplicateBlockProof` and `CloseProofReport`.

`DuplicateBlockProof` requires 1 account:
Contributor

Let's call out that it also needs the instructions sysvar for signature verification


0. `proof_account`, expected to be previously initialized with the proof data.

`DuplicateBlockProof` has an instruction data of 49 bytes (1 + 8 + 8 + 32), containing:

- `0x00`, a fixed-value byte acting as the instruction discriminator
- `offset`, an unaligned eight-byte little-endian unsigned integer indicating
the offset from which to read the proof
- `slot`, an unaligned eight-byte little-endian unsigned integer indicating the
slot in which the violation occurred
- `node_pubkey`, an unaligned 32 byte array representing the public key of the
node which committed the violation
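The layout above can be sketched as a small parser. This is only an illustration under stated assumptions: the type and function names are hypothetical and not taken from the slashing program, and the expected length is simply the sum of the listed fields (1 + 8 + 8 + 32 bytes).

```rust
/// Parsed `DuplicateBlockProof` instruction data (hypothetical type name).
#[derive(Debug, PartialEq)]
pub struct DuplicateBlockProofArgs {
    pub offset: u64,
    pub slot: u64,
    pub node_pubkey: [u8; 32],
}

/// Read an unaligned little-endian u64 starting at `start`.
/// The caller must have checked that `start + 8 <= data.len()`.
fn read_u64_le(data: &[u8], start: usize) -> u64 {
    let mut bytes = [0u8; 8];
    bytes.copy_from_slice(&data[start..start + 8]);
    u64::from_le_bytes(bytes)
}

/// Parse discriminator (1) + offset (8) + slot (8) + node_pubkey (32).
/// Returns `None` on a wrong length or a wrong discriminator.
pub fn parse_duplicate_block_proof(data: &[u8]) -> Option<DuplicateBlockProofArgs> {
    const LEN: usize = 1 + 8 + 8 + 32;
    if data.len() != LEN || data[0] != 0x00 {
        return None;
    }
    let mut node_pubkey = [0u8; 32];
    node_pubkey.copy_from_slice(&data[17..49]);
    Some(DuplicateBlockProofArgs {
        offset: read_u64_le(data, 1),
        slot: read_u64_le(data, 9),
        node_pubkey,
    })
}
```

Copying into fixed-size arrays avoids any alignment requirement on the input buffer, which matches the "unaligned" wording of the field descriptions.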

We expect the contents of the `proof_account` when read from `offset` to
deserialize to a struct of two byte arrays representing the duplicate shreds.
The first 4 bytes correspond to the length of the first shred, and the 4 bytes
after that shred correspond to the length of the second shred.

```rust
struct DuplicateBlockProofData<'a> {
    shred1_length: u32, // length of `shred1` in bytes
    shred1: &'a [u8],
    shred2_length: u32, // length of `shred2` in bytes
    shred2: &'a [u8],
}
```
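A minimal sketch of reading this layout from `proof_account[offset..]` follows. It assumes the 4-byte length prefixes are little-endian (the text does not state their endianness) and it ignores any bytes after the second shred; `ProofView`, `read_prefixed`, and `parse_proof` are hypothetical names.

```rust
/// Borrowed view of the two length-prefixed shreds (hypothetical type name).
#[derive(Debug, PartialEq)]
pub struct ProofView<'a> {
    pub shred1: &'a [u8],
    pub shred2: &'a [u8],
}

/// Read a u32 length prefix (assumed little-endian) followed by that many
/// bytes. Returns the payload and the remaining input, or `None` if the
/// input is truncated.
fn read_prefixed<'a>(input: &'a [u8]) -> Option<(&'a [u8], &'a [u8])> {
    if input.len() < 4 {
        return None;
    }
    let mut len_bytes = [0u8; 4];
    len_bytes.copy_from_slice(&input[..4]);
    let len = u32::from_le_bytes(len_bytes) as usize;
    let rest = &input[4..];
    if rest.len() < len {
        return None;
    }
    Some((&rest[..len], &rest[len..]))
}

/// Parse both shreds starting at `offset` within the account data.
pub fn parse_proof<'a>(account_data: &'a [u8], offset: usize) -> Option<ProofView<'a>> {
    // `get` returns `None` when `offset` is past the end of the account.
    let input = account_data.get(offset..)?;
    let (shred1, rest) = read_prefixed(input)?;
    let (shred2, _trailing) = read_prefixed(rest)?;
    Some(ProofView { shred1, shred2 })
}
```

Returning borrowed slices keeps the parse allocation-free; whether trailing bytes should be rejected is a policy choice this sketch leaves open.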

`DuplicateBlockProof` aborts if:

- The difference between the current slot and `slot` is greater than 1 epoch's
worth of slots as reported by the `Clock` sysvar
- `offset` is larger than the length of `proof_account`
- `proof_account[offset..]` does not deserialize cleanly to a
`DuplicateBlockProofData`.
- The resulting shreds do not adhere to the Solana shred format [\[2\]](#notes)
or are legacy shred variants.
- The resulting shreds specify a slot that is different from `slot`.
- The resulting shreds specify different shred versions.
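The first two abort conditions can be sketched as a plain function. This is an illustration only: `slots_per_epoch` is passed in as a parameter rather than read from a sysvar, the error type is hypothetical, and the sketch assumes the "difference" is an absolute difference because the text does not say how a `slot` ahead of the current slot is treated.

```rust
/// Hypothetical error type for the first two abort conditions.
#[derive(Debug, PartialEq)]
pub enum PreconditionError {
    SlotTooOld,
    OffsetOutOfBounds,
}

/// Check the slot-age and offset-bounds conditions.
pub fn check_preconditions(
    current_slot: u64,
    slot: u64,
    slots_per_epoch: u64,
    offset: u64,
    proof_account_len: usize,
) -> Result<(), PreconditionError> {
    // Absolute difference between the two slots (an assumption, see lead-in).
    let age = if current_slot >= slot {
        current_slot - slot
    } else {
        slot - current_slot
    };
    if age > slots_per_epoch {
        return Err(PreconditionError::SlotTooOld);
    }
    // Only an offset strictly larger than the account length aborts here;
    // an offset equal to the length fails later, during deserialization.
    if offset > proof_account_len as u64 {
        return Err(PreconditionError::OffsetOutOfBounds);
    }
    Ok(())
}
```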

After deserialization the slashing program will attempt to verify the proof, by
checking that `shred1` and `shred2` constitute a valid duplicate proof for
`slot` and are correctly signed by `node_pubkey`. This is similar to logic used
in Solana's gossip protocol to verify duplicate proofs for use in fork choice.

### Proof verification

`shred1` and `shred2` constitute a valid duplicate proof if any of the following
conditions are met:

- Both shreds specify the same index and shred type, however their payloads
differ
- Both shreds specify the same FEC set, however their merkle roots differ
- Both shreds specify the same FEC set and are coding shreds, however their
erasure configs conflict
- At least one shred is a coding shred, and its erasure meta indicates an FEC set
overlap.
- The shreds are data shreds with different indices and the shred with the lower
index has the `LAST_SHRED_IN_SLOT` flag set
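As a rough illustration, three of the conditions above (same index and type with differing payloads, conflicting merkle roots within one FEC set, and a data shred beyond the one flagged `LAST_SHRED_IN_SLOT`) can be expressed over a simplified shred summary. `ShredSummary` and its fields are hypothetical stand-ins for values parsed from a real shred, and the two erasure-config conditions are deliberately omitted because they depend on coding-shred header details not covered here.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum ShredType {
    Data,
    Code,
}

/// Hypothetical, simplified view of the fields a verifier would extract.
#[derive(Clone, Copy, Debug)]
pub struct ShredSummary<'a> {
    pub shred_type: ShredType,
    pub index: u32,
    pub fec_set_index: u32,
    pub merkle_root: [u8; 32],
    pub last_in_slot: bool,
    pub payload: &'a [u8],
}

/// Returns true if any of the three sketched conditions holds.
pub fn is_duplicate_proof(a: &ShredSummary, b: &ShredSummary) -> bool {
    // Same index and shred type, but the payloads differ.
    let payload_conflict =
        a.index == b.index && a.shred_type == b.shred_type && a.payload != b.payload;
    // Same FEC set, but the merkle roots differ.
    let root_conflict = a.fec_set_index == b.fec_set_index && a.merkle_root != b.merkle_root;
    // Two data shreds with different indices where the lower-indexed shred
    // claims to be the last shred in the slot.
    let last_index_conflict = a.shred_type == ShredType::Data
        && b.shred_type == ShredType::Data
        && a.index != b.index
        && (if a.index < b.index { a.last_in_slot } else { b.last_in_slot });
    payload_conflict || root_conflict || last_index_conflict
}
```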

Note: We do not verify that `node_pubkey` was the leader for `slot`. Any node that
willingly signs duplicate shreds for a slot that they are not a leader for is
eligible for slashing.

---

### Signature verification

In order to verify that `shred1` and `shred2` were correctly signed by
`node_pubkey` we use instruction introspection.

Using the `Instructions` sysvar we verify that the previous two instructions of
this transaction are for the program ID
`Ed25519SigVerify111111111111111111111111111`

For each of these instructions, verify the instruction data:

- The first byte is `0x01`
- The second byte (padding) is `0x00`

And then deserialize the remaining instruction data as 2-byte little-endian
unsigned integers:

```rust
struct Ed25519SignatureOffsets {
    signature_offset: u16,             // offset to ed25519 signature of 64 bytes
    signature_instruction_index: u16,  // instruction index to find signature
    public_key_offset: u16,            // offset to public key of 32 bytes
    public_key_instruction_index: u16, // instruction index to find public key
    message_data_offset: u16,          // offset to start of message data
    message_data_size: u16,            // size of message data
    message_instruction_index: u16,    // index of instruction data to get message
                                       // data
}
```

We wish to verify that these instructions correspond to

```
verify(pubkey = node_pubkey, message = shred1.merkle_root, signature = shred1.signature)
verify(pubkey = node_pubkey, message = shred2.merkle_root, signature = shred2.signature)
```

We use the deserialized offsets to calculate [\[3\]](#notes) the `pubkey`,
`message`, and `signature` of each instruction and verify that they correspond
to the `node_pubkey`, `merkle_root`, and `signature` specified by the shred payload.
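The header check and offset decoding described above might look like the following sketch. It assumes, purely for illustration, that the signature, public key, and message all live in the same instruction data buffer; a real implementation would resolve each `*_instruction_index` through the `Instructions` sysvar. The function names are hypothetical.

```rust
#[derive(Debug, PartialEq)]
pub struct Ed25519SignatureOffsets {
    pub signature_offset: u16,
    pub signature_instruction_index: u16,
    pub public_key_offset: u16,
    pub public_key_instruction_index: u16,
    pub message_data_offset: u16,
    pub message_data_size: u16,
    pub message_instruction_index: u16,
}

/// Check the two header bytes, then decode seven little-endian u16 values.
pub fn parse_offsets(data: &[u8]) -> Option<Ed25519SignatureOffsets> {
    // 2 header bytes + 7 fields of 2 bytes each.
    if data.len() < 16 || data[0] != 0x01 || data[1] != 0x00 {
        return None;
    }
    let u16_at = |i: usize| u16::from_le_bytes([data[i], data[i + 1]]);
    Some(Ed25519SignatureOffsets {
        signature_offset: u16_at(2),
        signature_instruction_index: u16_at(4),
        public_key_offset: u16_at(6),
        public_key_instruction_index: u16_at(8),
        message_data_offset: u16_at(10),
        message_data_size: u16_at(12),
        message_instruction_index: u16_at(14),
    })
}

/// Slice `len` bytes at `offset`, returning `None` when out of bounds.
fn slice_at(data: &[u8], offset: u16, len: usize) -> Option<&[u8]> {
    let start = offset as usize;
    data.get(start..start.checked_add(len)?)
}

/// Compare the referenced bytes against the values taken from the shred.
/// Assumption: all three fields are read from the single buffer `data`.
pub fn matches_expected(
    data: &[u8],
    offsets: &Ed25519SignatureOffsets,
    node_pubkey: &[u8],
    merkle_root: &[u8],
    signature: &[u8],
) -> bool {
    let sig = slice_at(data, offsets.signature_offset, 64);
    let key = slice_at(data, offsets.public_key_offset, 32);
    let msg = slice_at(
        data,
        offsets.message_data_offset,
        offsets.message_data_size as usize,
    );
    sig == Some(signature) && key == Some(node_pubkey) && msg == Some(merkle_root)
}
```

The comparison only confirms that the Ed25519 instruction was asked to verify the expected triple; the signature check itself is performed by the `Ed25519SigVerify111111111111111111111111111` program.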

If both proof and signer verification succeed, we continue on to store the incident.

---

### Incident reporting

After verifying a successful proof we store the results in a program derived
address for future use. The PDA is derived using the `node_pubkey`, `slot`, and
the violation type:

```rust
let (pda, _) = Pubkey::find_program_address(
    &[
        &node_pubkey.to_bytes(),
        &slot.to_le_bytes(),
        &[ViolationType::DuplicateBlock.to_u8()],
    ],
    &slashing_program_id, // address of the slashing program
);
```
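To make the seed encoding concrete, here is a dependency-free sketch that only assembles the three seeds as bytes; it does not perform the PDA derivation itself. `report_seeds` is a hypothetical helper, and the `0x00` discriminator for `DuplicateBlock` is the value given for `violation_type` in the `CloseProofReport` instruction.

```rust
/// Violation type discriminator for `DuplicateBlock`.
pub const DUPLICATE_BLOCK: u8 = 0x00;

/// Assemble the PDA seeds: 32-byte pubkey, 8-byte little-endian slot,
/// and a single violation type byte.
pub fn report_seeds(node_pubkey: &[u8; 32], slot: u64, violation_type: u8) -> [Vec<u8>; 3] {
    [
        node_pubkey.to_vec(),
        slot.to_le_bytes().to_vec(),
        vec![violation_type],
    ]
}
```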

At the moment `DuplicateBlock` is the only violation type but future work will
add additional slashing types.

If the `pda` account has non-zero lamports, then we abort as the violation has
already been reported. Otherwise we create the account, with the slashing program
as the owner. In this account we store the following:

```rust
struct ProofReport {
    reporter: Pubkey,              // Fee payer, to allow the account to be closed
    epoch: Epoch,                  // Epoch in which this report was created
    pubkey: Pubkey,                // The pubkey of the node that committed the violation
    slot: Slot,                    // Slot in which the violation occurred
    violation_type: u8,            // The violation type
    proof: Vec<u8>,                // The serialized proof
    proof_account: Option<Pubkey>, // Optional account where proof is stored instead
}
```

Contributor

Rather than using an Option, let's just always include this, with the caveat that the data at proof_account might get modified. And let's add it before proof to make it easier to find

The `DuplicateBlockProofData` is serialized into the `proof` field. This provides
an on-chain trail of the reporting process, since the `proof_account` supplied to
the `DuplicateBlockProof` instruction could later be modified.

The `pubkey` is populated with the `node_pubkey`. For future violation types that
involve votes, this will instead be populated with the vote account's pubkey.
The work in SIMD-0180 will allow the `node_pubkey` to be translated to a vote account
if needed.

Note that PDAs can only be created with an initial size of up to 10 KiB.
Although not a problem for `DuplicateBlockProofData`, if future proof types require
more space, we allow the proof to be stored in a separate account, and linked back
to the PDA using the `proof_account` field.

---

### Closing the incident report

After the slashing violation has been processed by the runtime, the initial fee
payer may wish to close their `ProofReport` account to reclaim the lamports.

They can accomplish this via the `CloseProofReport` instruction which requires
2 accounts:
Contributor

Sounds like it needs one more account -- the reporter to sign. But I wonder if we can allow this to be totally permissionless by having the reporter provide the lamport destination when they create the report. That way we're sure slash reports are always cleaned up.

Contributor Author

I like that approach, can definitely see people forgetting to close their reports

Contributor Author

Could we just send it back to the initial reporter? Or is it worth carrying around a separate address as the destination

Contributor

I'd keep them separate -- it gives extra flexibility and clarity at the cost of an extra 32 bytes, which shouldn't be too bad, especially considering the accounts will get cleaned up

Contributor Author

separated it here f89a01b


0. `report_account`: The PDA account storing the report: Writable, owned by the
slashing program
1. `destination`: Writable account to reclaim the lamports

`CloseProofReport` has an instruction data of 42 bytes, containing:

- `0x01`, a fixed-value byte acting as the instruction discriminator
- `violation_type`, a one byte value acting as the violation type discriminator
- `slot`, an unaligned eight-byte little-endian unsigned integer indicating the
slot which was reported
- `pubkey`, an unaligned 32 byte array representing the public key of the node
which was reported
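The 42-byte layout above (1 + 1 + 8 + 32) can be decoded along these lines. This is a sketch with hypothetical names; it folds in the `violation_type` check because `0x00` is the only value currently accepted.

```rust
/// Parsed `CloseProofReport` instruction data (hypothetical type name).
#[derive(Debug, PartialEq)]
pub struct CloseProofReportArgs {
    pub violation_type: u8,
    pub slot: u64,
    pub pubkey: [u8; 32],
}

/// Parse discriminator (1) + violation_type (1) + slot (8) + pubkey (32).
pub fn parse_close_proof_report(data: &[u8]) -> Option<CloseProofReportArgs> {
    const LEN: usize = 1 + 1 + 8 + 32;
    if data.len() != LEN || data[0] != 0x01 {
        return None;
    }
    let violation_type = data[1];
    // Only `DuplicateBlock` (0x00) is a known violation type at the moment.
    if violation_type != 0x00 {
        return None;
    }
    let mut slot_bytes = [0u8; 8];
    slot_bytes.copy_from_slice(&data[2..10]);
    let mut pubkey = [0u8; 32];
    pubkey.copy_from_slice(&data[10..42]);
    Some(CloseProofReportArgs {
        violation_type,
        slot: u64::from_le_bytes(slot_bytes),
        pubkey,
    })
}
```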

We abort if:

- `violation_type` is not `0x00` (corresponds to `DuplicateBlock` violation)
- Deriving the pda using `pubkey`, `slot`, and `ViolationType::DuplicateBlock`
as outlined above does not result in the address of `report_account`
- `report_account` is not writable
- `report_account` does not deserialize cleanly to `ProofReport`
- `report_account.reporter` is not a signer
- `report_account.epoch + 3` is greater than the current epoch reported from
the `Clock` sysvar. We want to ensure that these accounts do not get closed before
they are observed by indexers and dashboards.
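The epoch condition in the last bullet is easy to get off by one, so here is a small sketch of it. `close_allowed` is a hypothetical helper; epochs are modeled as plain `u64` values.

```rust
/// Returns true when the report is old enough to close, i.e. when
/// `report_epoch + 3` is NOT greater than `current_epoch`.
pub fn close_allowed(report_epoch: u64, current_epoch: u64) -> bool {
    match report_epoch.checked_add(3) {
        Some(earliest_close_epoch) => earliest_close_epoch <= current_epoch,
        // Overflow can only mean the report is from an impossibly late epoch.
        None => false,
    }
}
```

For a report created in epoch 10, closing is rejected in epochs 10 through 12 and first allowed in epoch 13.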

Otherwise we close the `report_account` and credit the `lamports` to `destination`.

---

## Alternatives Considered

This proposal deploys the slashing program in an "enshrined" account, only upgradeable
through code changes in the validator software. Alternatively we could follow the
SPL program convention and deploy to an account upgradeable by a multisig. This
allows for more flexibility in the case of deploying hotfixes or rapid changes,
however allowing upgrade access to such a sensitive part of the system via a handful
of engineers poses a security risk.

## Impact

A new program will be enshrined at `S1ashing11111111111111111111111111111111111`.

Reports stored in PDAs of this program might be queried for dashboards which could
incur additional indexing overhead for RPC providers.

## Security Considerations

None

## Drawbacks

None

## Backwards Compatibility

The feature is not backwards compatible.

## Notes

\[1\]: Sha256 of program data, see
https://github.com/Ellipsis-Labs/solana-verifiable-build/blob/214ba849946be0f7ec6a13d860f43afe125beea3/src/main.rs#L331
for details.

\[2\]: The slashing program will support any combination of merkle shreds, chained
merkle shreds, and retransmitter signed chained merkle shreds, see https://github.com/anza-xyz/agave/blob/4e7f7f76f453e126b171c800bbaca2cb28637535/ledger/src/shred.rs#L6
for the full specification.

\[3\]: Example of offset calculation can be found here https://docs.solanalabs.com/runtime/programs#ed25519-program