Turbine for Duplicate Block Prevention #71

Closed
117 changes: 117 additions & 0 deletions proposals/0057-duplicate-block-prevention.md
---
simd: '0057'
title: Turbine for Duplicate Block Prevention
authors:
- Carl Lin
- Ashwin Sekar
category: Standard
type: Core
status: Draft
created: 2023-10-11
feature: (fill in with feature tracking issues once accepted)
---

## Summary

Duplicate block handling is slow and error-prone when different validators see
different versions of the same block.

## Motivation

When a leader generates two different blocks for a slot, one of the following
happens:

1) All validators receive the same version of the block.
2) A supermajority receives a mixture of shreds from the different versions of
the block and marks the block dead during replay.
3) The network is split and participants receive different replayable versions
of the block.

This proposal attempts to maximize the chance of situations (1) and (2).

## Alternatives Considered

Not applicable

## New Terminology

None; however, this proposal assumes an understanding of shreds and turbine:
https://github.com/solana-foundation/specs/blob/main/p2p/shred.md
https://docs.solana.com/cluster/turbine-block-propagation

## Detailed Design

With the introduction of Merkle shreds, each shred is now uniquely attributable
to the FEC set to which it belongs: every shred carries a Merkle proof (its
`witness`) that ties it to the Merkle root of its FEC set. This means that given
an FEC set of the minimum 32 shreds, a leader cannot create an entirely new FEC
set by modifying just the last shred, because the `witness` in that last shred
disambiguates which FEC set it belongs to.
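For illustration, the sketch below shows the property this paragraph relies on.
The types and the use of the `sha2` crate are hypothetical simplifications, not
the actual shred wire format or hasher; the point is only that a shred's
`witness` resolves to exactly one FEC-set Merkle root, so a shred cannot be
silently re-attributed to a different FEC set.

```rust
use sha2::{Digest, Sha256};

/// Hypothetical view of a Merkle shred: a payload plus the proof ("witness")
/// of its leaf position in the FEC set's Merkle tree.
struct MerkleShred {
    index_in_set: usize,
    payload: Vec<u8>,
    /// Sibling hashes on the path from this shred's leaf up to the root.
    witness: Vec<[u8; 32]>,
}

fn leaf_hash(payload: &[u8]) -> [u8; 32] {
    Sha256::digest(payload).into()
}

fn node_hash(left: &[u8; 32], right: &[u8; 32]) -> [u8; 32] {
    let mut hasher = Sha256::new();
    hasher.update(left);
    hasher.update(right);
    hasher.finalize().into()
}

/// Walk the witness up from the leaf; the shred belongs to the FEC set only if
/// the reconstructed root matches that set's (leader-signed) Merkle root.
fn belongs_to_fec_set(shred: &MerkleShred, fec_set_root: &[u8; 32]) -> bool {
    let mut node = leaf_hash(&shred.payload);
    let mut index = shred.index_in_set;
    for sibling in &shred.witness {
        node = if index % 2 == 0 {
            node_hash(&node, sibling)
        } else {
            node_hash(sibling, &node)
        };
        index /= 2;
    }
    &node == fec_set_root
}
```

A second FEC set built by altering only the last shred would have a different
Merkle root, so the altered shred's `witness` would fail this check against the
original set's root.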

Review comment:

This is not very friendly to first-time readers; it would be good to add a SIMD or other doc here which describes the Merkle shred change introduced.

Reply (Contributor):

makes sense i'll link it


This means that in order for a leader to force validators `A` and `B` to ingest
separate versions `N` and `N'` of a block, it must at a minimum create and
propagate two completely different versions of an FEC set. Given the smallest
FEC set of 32 shreds, 32 shreds from one version must arrive at validator `A`,
and 32 completely different shreds from the other version must arrive at
validator `B`.

We aim to make this process as hard as possible by leveraging the randomness of
each shred's traversal through turbine via the following set of changes:

1. Lock down shred propagation so that validators only accept shred `X` if it
arrives from the correct ancestor in the turbine tree for that shred `X` (a
minimal sketch of this rule follows the bullets below). There are a few
downstream effects of this:

Review comment:

We can lock this down in our validator implementation. But if team X implements and allows their own sideline shred forwarder, how much of the assumption here is broken?


- In repair, a validator `V` can no longer repair shred `X` from anyone other
than the singular parent `Y` that was responsible for delivering shred `X` to
`V` in the turbine tree (not even from grandparents or other ancestors).

Review comment:

This is the only part of the proposal that gives me major heartburn.

Ignoring duplicate blocks for a second, we see quite a few cases where leader transmission to root of turbine tree drops for some period of time and we see lots of shreds drop in a row. In this case, we would need the various roots to request repair from the leader, then their children to request repair from them, etc. until everyone can repair the block. Seems like major latency in getting the block into blockstore so we can replay.

In other words, if you drop a shred near the top of the turbine tree, good luck getting your block confirmed. Obviously I haven't actually collected data to see if my assumptions are true, but I'll remain cautiously pessimistic for now.

One thing that might help is to enable retransmission of repaired shreds.


Review comment:

You mean a validator can only repair from its single parent on the Turbine tree, not grandparents?

Reply (Contributor):

yes

Review comment:

We should put that into the doc, something like "from anybody other than the parent (not even from grandparents) ...", ancestors include parents and grandparents, I think.

- Validators need to be able to repair erasure (coding) shreds, whereas today
they can only repair data shreds. Because the set of repair peers for a shred is
now locked down, if validator `V`'s parent `Y` for shred `X` is down, then shred
`X` itself cannot be repaired from anyone else; without the ability to repair a
backup erasure shred from which `X` can be recovered, validator `V` could never
complete this block.

Review comment:

Can remove the extra new line


Review comment:

Not sure whether this block belongs to any bullet above, and what "then if" refers to. And it's not clear what "this" refers to in "this would".
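A minimal sketch of the acceptance rule in point (1), referenced before the
bullets above. The types, the shuffle, and the fanout constant here are
hypothetical simplifications rather than the real turbine implementation; the
only property used is that each node can deterministically compute a single
parent for any given shred, and both ingestion and repair are then restricted to
that parent.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

type Pubkey = [u8; 32];

/// Identity of a shred within a slot; assumed to be enough to seed the
/// per-shred turbine shuffle in this sketch.
#[derive(Hash)]
struct ShredId {
    slot: u64,
    index: u32,
    is_code: bool,
}

/// Stand-in for the deterministic, stake-weighted turbine shuffle: every node
/// computes the same ordering of the cluster for a given shred, and therefore
/// the same parent for any given node. The real ordering is derived from stake
/// weights and the shred; a plain hash is used here only for illustration.
fn turbine_parent(
    cluster: &[Pubkey],
    me: &Pubkey,
    shred: &ShredId,
    fanout: usize,
) -> Option<Pubkey> {
    let mut hasher = DefaultHasher::new();
    shred.hash(&mut hasher);
    let seed = hasher.finish();

    let mut nodes = cluster.to_vec();
    nodes.sort_by_key(|pk| {
        let mut h = DefaultHasher::new();
        (seed, pk).hash(&mut h);
        h.finish()
    });

    let my_pos = nodes.iter().position(|pk| pk == me)?;
    if my_pos == 0 {
        // The root of the tree receives the shred directly from the leader.
        None
    } else {
        Some(nodes[(my_pos - 1) / fanout])
    }
}

/// Rule from point (1): drop any shred that did not arrive from the expected
/// turbine parent for that shred.
fn should_accept(cluster: &[Pubkey], me: &Pubkey, shred: &ShredId, sender: &Pubkey) -> bool {
    turbine_parent(cluster, me, shred, 200) == Some(*sender)
}

/// Repair is locked to the same single parent (not grandparents or any other
/// ancestor), which is why erasure-shred repair becomes necessary when that
/// parent is down.
fn repair_peer(cluster: &[Pubkey], me: &Pubkey, shred: &ShredId) -> Option<Pubkey> {
    turbine_parent(cluster, me, shred, 200)
}
```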


2. If a validator receives shred `S` for a block, and then another version `S'`
of that shred for the same block, it will propagate the witnesses of both of
those shreds so that everyone in the turbine tree sees the duplicate proof. This
makes it harder for leaders to split the network into groups that see a block is
duplicate and groups that don't.
Review comment (Contributor):

After some offline discussion with behzad I think this is an unfeasible strategy in turbine. Sending payloads fragmented over more than one packet introduces a lot of overhead. It seems unwise to introduce this latency in turbine where performance is critical.

However for exact (slot, shred_index, type) duplicate proofs, this will continue to work out of the box like today.

Review comment:

Would be good to define what 'witness' means here


Review comment:

Is it guaranteed the Turbine tree is always the same if two validators with the
same pubkey are physically apart (e.g. setting up hot-standby in US-Europe)?

Reply (Contributor):

yes

Review comment:

To make the doc self-contained, we should probably list the properties of Turbine we depend on in the doc as well.


Note these duplicate proofs still need to be gossiped, because it is not
guaranteed that duplicate shreds will propagate to everyone if there is a
network partition or a colluding malicious root node in turbine. For instance,
assuming one malicious root node `X`, `X` can forward one version of the shred
to one specific validator `Y` only, and then only descendants of validator `Y`
would possibly see a duplicate proof when the other canonical version of the
shred is broadcast.
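A minimal sketch of the exact (slot, shred_index, type) duplicate detection
discussed above, using hypothetical types rather than the actual shred or gossip
wire formats: the first conflicting payload observed under the same key yields a
proof that the node can gossip and, where feasible, forward to its turbine
children.

```rust
use std::collections::HashMap;

/// Key identifying a shred; two differing payloads stored under the same key
/// constitute evidence of a duplicate block.
#[derive(Clone, Copy, PartialEq, Eq, Hash)]
struct ShredKey {
    slot: u64,
    index: u32,
    is_code: bool,
}

/// A duplicate proof is simply both conflicting shred payloads; any node can
/// verify the leader's signature over each and observe that they disagree.
struct DuplicateProof {
    key: ShredKey,
    first: Vec<u8>,
    second: Vec<u8>,
}

#[derive(Default)]
struct ShredStore {
    seen: HashMap<ShredKey, Vec<u8>>,
}

impl ShredStore {
    /// Insert a shred. If a different payload was already stored under the same
    /// (slot, index, type), return a duplicate proof for the caller to gossip.
    fn insert(&mut self, key: ShredKey, payload: Vec<u8>) -> Option<DuplicateProof> {
        match self.seen.get(&key) {
            None => {
                self.seen.insert(key, payload);
                None
            }
            Some(existing) if existing == &payload => None,
            Some(existing) => Some(DuplicateProof {
                key,
                first: existing.clone(),
                second: payload,
            }),
        }
    }
}
```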

3. The last FEC set is unique in that it can have fewer than 32 data shreds.

Review comment:

Can update this to indicate the new strategy of ensuring fully packed FEC sets

In order to account for the last FEC set potentially having a 1:32 split of
data to coding shreds, we enforce that validators must see at least half of the
block's shreds before voting on the block, *even if they received all the data
shreds for that block*. This guarantees leaders cannot just change the single
data shred to generate two completely different, yet playable, versions of the
block.
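A minimal sketch of this voting gate, using hypothetical bookkeeping rather than
the actual blockstore accounting: completeness alone (every data shred received
or recovered) is not sufficient to vote; at least half of all shreds produced
for the block, data plus coding, must also have been received.

```rust
/// Hypothetical per-block shred accounting.
struct BlockShredCounts {
    data_received: u32,
    coding_received: u32,
    /// Total data + coding shreds the leader generated for this block, known
    /// once the last FEC set's metadata has been observed.
    total_expected: u32,
}

impl BlockShredCounts {
    /// `block_is_full` means every data shred is present (or was recovered), so
    /// the block is replayable. Voting additionally requires having seen at
    /// least half of all shreds, so a leader cannot produce a second playable
    /// version by swapping a single trailing data shred without also
    /// re-broadcasting a large share of the block.
    fn may_vote(&self, block_is_full: bool) -> bool {
        block_is_full
            && 2 * (self.data_received + self.coding_received) >= self.total_expected
    }
}
```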

Review comment:

nit: the one -> one?

Reply (Contributor):

there is only 1 data shred in the mentioned example


## Impact

The network will be more resilient against duplicate blocks.

Review comment:

Maybe we need to estimate memory/network impact for storing up to 5 duplicate forks?

Reply (Contributor):

definitely, will be interesting to run inv tests with 5 partitions on turbine, but a connected gossip & repair. I believe that will allow us to propagate 5 blocks for each slot.
we can compare that to a cluster running only on repair to get a rough idea.

Review comment:

I've heard mentions of storing up to 5 duplicate forks, but what does this actually mean? Does it mean plumbing blockstore to hold up to 5 versions of the same block and keying everything based on Slot Hash?

I don't see it mentioned anywhere in this SIMD, and it's not clear why it would actually be necessary as part of this proposal

Reply (Contributor):

that was the original idea, however there has been no solid consensus on whether we need to implement such a change. Originally I had a section in this SIMD with that design carllin@807b5ee#diff-d1443f19931349d37d7a29462e1c96d99f6bd1a4d7b08757dd6360425ae15076L95, but since it is still uncertain I removed it.

I think the scope of this SIMD can be purely on efforts to prevent the propagation of duplicate blocks, and if necessary a later SIMD can speak about the new resolution efforts.


## Security Considerations

Not applicable

## Backwards Compatibility

Rollout will happen in stages, as this proposal depends on QUIC turbine.

Tentative schedule:
Prevention:

1) Merkle shreds (rolled out)
2) Turbine/Repair features

   - Coding shreds repair
   - Propagate duplicate proofs through turbine
   - 1/2 shreds threshold for voting (feature flag)

3) QUIC turbine
4) Lock down turbine tree (feature flag and opt-out cli arg for shred forwarders)