Skip to content

Commit

Permalink
Adds Accounts Lattice Hash
Browse files Browse the repository at this point in the history
  • Loading branch information
brooksprumo committed Dec 20, 2024
1 parent 4ea7b71 commit 5578e27
Showing 1 changed file with 243 additions and 0 deletions.
243 changes: 243 additions & 0 deletions proposals/0215-accounts-lattice-hash.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,243 @@
---
simd: '0215'
title: Homomorphic Hashing of Account State
authors:
- Brooks Prumo
- Emanuele Cesena
- Josh Siegel
category: Standard
type: Core
status: Draft
created: 2024-12-20
feature: LtHaSHHsUge7EWTPVrmpuexKz6uVHZXZL6cgJa7W7Zn
development:
- Anza: Implemented
---

## Summary

This proposal adds a new hash, **Accounts Lattice Hash**, that uses homomorphic
hashing to maintain a hash of the total account state. The **Accounts Lattice
Hash** is both fast to update, and secure, enabling (1) every block to contain
the hash of all accounts, instead of only the accounts changed in that block,
and (2) removing the **Epoch Accounts Hash**.


## Motivation

The main goal is to scale Solana to billions accounts, and compute a "hash of
all accounts" in practical time and space.

Currently there are two main kinds of accounts hashes used in Solana:
1. The **Epoch Accounts Hash**, which is a merkle-based hash of the total

Check failure on line 32 in proposals/0215-accounts-lattice-hash.md

View workflow job for this annotation

GitHub Actions / Markdown Linter

Lists should be surrounded by blank lines [Context: "1. The **Epoch Accounts Hash**..."]

proposals/0215-accounts-lattice-hash.md:32 MD032/blanks-around-lists Lists should be surrounded by blank lines [Context: "1. The **Epoch Accounts Hash**..."]
account state.
2. The **Accounts Delta Hash**, which is also a merkle-based hash of accounts
written in a single block.

Both of these hashes have limitations. They calculate the root of a Merkle
tree of accounts, sorted by public key, which makes scaling challenging due to
the amount of data required. This is also why there are two distinct hashes:
the **Epoch Accounts Hash** hashes the every account, but is infrequent; and
the **Accounts Delta Hash** is every block, but only contains the subset of
accounts written in that block.

Ideally every block would contain a hash of the total account state.

Homomorphic hashing functions enable computing the hash of the total account
state by *incrementally* accumulating the hash of each account. This removes
the need to sort and maintain large intermediate state. Said another way, the
total account state hash for block `B` can now be computed from block `B - 1`'s
total account state hash plus *only* the accounts that changed in block `B`.


## New Terminology

**Accounts Lattice Hash** is the new "hash of all accounts", built using a
lattice-based homomorphic hashing function.

**Homomorphic hashing** "is the concept to solve the following problem: Given
the hash of an input, along with a small update to the input, how can you
compute the hash of the new input with its update applied, without having to
recompute the entire hash from scratch?" [1].

**Lattice hash** is the specific homomorphic hashing function chosen to compute
both an individual account hash, and the hash of all accounts.


## Detailed Design

### Defining LtHash

**LtHash** is the 2048-byte result of hashing an input with the lattice hashing function.

A hash function has interface: `init`, `append`, `fini`. A homomorphic hash
function has two additional methods: `add`, `sub`. Finally, it is convenient
to have a 32-byte result, so we introduce the method: `out`.

For LtHash:

`init()`: is blake3.init
`append(data)`: is blake3.append(data)
`fini()`: is blake3_xof.fini(2048), i.e. return a 2048-byte output using the
property that Blake3 is an eXtendable Output Function (XOF).
`add(a, b)`: interpret the hash values `a, b` as arrays of 1024 u16s.
Wrapping-add each pair of elements in the arrays.
`sub(a, b)`: interpret the hash values `a, b` as arrays of 1024 u16s.
Wrapping-sub each pair of elements in the arrays.
`out(a)`: blake3.fini( blake3.append( blake3.init(), a ) ), i.e. the 32-byte
blake3 of the 2048-byte data.


### Calculating LtHash of one account

The LtHash of a single account is defined as:

```
lthash(account) :=
if account.lamports == 0:
return 00..00
else:
lthash.init()
lthash.append( account.lamports )
lthash.append( account.data )
lthash.append( account.is_executable )
lthash.append( account.owner )
lthash.append( account.pubkey )
return lthash.fini()
```


### Calculating LtHash of all accounts, aka Accounts Lattice Hash

For a set of accounts `A`:

```math
LTHASH(A) = \sum_{a \in A} lthash(a)
```

Specifically, the Accounts Lattice Hash is the sum of the (single) account
lthash for all accounts.


### Incrementally updating Accounts Lattice Hash per block

For a given block, there are a set of accounts modified in that block. Let
`account` indicate the state of the account *before* modification, and
`account'` indicate the state of the account *after* modification.

To compute the accounts lattice hash for a block, `LTHASH'`, and given the
*prior* block's accounts lattice hash, `LTHASH`:

```
LTHASH' := LTHASH
for each account modified in B:
LTHASH'.sub( lthash( account ) )
LTHASH'.add( lthash( account' ) )
return LTHASH'
```

Notes:

- The order in which account lthashes are add/sub is irrelevant.
- If an account is modified by multiple transaction, it is possible to either
compute its lthashes before/after each individual transaction or the entire
block. The final result is the same.
- Every account modified in a block must be reflected in that block's Accounts
Lattice Hash. This applies to both on-chain and off-chain modifications.


### Changes to the Bank Hash

The bank hash for each block will now include the Accounts Lattice Hash. The
Accounts Lattice Hash will be incrementally updated and mixed in *instead of*
the Epoch Accounts Hash (and in every block, not just once per epoch). It is
safe to replace the Epoch Accounts Hash with the Accounts Lattice Hash since
they are both hashes of the total account state.

Note that this SIMD does *not* make any changes to the Accounts Delta Hash.


### Feature Activation

At feature activation, all nodes must have the Accounts Lattice Hash. If they
haven't already been computing/updating it, they must now do so.

It is recommended to provide a mechanism for nodes to opt-in to computing the
initial Accounts Lattice Hash and begin updating the value each block *before*
feature activation. This will enable the cluster to pre-compute the Accounts
Lattice Hash prior to feature activation and avoid a cluster-wide pause while
all nodes calculate the Accounts Lattice Hash.


### Snapshots

Since computing the initial Accounts Lattice Hash is expensive, we want to
avoid this computation when possible. The Accounts Lattice Hash will be
written to the snapshot, and will be read out at boot time (if present).

It is recommended for nodes to verify the Accounts Lattice Hash read from the
snapshot.

Note that this SIMD does *not* make any changes to the Snapshot Hash.


### Epoch Accounts Hash

As stated above, the Accounts Lattice Hash will replace the Epoch Accounts
Hash. Thus, the Epoch Accounts Hash will effectively be removed from consensus
at feature activation. The snapshot format will *not* change to *remove* the
Epoch Accounts Hash field.


## Alternatives Considered

The following alternatives were considered:

- Merkle Trees
- Pros: well known, can support inclusion proofs
- Cons: won't scale
- Verkle Trees
- Pros: scale better, support inclusion proofs
- Cons: not fast enough for Solana
- Incremental Hash based on XOR
- Insecure: known sub-exponential attacks exist [3, 4]
- Incremental Hash based on Elliptic Curve, e.g. Ristretto or GLS254
- Pros: secure incremental hash
- Cons: not as efficient as LtHash (10-100x slower)


## Impact

Only validators will be impacted.


## Security Considerations

LtHash instantiated with BLAKE3 and a 2048-byte output provides the desired
128-bit security [1, Appendix A].


## Drawbacks

The Accounts Lattice Hash does NOT support inclusion/exclusion proofs.


## Backwards Compatibility *(Optional)*

Incompatible. This changes the bank hash, thus changing consensus.


## Bibliography

1. *Lewi, Kim, Maykov, Weis*, **Securing Update Propagation with Homomorphic
Hashing**, 2019, [ia.cr/2019/227](https://ia.cr/2019/227)
2. *Bellare, Goldreich, Goldwasser*, **Incremental Cryptography: The Case of
Hashing and Signing**, 1994,
[cseweb.ucsd.edu](https://cseweb.ucsd.edu/~mihir/papers/inc1.pdf)
3. *Bellare, Micciancio*, **A new paradigm for collision-free hashing:
Incrementality at reduced cost**, 1996,
[cseweb.ucsd.edu](https://cseweb.ucsd.edu/~daniele/papers/IncHash.html)
4. *Wagner*, **A Generalized Birthday Problem**, 2002,
[crypto2002](https://www.iacr.org/archive/crypto2002/24420288/24420288.pdf)
5. *O'Connor, Ausmasson, Neves, Wilcox-O'Hearn*, **BLAKE3**, 2021,
[PDF](https://github.com/BLAKE3-team/BLAKE3-specs/blob/master/blake3.pdf)

0 comments on commit 5578e27

Please sign in to comment.