Approaches to upgrading blockchain state #190

abacabadabacaba · 2021-04-03T17:55:13Z


This post, which was inspired by near/nearcore#4024, describes three different ways in which blockchain state can be upgraded, together with their pros and cons.

What is a state upgrade?

A state upgrade is an operation that changes how the state is represented. This usually involves changing the layout of some data structures, but it can be any change to the physical representation of the same logical state. Only changes that affect consensus count; for example, switching to a different (low-level) storage backend doesn't qualify.

Approach 1: lazy upgrade

Perhaps the most obvious approach is to have each state component, such as an account or storage entry, be individually versioned. After an upgrade starts, these components will be upgraded when they are accessed and/or modified. At each point, the state may include components of multiple different versions.
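
A minimal sketch of the lazy approach, assuming a toy in-memory state. The names `Account`, `UPGRADES`, `load_account` and the `balance` to `amount` rename are hypothetical illustrations, not nearcore APIs:

```python
from dataclasses import dataclass

CURRENT_VERSION = 2  # hypothetical latest layout version


@dataclass
class Account:
    version: int
    data: dict


def upgrade_v1_to_v2(account: Account) -> Account:
    # Hypothetical conversion: v2 stores the balance under a new field name.
    data = dict(account.data)
    data["amount"] = data.pop("balance")
    return Account(version=2, data=data)


# One converter per old version. Each must be kept for as long as
# components of that version may still exist in the state.
UPGRADES = {1: upgrade_v1_to_v2}


def load_account(state: dict, account_id: str) -> Account:
    """Read an account, upgrading it in place if it is stored in an old version."""
    account = state[account_id]
    while account.version < CURRENT_VERSION:
        account = UPGRADES[account.version](account)
    state[account_id] = account
    return account
```

Accounts that are never loaded stay at version 1, which is why the converter table can never shrink.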

Pros:

Simple and easy to implement.

Cons:

Not suitable for all upgrades. An example of an upgrade that cannot be easily performed in this fashion is an upgrade that recalculates account storage.
Because some components may never be upgraded, the code needs to understand all versions indefinitely into the future.

Approach 2: parallel construction

This approach uses the fact that the state is small enough to sync within an epoch. If it is small enough to sync, it must be small enough to upgrade as well. So, the nodes keep maintaining the old state while building an upgraded copy in parallel. At some point, they begin posting the new state root hash to the blockchain instead of the old one. After that, they can discard the old state and use the new one exclusively.
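
A rough sketch of parallel construction, assuming the state fits in a dictionary and using a plain hash as a stand-in for the state root. `ParallelUpgrade`, `convert` and `switch_height` are hypothetical names introduced only for illustration:

```python
import hashlib
import json


def state_root(state: dict) -> str:
    # Stand-in for a Merkle root: a hash over the canonically serialized state.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def convert(old_value: dict) -> dict:
    # Hypothetical conversion from the old record layout to the new one.
    return {"amount": old_value["balance"]}


class ParallelUpgrade:
    def __init__(self, old_state: dict, switch_height: int):
        self.old_state = old_state
        # Upgraded copy, built while the old state is still in use.
        self.new_state = {k: convert(v) for k, v in old_state.items()}
        self.switch_height = switch_height

    def apply_block(self, height: int, writes: dict) -> str:
        """Apply balance writes to both copies and return the root to post."""
        for account_id, balance in writes.items():
            if self.old_state is not None:
                self.old_state[account_id] = {"balance": balance}
            self.new_state[account_id] = {"amount": balance}
        if height >= self.switch_height:
            self.old_state = None  # the old copy can be discarded after the switch
            return state_root(self.new_state)
        return state_root(self.old_state)
```

In this sketch nothing links the root posted at `switch_height` to the previous root, so checking it means redoing the whole conversion.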

Pros:

Pretty much any upgrade can be performed this way.
Looks nice. At each point, the entire state uses a single format.

Cons:

May be difficult to implement.
The block where the upgrade is performed cannot be challenged. What if a malicious chunk producer puts some random hash instead of the new state root? You can't make a compact proof that it is not actually correct.

Approach 3: sequential upgrade

This approach also maintains two states: the old one and the new one. Unlike in approach 2, the new state begins empty. Each block, some number of components are moved from the old state to the new state (that is, removed from the old state and at the same time added to the new state), and all the necessary conversions are performed. During the upgrade, there are two state roots. After the old state becomes empty, it is discarded and the upgrade is finished.
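
A minimal sketch of one migration step, assuming dictionary-backed states and a plain hash in place of the real state root. `migrate_step`, `read`, `convert` and `batch_size` are hypothetical names:

```python
import hashlib
import json


def state_root(state: dict) -> str:
    # Stand-in for a Merkle root.
    return hashlib.sha256(json.dumps(state, sort_keys=True).encode()).hexdigest()


def convert(old_value: dict) -> dict:
    # Hypothetical conversion to the new layout.
    return {"amount": old_value["balance"]}


def migrate_step(old_state: dict, new_state: dict, batch_size: int):
    """Move up to `batch_size` components from the old state to the new one.

    Components are taken in sorted key order so that every node moves the
    same ones. Returns the pair of state roots (old, new) after the step.
    """
    for key in sorted(old_state)[:batch_size]:
        new_state[key] = convert(old_state.pop(key))
    return state_root(old_state), state_root(new_state)


def read(old_state: dict, new_state: dict, key: str) -> dict:
    # During the upgrade a component lives in exactly one of the two states.
    if key in new_state:
        return new_state[key]
    return convert(old_state[key])
```

Each step touches only a bounded number of components, so a block's pair of roots can be checked by re-executing just that block's moves.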

Pros:

Pretty much any upgrade can be performed this way.
After the upgrade is finished, the entire state uses a single format.
Blocks can be challenged.

Cons:

May be difficult to implement.

I think that in the future we should be using approach 3 for state upgrades. This means that individual components of the state (such as accounts) should not be versioned; instead, their version is determined by the specific state they appear in.

bowenwang1996 · 2021-04-05T15:56:19Z

Maintainer

> the code needs to understand all versions indefinitely into the future.

I want to mention that today archival nodes already have to maintain all versions indefinitely.


bowenwang1996 · 2021-05-15T00:08:34Z

Maintainer

I think we can use approach 3 to do resharding as well:

We need to modify account_id_to_shard_id to take epoch_id (or protocol_version). Let's say we want to reshard a shard according to some rule. For simplicity, let's say there is a binary-valued function f that tells us how to break the shard into two. Then we can do the following:

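def account_id_to_shard_id(self, account_id, epoch_id):
    protocol_version = self.get_epoch_protocol_version(epoch_id)
    # Shard assignment under the old layout.
    shard_id = compute_shard_id(account_id)
    # After the resharding protocol version, accounts for which f returns 1
    # move to the new shard, identified here by total_num_shards.
    if protocol_version > PROTOCOL_VERSION_TO_RESHARD and f(account_id) == 1:
        return total_num_shards
    return shard_id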

Let's say epoch T is where the resharding happens. At the end of epoch T-2, we already know that in epoch T the number of shards will be different. For simplicity, let's say there is only one more shard in epoch T. When we compute validators for epoch T, we can take the new shard into account and assign validators accordingly. In epoch T-1, the validators of the new shard in T will state sync the old shard and, after that is done, use approach 3 to split the old shard into two shards while applying chunks. When epoch T comes, validators for the new shard will have everything ready to validate the shard.


ilblackdragon · 2021-07-15T08:59:23Z

Maintainer

I haven't seen this approach discussed explicitly, so I wanted to ask: what are the problems with it?

Approach 4: Split & Merge

At any epoch boundary, a single shard can be split into two, or two "adjacent" shards can be joined into one.
The specific logic for where to split and when to merge is outside the scope of this discussion.

Something simple would be: split a shard in half if it has been using more than 1/3 of its gas limit on average during the epoch, and merge two adjacent shards if together they use less than 1/3 of a single shard's gas limit.
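
A small sketch of that rule, assuming usage is measured as average gas per chunk against a per-shard gas limit. The function names and that interpretation are assumptions, not something the proposal fixes:

```python
def should_split(avg_gas_used: int, gas_limit: int) -> bool:
    """Split a shard that used more than 1/3 of its gas limit on average."""
    # Integer arithmetic avoids floating-point rounding at the threshold.
    return 3 * avg_gas_used > gas_limit


def should_merge(avg_gas_left: int, avg_gas_right: int, gas_limit: int) -> bool:
    """Merge two adjacent shards that together used less than 1/3 of one shard's limit."""
    return 3 * (avg_gas_left + avg_gas_right) < gas_limit
```

With these thresholds, each half of a freshly split shard can still be below 1/3 on its own, but the two halves together are above it, so a split is not immediately undone by a merge.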

Adjacency of shards is defined by the ordering of the accounts in the Trie (or whatever storage we are using next). It would make sense to transition to using hash(account_id) as a key in the trie, though as we saw, grouping accounts close to each other is not very hard in the hash space either.

Split

Let's say that at epoch T, shard_i will be split into two: shard_i and shard_{i + 1}. This decision is made at the end of epoch T - 2, together with the chunk producer assignment for these two shards.

During T - 1, the chunk producers assigned to shard_i and shard_{i + 1} for T will be operating as chunk producers for shard_i before re-sharding, meaning they receive chunks and apply them, maintaining the full pre-split state of shard_i. This is the logic we have now. It's important to track the full shard state because the same chunk producer can also be a chunk producer for this shard at T - 1.

When the last chunk of T - 1 is applied in shard_i, the "split" logic gets executed on all nodes that are tracking shard_i:

Apply normally the transactions from the chunk.
Split the Trie into two Tries. Given that we are splitting at the account level, this should require a single walk down the trie along the boundary and a recalculation of hashes. Because we use a persistent data structure, this will just require creating two branch (or potentially extension) nodes on top of existing data in the storage. Also, if we switch away from the Trie to a Merkle tree, this will become even easier.
Produce two ChunkExtra, recording them into shard_i and shard_{i + 1}
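
The split step can be sketched on a toy persistent trie keyed by the characters of the account id. This is a simplified model with no hashing and a single key space (it is not the nearcore Trie), and `Node`, `insert`, `split` and `keys` are hypothetical names:

```python
from typing import Optional


class Node:
    """Immutable-by-convention trie node keyed by the characters of account ids."""

    def __init__(self, value=None, children=None):
        self.value = value              # value stored at this exact key, if any
        self.children = children or {}  # char -> Node


def insert(node: Optional[Node], key: str, value) -> Node:
    """Return a new trie with `key` set, sharing untouched subtrees."""
    node = node or Node()
    if not key:
        return Node(value, node.children)
    children = dict(node.children)
    children[key[0]] = insert(node.children.get(key[0]), key[1:], value)
    return Node(node.value, children)


def split(node: Optional[Node], boundary: str):
    """Split into (keys < boundary, keys >= boundary), copying only the boundary path."""
    if node is None:
        return None, None
    if not boundary:
        return None, node  # this key and everything below it are >= boundary
    left = {c: n for c, n in node.children.items() if c < boundary[0]}
    right = {c: n for c, n in node.children.items() if c > boundary[0]}
    sub_left, sub_right = split(node.children.get(boundary[0]), boundary[1:])
    if sub_left is not None:
        left[boundary[0]] = sub_left
    if sub_right is not None:
        right[boundary[0]] = sub_right
    # A value stored here belongs to a proper prefix of the boundary, so it sorts before it.
    left_node = Node(node.value, left) if (left or node.value is not None) else None
    right_node = Node(None, right) if right else None
    return left_node, right_node


def keys(node: Optional[Node], prefix: str = "") -> list:
    """List all keys in sorted order."""
    if node is None:
        return []
    out = [prefix] if node.value is not None else []
    for c in sorted(node.children):
        out += keys(node.children[c], prefix + c)
    return out
```

Only the nodes along the boundary path are recreated; every subtree entirely on one side is reused as-is. In a Merkle trie, those recreated nodes are the ones whose hashes would need recomputing.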

For non-shard_i chunks at the last chunk in T - 1: shift their ChunkResult by 1 if their id is after i.

When producing and applying the first block of T, the chunk producers selected for shard_i & shard_{i + 1} start operating in the new sharding layout. They accept transactions for their respective shards, produce chunks and, when applying them, look up the respective ChunkExtra for the next shard layout.

Merge

Let's say we want to merge shard_i and shard_{i + 1} into shard_i starting from epoch T. This decision is made at the end of epoch T - 2. Chunk producers are assigned to the new shard_i in T. These chunk producers start to track and catch up with shard_i and shard_{i + 1} in epoch T - 1.

During application of the last chunk in T - 1, everyone who tracks shard_i and shard_{i + 1} (and this should be everyone who will be chunk producing in the next chunk) does the following:

Apply transactions in these two chunks separately.
Merge Tries: again, this is straightforward, as the shards' storage namespaces should not intersect. This means creating a branch node that includes the top branch nodes from both shards, plus potentially walking down the split path and repeating that.
Produce single ChunkExtra for shard_i

Other chunk producers shift down their ChunkExtra if their id is after i+1.
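
The id shifting for both operations can be sketched as a pure remapping of shard ids. The function names are hypothetical; the per-shard ChunkExtra records would be re-keyed with the same mapping:

```python
def shard_ids_after_split(shard_id: int, split_shard: int) -> list:
    """New ids for an old shard when `split_shard` becomes i and i + 1."""
    if shard_id < split_shard:
        return [shard_id]
    if shard_id == split_shard:
        return [shard_id, shard_id + 1]  # the split shard yields two shards
    return [shard_id + 1]                # shards after i shift up by one


def shard_id_after_merge(shard_id: int, merged_shard: int) -> int:
    """New id for an old shard when i and i + 1 are merged into i."""
    if shard_id <= merged_shard:
        return shard_id
    return shard_id - 1                  # i + 1 folds into i, later shards shift down
```

For example, with four shards and a split of shard 1, the old shards 0..3 map to new ids 0, (1, 2), 3 and 4.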

Producing chunks in T then operates as normal.

5 replies

ilblackdragon Jul 15, 2021
Maintainer

Another interesting consideration is supporting intra-epoch splits.

Given the relative simplicity of the split operation (e.g. application of extra logic after a specific chunk is applied), this operation is not limited to the end of an epoch.

This would make it possible to respond more quickly to spikes in usage.

To facilitate this, there should be a pool of potential chunk producers that are already synced to the shard and can help take the load.

This can be achieved by over-provisioning the number of chunk producers in shards, with the expectation that even after a split into two there are enough chunk producers in each. Alternatively, they can be nodes that are not chunk producers but are still required to track the chunks of specific shards and participate in production in case of a shard split.

bowenwang1996 Jul 15, 2021
Maintainer

> Because we use persistent data structure, this will just require creating two branch (or potentially extension) nodes on top of existing data in the storage

I don't think it is this simple. Contract state and code are stored under separate prefixes and we have to separate them as well.

bowenwang1996 Jul 15, 2021
Maintainer

Also, I don't think this approach works when we have challenges.

mzhangmzz Jul 15, 2021

> > Because we use persistent data structure, this will just require creating two branch (or potentially extension) nodes on top of existing data in the storage
>
> I don't think it is this simple. Contract state and code are stored under separate prefixes and we have to separate them as well.

Actually, splitting the state then requires splitting the eight sub-tries of Account, ContractCode, AccessKey, ReceivedData, PostponedReceiptId, PostponedDataCount, PostponeReceipt and ContractData, which requires walking down the trie eight times and splitting at the boundary account, plus the time to split DelayedReceipts, since they are not indexed by account_id at all. If all this work can be done in significantly less than 1s, then this approach would work.

ilblackdragon Jul 16, 2021
Maintainer

I agree that the data layout wasn't taken into consideration in the original proposal, especially the fact that shard_id is used as a prefix for keys.

I also want to point out that ContractCode split is complex because there can be contracts in both shards that use the same key.

I think we need to consider storage layout first before discussing shard state split.

That conversation has moved to near/nearcore#4527 (comment), though it's still a protocol question.
