More efficient state representation #4815
Replies: 7 comments 4 replies
-
@abacabadabacaba Thank you for the detailed explanation!
By "make it as large as twice" do you mean the attack where for key |
Beta Was this translation helpful? Give feedback.
-
This is a very nice idea @abacabadabacaba - especially the AVL guarantees in the worst case scenario. One thing that you didn't cover here (but you mentioned it during VC) - is how to optimise disk reads (if we do one read per 'level', we might end up being far slower than MPT - that has less levels). For (5) - split and join - what kind of splits are you thinking about? the ones related to dynamic sharding? Also - did you think about how the migration from MPT could look like? (I wonder how long would it take to move the current MPT-based storage into AVL - and whether we should do it in one go - that is during one epoch or do we have to stretch it for longer) What are the next steps with this idea? should we implement it and run some performance comparison with the current MPT ? |
Beta Was this translation helpful? Give feedback.
-
One note about using AVL trees for the state: Cosmos tried to do it and then they switched away from it. Relevant discussions: cosmos/cosmos-sdk#7100 and cosmos/cosmos-sdk#8297. In those discussions, I couldn't find any fundamental reasons which would make AVL trees unsuitable (or poorly suitable) for our purposes. So far, I don't think that their experience is a reason for us to not pursue the approach of using AVL trees for the state. |
Beta Was this translation helpful? Give feedback.
-
I'm curious what are other alternatives that were considered and how AVL tree would compare to them?
Separately would we change layout of the contract data vs account data as part of this change? e.g. currently contract data is stored with prefix of account. That indeed makes hashed keys complex to reshard but there are ways to address this by changing layouts. |
Beta Was this translation helpful? Give feedback.
-
@abacabadabacaba I think it would be a good idea to write list of mathematical properties the structure you propose has. Algorithm properties:
From my experience with debugging issues with implementation properties:
|
Beta Was this translation helpful? Give feedback.
-
@abacabadabacaba Anyway, I think that AVL tree is a valid solution. We should try to make a prototype, even not fully implemented, but enough to judge performance. To see, whenever that solution is good enough performance wise. I would be happy to help you with that. Share, any useful suggestions, etc. |
Beta Was this translation helpful? Give feedback.
-
@abacabadabacaba Calculate the merkle root hash on the AVL tree, whether it will be different according to the put order of keys? E.g [1,2,3,4,5] and [1.3.4.5.2], the merkle root hash of this two AVL tree will be same? |
Beta Was this translation helpful? Give feedback.
-
THIS IS A DRAFT, MORE INFO WILL BE ADDED LATER
See also: #4413.
Problem description
The data structure that is currently used to represent blockchain state, a variant of Merkle-Patricia Trie (MPT), is not very efficient. We want this structure to have the following properties:
MPT satisfies (1) to some extent, but not too well: node structure and algorithms are not very simple. There are possible simplifications (like merging extension nodes and branch nodes) which can improve this, but MPT has other, more fundamental problems.
For (2), the efficiently depends to a large extend on the depth on the tree. While the depth of MPT is usually small, it is possible for an adversary to make it as large as twice the length of the key in bytes, which is not good. If the keys are hashed, the impact of this attack is limited, but resharding becomes more difficult. Thus, (2) is not satisfied.
For (3), currently the gas fee for accessing a key depends on the number of nodes on its path, which in turn depends on the other keys as well. This makes the fee structure non-transparent for developers. Thus, (3) is not satisfied.
For (4), the worst proof size should be considered. For MPT, the worst case is when the path consists entirely of branch nodes, and each of them has a maximum number of children (16). In this case, each node will take at least 16×32=512 bytes just for the hashes (every second node can have one more hash if there is a key ending in that node). With a maximum of two nodes for each byte of the key, a proof for a single 50-byte key can take at least 50 KiB just for the intermediate nodes! Clearly, (4) is not satisfied.
MPT does satisfy (5) (if the keys are not hashes, see (2)).
Overall, MPT satisfies only two of the desired properties out of five, which is not enough.
Proposed solution: AVL tree
To solve these problems, I propose to represent the state as a variant of AVL tree. Here are some of the differences from MPT:
In the most common version of AVL trees, every node contains an entry. For our use case, this is suboptimal: if every node contains a key-value pair, then to prove the existence of some node it is necessary to provide keys and values for all nodes on the path from that node to the root. Even if those keys and values are hashed, the size of the proof is still much larger compared to the case where only leaf nodes contain entries. So, only leaf nodes contain entries, and internal nodes only include the hashes of their children.
Each internal node should also include its tilt (the difference between the heights of its subtrees). Because its absolute value cannot exceed one, there are only three possible values: left (left subtree is higher), right and center. This information is merkelized (that is, it contributes to the hash of the node) because it is needed to prove that the rotations are performed correctly. In addition to these values, each internal node is associated with a boundary key, which is the highest key in the left subtree. This is used by lookup algorithm to find an entry by its key. This value is not merkelized, instead, a different approach is used to prove that a lookup was performed correctly.
Simplified implementation
Beta Was this translation helpful? Give feedback.
All reactions