-
Notifications
You must be signed in to change notification settings - Fork 15
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Collapse sparse AMTs #17
Comments
Collapsing is a bit hard when we don't store the index with the elements. We can't know without being able to slice an index into all of its components (3 bits per level for this width=8 impl) which particular entry we're at. When the height gets collapsed, we lose track of which pieces of an index we need to discard. If we stored index along with the value ( Another option is to inline-compact, maybe a rule where if branch only has a single element (or some other number), then don't link, inline it. So you'd end up with nodes that look like (conceptually): Costs everywhere so it'd depend on the nature of the data being stored whether these are worth it. |
Here's an algorithm which might work: Any height>0 node that only addresses a single terminal element within its sub-tree gets that element inlined in its parent along with the index of that element. So you'd end up with intermediate nodes that had both links to sub-trees as well as inlined values, (conceptually) like: So as you navigate down through the structure and you hit one of these things that's an array rather than a CID (so these become a kinded union of link and array), you just check the index against the one you're traversing to and that tells you whether this is the entry you care about or if that entry doesn't exist. The canonical form of the data structure would require compaction whenever a deletion created a situation where there was a leaf with only one element (accounting for that makes this a little awkward because you'd need to perform some traversal to discover the full count, but because all sub-trees are like this the traversal would be minimal). Then when you perform insertions and hit one of these compacted nodes you'd need to push it maximally downward until it's on its own or is at height=0. So there's some funky testing required to ensure that such an algorithm could remain stable and result in canonical form for any given set of data regardless of how you got there. |
I was thinking of something a bit simpler. Given a chain of degree one nodes:
Collapse the chain as follows:
By encoding "height +2" into node "d", we can avoid having to actually include nodes "b" and "c". |
But I think that it's differentiating from structures like this that's the problem with the simple compaction:
The whole index needs to contribute to locating the item, otherwise you can't be sure whether you addressed the right one.
An index |
Ah, I see. Yeah, you're right. You'd need something fancier
|
What if we just stored skipped bits in the node? |
Yeah, that might be nice and explicit, and potentially more space efficient too? So for loading an entry in an existing AMT it'd be something like: if we load a block and it has an optional "bits" field with a value, then interpret them as skipped bits in the index, otherwise just assume the normal height++ bit increment. |
Yeah, although we'd need to specify the number of bits. Alternatively, we could always include a prefix and a "mask". |
Currently, the height of an AMT is always
log(last_index)/log(8)
. Unfortunately, every lookup requiresheight
lookups. Ideally, we'd collapse degree-1 trees by allowing nodes to have a height greater than 1. The downside is an extra byte to record this height in each node.The text was updated successfully, but these errors were encountered: