Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add support for augmented trees beyond simple counting #18

Open
lorentey opened this issue Oct 15, 2016 · 0 comments
Open

Add support for augmented trees beyond simple counting #18

lorentey opened this issue Oct 15, 2016 · 0 comments

Comments

@lorentey
Copy link
Collaborator

BTree implements order-statistic trees, i.e., trees where each node maintains a count of all elements in the subtree under it. This allows offset-based access in logarithmic time, which is highly useful; e.g., we can't have a useful List without integer indices, and offset-based access also greatly enhances the usefulness of SortedSet and Map.

However, maintaining these element counts doesn't come free; they clearly have a performance cost. It would be nice if we had a BTree variant that wasn't augmented, if only to measure how much performance the augmentation costs. If this cost is too large, it might be worthwhile to provide an unaugmented BTree variant.

Furthermore, while counting is probably the most frequently useful tree augmentation, sometimes other statistics are needed. The code that implements the counting augmentation can be generalized to provide quick (log(n)) access to the result of any monoid operation over a sequence of tree elements. Some examples:

  • Counting: this is what we already have. Augmenting the tree with counts allows for efficient retrieval of elements at any offset, enables fast navigation inside the tree, etc.

    count([]) = 0
    count([_]) = 1
    count(a + b) = count(a) + count(b)
    
  • Weight: The weight is a variant of counting where each element may have a different weight in the sum. For example, let's say the elements of the tree are themselves collections, and the tree is used to represent a concatenation of its elements. Augmenting the tree using a weight calculated from the sizes of each element enables efficient offset-based access to any element in the concatenated list.

    weight([]) = 0, 
    weight([e]) = e.weight, 
    weight(a + b) = weight(a) + weight(b)
    
  • Maximum (or minimum) over some secondary key extracted from the elements. Beyond getting easy access to the element that has the largest/smallest such key, this also gives us an easy way to get the largest/smallest element to the left or right to any other element; this has interesting applications.

    max([]) =  -infinity
    max([e]) = key(e)
    max(a + b) = max(max(a), max(b))
    
  • Any combinations of the above and more. We could support interval trees, well-separated point pair decompositions, etc.

Generalizing the existing BTreeNode to implement any such monoid-cached tree would be straightforward, but it would probably slow things down even more. So it'd probably be better to do support this in a separate implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

1 participant