Add support for augmented trees beyond simple counting #18

lorentey · 2016-10-15T15:25:34Z

BTree implements order-statistic trees, i.e., trees where each node maintains a count of all elements in the subtree under it. This allows offset-based access in logarithmic time, which is highly useful; e.g., we can't have a useful List without integer indices, and offset-based access also greatly enhances the usefulness of SortedSet and Map.

However, maintaining these element counts doesn't come free; they clearly have a performance cost. It would be nice if we had a BTree variant that wasn't augmented, if only to measure how much performance the augmentation costs. If this cost is too large, it might be worthwhile to provide an unaugmented BTree variant.

Furthermore, while counting is probably the most frequently useful tree augmentation, sometimes other statistics are needed. The code that implements the counting augmentation can be generalized to provide quick (log(n)) access to the result of any monoid operation over a sequence of tree elements. Some examples:

Counting: this is what we already have. Augmenting the tree with counts allows for efficient retrieval of elements at any offset, enables fast navigation inside the tree, etc.
```
count([]) = 0
count([_]) = 1
count(a + b) = count(a) + count(b)
```
Weight: The weight is a variant of counting where each element may have a different weight in the sum. For example, let's say the elements of the tree are themselves collections, and the tree is used to represent a concatenation of its elements. Augmenting the tree using a weight calculated from the sizes of each element enables efficient offset-based access to any element in the concatenated list.
```
weight([]) = 0, 
weight([e]) = e.weight, 
weight(a + b) = weight(a) + weight(b)
```
Maximum (or minimum) over some secondary key extracted from the elements. Beyond getting easy access to the element that has the largest/smallest such key, this also gives us an easy way to get the largest/smallest element to the left or right to any other element; this has interesting applications.
```
max([]) =  -infinity
max([e]) = key(e)
max(a + b) = max(max(a), max(b))
```
Any combinations of the above and more. We could support interval trees, well-separated point pair decompositions, etc.

Generalizing the existing BTreeNode to implement any such monoid-cached tree would be straightforward, but it would probably slow things down even more. So it'd probably be better to do support this in a separate implementation.

The text was updated successfully, but these errors were encountered:

lorentey added the enhancement label Nov 7, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add support for augmented trees beyond simple counting #18

Add support for augmented trees beyond simple counting #18

lorentey commented Oct 15, 2016

Add support for augmented trees beyond simple counting #18

Add support for augmented trees beyond simple counting #18

Comments

lorentey commented Oct 15, 2016