Skip to content
Kasey O edited this page Jul 30, 2017 · 3 revisions

Project description

The KaosCollections library contains classes that complement the System.Collections.Generic classes. The primary classes are:

  • RankedDictionary - an improved version of SortedDictionary.
  • RankedSet - an improved version of SortedSet.

All provided classes closely emulate the API of the Microsoft's classes while extending capability and improving performance in many common scenarios. Both these classes add array-like support so that any element can be accessed by index or the index of an element may be determined from the element's key.

The performance bump is due to the B+ tree variant used by KaosCollections. The structure used by SortedDictionary and SortedSet is a binary tree which greatly increases tree height and number of allocations compared to a B+ tree. Also in a B+ tree, data is grouped together in leaves which improves performance due to data locality.

API differences between System.Collections.Generic and Kaos.Collections

The API of RankedDictionary and RankedSet closely adheres to Microsoft's related generic classes with a few exceptions:

  • Both classes add the GetByIndex(index) method which returns the element at the supplied index.
  • Both classes add the IndexOf(key) method which returns the index of an element.
  • Both classes add the method GetBetween(lowerKey,upperKey) that returns an enumerator for elements beginning at the supplied value.
  • Both classes add the method GetFrom(lowerKey) that returns an enumerator for elements beginning at the supplied value.
  • RankedDictionary adds the method TryGetValueAndIndex(key, out value, out index) that extend IndexOf(key) by also retrieving the key's associated value and index in one call.
  • SortedSet method SortedSet.GetViewBetween (which provides a virtual view to a range of a SortedSet) is not implemented. Use the GetBetween(lower,upper) enumerator instead.
  • Neither class implements the ISerializable interface. Serialization may be added a future version.

Data structure

The data structure used by RankedDictionary and RankedSet is a variant of an order statistic B+ tree. An order statistic tree brings two additional capabilities to a tree:

  • Select(i) — find the i'th smallest element in the tree (i.e. retrieve key by index).
  • Rank(x) – find the rank of item x in the tree (i.e. retrieve index by key). The names of the classes in the KaosCollections library are derived from this operation.

RankedDictionary and RankedSet store all elements in leaves at the same depth. The leaf level is a sorted doubly linked list with head and tail pointers. The first key of every leaf (except the leftmost) is copied to one branch for subdividing. A tree with no elements is represented as an empty leaf.

This structure differs from a typical B+ tree in these ways:

  • While the root may contain as few as 2 children, other rightmost branches may contain as few as one child. This variation optimizes the structure for bulk loading of presorted data and improves seek performance for data near the end of the collection - both common operations. All other branches, besides the root, maintain at least 50% capacity usage following every add and remove operation.

  • The rightmost leaf may contain as few as one item. Again, this variation optimizes the structure for bulk loading of presorted data. All other leaves maintain at least 50% capacity usage following every add and remove operation.

  • Every branch tracks the total number of items (weight) in its descendent leaves. For example, the weight of the root is the total number of elements in all leaves.

Clone this wiki locally