diff --git a/README.md b/README.md index 30b0ce7..13c537d 100644 --- a/README.md +++ b/README.md @@ -1,32 +1,21 @@ # Certora Collections -Efficient functional collections for Kotlin. +Memory-efficient immutable collections for Kotlin. -TODO: need a better name +## Motivation -Certora Collections provides data structure implementations for Kotlin that are designed for efficient functional programming. -These data structures are immutable and persistent, and designed for minimal memory usage when used in a functional -style. They also include other features that we have found helpful in implementing the Certora Prover, such as efficient -joining/merging of maps, parallel operations, etc. +Kotlin makes it quite easy and natural to manipulate immutable data structures. However, the standard library does not +provide efficient implementations of immutable collections. +[kotlinx.collections.immutable](https://github.com/Kotlin/kotlinx.collections.immutable) provides a set of interfaces +that are designed to be implemented by efficient immutable collections, along with a reference implementation. However, +in developing the Certora Prover, we found that the reference implementation did not make the right performance +tradeoffs for our use cases. -We build on the interfaces introduced in -[kotlinx.collections.immutable](https://github.com/Kotlin/kotlinx.collections.immutable), but provide a different -underlying implementation based on "[treaps](https://en.wikipedia.org/wiki/Treap)," which are probabilistically balanced -binary search trees (BSTs). +Most `Set` and `Map` implementations, including the ones mentioned previously, are optimized primarily for speed of +operations on single elements of the collection, e.g., adding an element to a `Set` or looking up a single value in a +`Map`. However, in many use cases the more performance-critical operations are those that operate over the whole data +structure, such computing set unions or intersection of two sets, or merging two maps. -Currently we provide set and map implementations. - -# Motivation - - - - - -TODO: describe why we use treaps, what are the benefits, etc. - - -## Usage - -...how to get the package in Gradle - -## Docs +The Certora Collections library provides `Set` and `Map` implementations which are optimized primarily for such +operations. Further, we optimize heavily for memory usage over speed of operations on single elements. We also provide +some additional features that we have found useful in the Certora Prover, such as efficient parallel operations. diff --git a/collect/src/main/kotlin/com/certora/collect/Treap.kt b/collect/src/main/kotlin/com/certora/collect/Treap.kt index c11f148..fd5fb08 100644 --- a/collect/src/main/kotlin/com/certora/collect/Treap.kt +++ b/collect/src/main/kotlin/com/certora/collect/Treap.kt @@ -36,9 +36,7 @@ package com.certora.collect keys in Maps. For types that do not implement Comparable, we use the object's hashCode() function to generate a Treap key. We thus sort all elements/keys by their hash codes. We resolve collisions by chaining off of the Treap node. This results in a bifurcation of Set and Map implementations, into SortedTreapSet and HashTreapSet, - SortedTreapMap and HashTreapMap. To make this more manageable at use sites, we We provide AutoPersistentSet and - AutoPersistentMap, which are empty set/map objects that expand into the appropriate sorted/hashed structure based on - the actual key types encountered. + SortedTreapMap and HashTreapMap. Some other notes: @@ -55,10 +53,8 @@ package com.certora.collect - To ensure "persistence" of these data structures, all instances of Treap are deeply immutable. "Mutation" is always done by making a new instance. We go to lengths to avoid this, and also to throw out any copies if they turn out to be unnecessary, so that we maximize our chances of reusing existing memory allocations, in the hope of - keeping memory small. Typically a union of two Sets, for example, will reuse a substantial portion of both sets. - - - Most Treap operations are written as extension methods, so that they can deal with null "this" values. This makes - things much more convenient, but is a bit odd, and is another reason this stuff is isolated in its own package. + keeping memory small. Often a union of two large Sets, for example, will reuse a substantial portion of both + sets. - Most Treap operations are defined recursively. This should not lead to stack overflow, as Treaps are very likely to be mostly balanced. The expected (ideal) depth of a Treap is O(log2(N)), so we don't expect to see Treaps that diff --git a/collect/src/main/kotlin/com/certora/collect/TreapKey.kt b/collect/src/main/kotlin/com/certora/collect/TreapKey.kt index c50ec28..2c501da 100644 --- a/collect/src/main/kotlin/com/certora/collect/TreapKey.kt +++ b/collect/src/main/kotlin/com/certora/collect/TreapKey.kt @@ -62,7 +62,7 @@ internal interface TreapKey<@Treapable K> { abstract fun precompute(): TreapKey /** - A TreapKey whose underlying key implement Comparable. This allows us to sort the Treap naturally. + A TreapKey whose underlying key implements Comparable. This allows us to sort the Treap naturally. */ interface Sorted<@Treapable K : Comparable> : TreapKey { abstract override val treapKey: K diff --git a/collect/src/main/kotlin/com/certora/collect/TreapSet.kt b/collect/src/main/kotlin/com/certora/collect/TreapSet.kt index 10a36ea..2b29340 100644 --- a/collect/src/main/kotlin/com/certora/collect/TreapSet.kt +++ b/collect/src/main/kotlin/com/certora/collect/TreapSet.kt @@ -3,7 +3,7 @@ package com.certora.collect import kotlinx.collections.immutable.PersistentSet /** - A [PersistentSet] implemented as a [Treap](https://en.wikipedia.org/wiki/Treap) - a kind of balanced binary tree. + A [PersistentSet] implemented as a [Treap](https://en.wikipedia.org/wiki/Treap) - a kind of balanced binary tree. */ @Treapable public interface TreapSet : PersistentSet { @@ -16,7 +16,7 @@ public interface TreapSet : PersistentSet { override fun clear(): TreapSet /** - A [PersistentSet.Builder] that produces a [TreapSet]. + A [PersistentSet.Builder] that produces a [TreapSet]. */ public interface Builder : PersistentSet.Builder { override fun build(): TreapSet @@ -25,16 +25,48 @@ public interface TreapSet : PersistentSet { @Suppress("Treapability") override fun builder(): Builder<@UnsafeVariance T> = TreapSetBuilder(this) + /** + Checks if this set contains any of the given [elements]. This is equivalent to, but more efficient than, + `this.intersect(elements).isNotEmpty()`. + */ public fun containsAny(elements: Iterable<@UnsafeVariance T>): Boolean + + /** + If this set contains exactly one element, returns that element. Otherwise, throws [NoSuchElementException]. + */ public fun single(): T + + /** + If this set contains exactly one element, returns that element. Otherwise, returns null. + */ public fun singleOrNull(): T? + + /** + If this set contains an element that compares equal to the specified [element], returns that element instance. + This is useful for implementing intern tables, for example. + */ public fun findEqual(element: @UnsafeVariance T): T? + + /** + Calls [action] for each element in the set. This traverses the treap without allocating temporary storage, + which may be more efficient than [forEach]. + */ public fun forEachElement(action: (element: T) -> Unit): Unit } - +/** + Returns an empty [TreapSet] + */ public fun <@Treapable T> treapSetOf(): TreapSet = EmptyTreapSet() + +/** + Returns a new [TreapSet] containing the given element. + */ public fun <@Treapable T> treapSetOf(element: T): TreapSet = treapSetOf().add(element) + +/** + Returns a new [TreapSet] containing the given elements. + */ public fun <@Treapable T> treapSetOf(vararg elements: T): TreapSet = treapSetOf().mutate { it.addAll(elements) } public fun <@Treapable T> treapSetOfNotNull(element: T?): TreapSet = if (element != null) { treapSetOf(element) } else { treapSetOf() } diff --git a/collect/src/main/kotlin/com/certora/collect/Treapable.kt b/collect/src/main/kotlin/com/certora/collect/Treapable.kt index d90785e..edb1ff8 100644 --- a/collect/src/main/kotlin/com/certora/collect/Treapable.kt +++ b/collect/src/main/kotlin/com/certora/collect/Treapable.kt @@ -6,17 +6,17 @@ package com.certora.collect The internal structure of Treap-based collections is based on the hash codes of the keys stored in the collection. If a Treap is serialized and deserialized, the hash codes of the keys must be the same as they were before serialization, or the Treap will be corrupted. To prevent this, we use [Treapable] annotations to convey hash code - stability requirements, and optionally enforce these requirements with the [Treapability] Detekt rule. + stability requirements, and optionally enforce these requirements with the [Treapability] Detekt rule. A type parameter annotated with [Treapable] requires that all arguments to the type parameter have stable hash code - *if* they are also possibly serializable. + *if* they are also possibly serializable. A type is definitely not serializable if it does not implement Serializable *and* it is final (or sealed, and all subclasses are definitely not serializable). A class, interface, or object annotated with [Treapable] must have a stable hash code (and all subclasses must also). The [Treapability] rule will report a violation if a class, interface, or object annotated with [Treapable] - does not appear to have a stable hash code. + does not appear to have a stable hash code. The [Treapability] rule will also report a violation if an argument to a [Treapable] type parameter is potentially serializable and does not have a stable hash code. Some notes on hash code stability: @@ -41,7 +41,7 @@ package com.certora.collect and do not allow this to be overridden. A stable hash code can be obtained from the *name* of the enum instance. (We could also use the ordinal, but this has an unnecessary dependence on ordering.) - - Our analysis considers some known JVM types to have stable hash codes (such as BigIngeter). + - Our analysis assumes some known JVM types to have stable hash codes (such as BigIngeter). */ @Target(AnnotationTarget.TYPE_PARAMETER, AnnotationTarget.TYPE, AnnotationTarget.CLASS) public annotation class Treapable