Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update comments and readme #4

Merged
merged 4 commits into from
Oct 23, 2023
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
41 changes: 15 additions & 26 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,32 +1,21 @@
# Certora Collections

Efficient functional collections for Kotlin.
Memory-efficient immutable collections for Kotlin.

TODO: need a better name
## Motivation

Certora Collections provides data structure implementations for Kotlin that are designed for efficient functional programming.
These data structures are immutable and persistent, and designed for minimal memory usage when used in a functional
style. They also include other features that we have found helpful in implementing the Certora Prover, such as efficient
joining/merging of maps, parallel operations, etc.
Kotlin makes it quite easy and natural to manipulate immutable data structures. However, the standard library does not
provide efficient implementations of immutable collections.
[kotlinx.collections.immutable](https://github.com/Kotlin/kotlinx.collections.immutable) provides a set of interfaces
that are designed to be implemented by efficient immutable collections, along with a reference implementation. However,
in developing the Certora Prover, we found that the reference implementation did not make the right performance
tradeoffs for our use cases.

We build on the interfaces introduced in
[kotlinx.collections.immutable](https://github.com/Kotlin/kotlinx.collections.immutable), but provide a different
underlying implementation based on "[treaps](https://en.wikipedia.org/wiki/Treap)," which are probabilistically balanced
binary search trees (BSTs).
Most `Set` and `Map` implementations, including the ones mentioned previously, are optimized primarily for speed of
operations on single elements of the collection, e.g., adding an element to a `Set` or looking up a single value in a
`Map`. However, in many use cases the more performance-critical operations are those that operate over the whole data
structure, such computing set unions or intersection of two sets, or merging two maps.

Currently we provide set and map implementations.

# Motivation





TODO: describe why we use treaps, what are the benefits, etc.


## Usage

...how to get the package in Gradle

## Docs
The Certora Collections library provides `Set` and `Map` implementations which are optimized primarily for such
operations. Further, we optimize heavily for memory usage over speed of operations on single elements. We also provide
some additional features that we have found useful in the Certora Prover, such as efficient parallel operations.
10 changes: 3 additions & 7 deletions collect/src/main/kotlin/com/certora/collect/Treap.kt
Original file line number Diff line number Diff line change
Expand Up @@ -36,9 +36,7 @@ package com.certora.collect
keys in Maps. For types that do not implement Comparable, we use the object's hashCode() function to generate a
Treap key. We thus sort all elements/keys by their hash codes. We resolve collisions by chaining off of the Treap
node. This results in a bifurcation of Set and Map implementations, into SortedTreapSet and HashTreapSet,
SortedTreapMap and HashTreapMap. To make this more manageable at use sites, we We provide AutoPersistentSet and
AutoPersistentMap, which are empty set/map objects that expand into the appropriate sorted/hashed structure based on
the actual key types encountered.
SortedTreapMap and HashTreapMap.

Some other notes:

Expand All @@ -55,10 +53,8 @@ package com.certora.collect
- To ensure "persistence" of these data structures, all instances of Treap are deeply immutable. "Mutation" is
always done by making a new instance. We go to lengths to avoid this, and also to throw out any copies if they
turn out to be unnecessary, so that we maximize our chances of reusing existing memory allocations, in the hope of
keeping memory small. Typically a union of two Sets, for example, will reuse a substantial portion of both sets.

- Most Treap operations are written as extension methods, so that they can deal with null "this" values. This makes
things much more convenient, but is a bit odd, and is another reason this stuff is isolated in its own package.
keeping memory small. Often a union of two large Sets, for example, will reuse a substantial portion of both
sets.

- Most Treap operations are defined recursively. This should not lead to stack overflow, as Treaps are very likely
to be mostly balanced. The expected (ideal) depth of a Treap is O(log2(N)), so we don't expect to see Treaps that
Expand Down
2 changes: 1 addition & 1 deletion collect/src/main/kotlin/com/certora/collect/TreapKey.kt
Original file line number Diff line number Diff line change
Expand Up @@ -62,7 +62,7 @@ internal interface TreapKey<@Treapable K> {
abstract fun precompute(): TreapKey<K>

/**
A TreapKey whose underlying key implement Comparable. This allows us to sort the Treap naturally.
A TreapKey whose underlying key implements Comparable. This allows us to sort the Treap naturally.
*/
interface Sorted<@Treapable K : Comparable<K>> : TreapKey<K> {
abstract override val treapKey: K
Expand Down
38 changes: 35 additions & 3 deletions collect/src/main/kotlin/com/certora/collect/TreapSet.kt
Original file line number Diff line number Diff line change
Expand Up @@ -3,7 +3,7 @@ package com.certora.collect
import kotlinx.collections.immutable.PersistentSet

/**
A [PersistentSet] implemented as a [Treap](https://en.wikipedia.org/wiki/Treap) - a kind of balanced binary tree.
A [PersistentSet] implemented as a [Treap](https://en.wikipedia.org/wiki/Treap) - a kind of balanced binary tree.
*/
@Treapable
public interface TreapSet<out T> : PersistentSet<T> {
Expand All @@ -16,7 +16,7 @@ public interface TreapSet<out T> : PersistentSet<T> {
override fun clear(): TreapSet<T>

/**
A [PersistentSet.Builder] that produces a [TreapSet].
A [PersistentSet.Builder] that produces a [TreapSet].
*/
public interface Builder<T> : PersistentSet.Builder<T> {
override fun build(): TreapSet<T>
Expand All @@ -25,16 +25,48 @@ public interface TreapSet<out T> : PersistentSet<T> {
@Suppress("Treapability")
override fun builder(): Builder<@UnsafeVariance T> = TreapSetBuilder(this)

/**
Checks if this set contains any of the given [elements]. This is equivalent to, but more efficient than,
`this.intersect(elements).isNotEmpty()`.
*/
public fun containsAny(elements: Iterable<@UnsafeVariance T>): Boolean

/**
If this set contains exactly one element, returns that element. Otherwise, throws [NoSuchElementException].
*/
public fun single(): T

/**
If this set contains exactly one element, returns that element. Otherwise, returns null.
*/
public fun singleOrNull(): T?

/**
If this set contains an element that compares equal to the specified [element], returns that element instance.
This is useful for implementing intern tables, for example.
*/
public fun findEqual(element: @UnsafeVariance T): T?

/**
Calls [action] for each element in the set. This traverses the treap without allocating temporary storage,
which may be more efficient than [forEach].
*/
public fun forEachElement(action: (element: T) -> Unit): Unit
}


/**
Returns an empty [TreapSet<T>]
*/
public fun <@Treapable T> treapSetOf(): TreapSet<T> = EmptyTreapSet()

/**
Returns a new [TreapSet<T>] containing the given element.
*/
public fun <@Treapable T> treapSetOf(element: T): TreapSet<T> = treapSetOf<T>().add(element)

/**
Returns a new [TreapSet<T>] containing the given elements.
*/
public fun <@Treapable T> treapSetOf(vararg elements: T): TreapSet<T> = treapSetOf<T>().mutate { it.addAll(elements) }

public fun <@Treapable T> treapSetOfNotNull(element: T?): TreapSet<T> = if (element != null) { treapSetOf(element) } else { treapSetOf() }
Expand Down
8 changes: 4 additions & 4 deletions collect/src/main/kotlin/com/certora/collect/Treapable.kt
Original file line number Diff line number Diff line change
Expand Up @@ -6,17 +6,17 @@ package com.certora.collect
The internal structure of Treap-based collections is based on the hash codes of the keys stored in the collection.
If a Treap is serialized and deserialized, the hash codes of the keys must be the same as they were before
serialization, or the Treap will be corrupted. To prevent this, we use [Treapable] annotations to convey hash code
stability requirements, and optionally enforce these requirements with the [Treapability] Detekt rule.
stability requirements, and optionally enforce these requirements with the [Treapability] Detekt rule.

A type parameter annotated with [Treapable] requires that all arguments to the type parameter have stable hash code
*if* they are also possibly serializable.
*if* they are also possibly serializable.

A type is definitely not serializable if it does not implement Serializable *and* it is final (or sealed, and all
subclasses are definitely not serializable).

A class, interface, or object annotated with [Treapable] must have a stable hash code (and all subclasses must
also). The [Treapability] rule will report a violation if a class, interface, or object annotated with [Treapable]
does not appear to have a stable hash code.
does not appear to have a stable hash code.

The [Treapability] rule will also report a violation if an argument to a [Treapable] type parameter is potentially
serializable and does not have a stable hash code. Some notes on hash code stability:
Expand All @@ -41,7 +41,7 @@ package com.certora.collect
and do not allow this to be overridden. A stable hash code can be obtained from the *name* of the enum instance.
(We could also use the ordinal, but this has an unnecessary dependence on ordering.)

- Our analysis considers some known JVM types to have stable hash codes (such as BigIngeter).
- Our analysis assumes some known JVM types to have stable hash codes (such as BigIngeter).
*/
@Target(AnnotationTarget.TYPE_PARAMETER, AnnotationTarget.TYPE, AnnotationTarget.CLASS)
public annotation class Treapable
Loading