Bill Katz edited this page Feb 5, 2019 · 1 revision

(planned)

A petabyte-scale, DVID-tuned datastore is in the planning stages. While building out DVID, it has become clear that versioning allows us to keep the majority of the data within an immutable store.

Immutable stores hold write-once data and can achieve higher performance for less money because the store never has to handle mutation. For example, an extremely compact, in-memory, static index can be generated. Caching and distributed transactions are also simpler. Mutable and immutable datastores match naturally with how DVID treats versioned data as a directed acyclic graph (DAG):

[Figure: Versioning using a DAG fits mutable/immutable stores]

For connectomics research at Janelia (and the FlyEM research group in particular), the majority of data exists near the top of the DAG since most of our workflows involve ingesting very large image volumes and pre-generating segmentation for every voxel. So the bulk of our data can persist in immutable stores, allowing us to use smaller, faster storage solutions for the mutable portion of the DAG, namely the leaf nodes where manual editing tends to dominate.
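The split described above can be sketched in a few lines of Python. This is only an illustration, not DVID's actual data model: the `VersionDAG` class, its method names, and the version names are hypothetical, and it assumes a version is mutable exactly when it has no children.

```python
class VersionDAG:
    """Toy version DAG (hypothetical; not DVID's real types)."""

    def __init__(self):
        self.parents = {}   # version id -> list of parent ids
        self.children = {}  # version id -> list of child ids

    def add_version(self, version, parents=()):
        for p in parents:
            if p not in self.children:
                raise KeyError(f"unknown parent version: {p}")
        self.parents[version] = list(parents)
        self.children[version] = []
        for p in parents:
            self.children[p].append(version)

    def is_mutable(self, version):
        # Assumption: only leaf versions still accept edits.
        return not self.children[version]

    def store_for(self, version):
        # Interior versions can live in the immutable store;
        # leaves need the smaller, faster mutable store.
        return "mutable" if self.is_mutable(version) else "immutable"


dag = VersionDAG()
dag.add_version("ingest")                       # large image volume
dag.add_version("segmentation", ["ingest"])     # pre-generated segmentation
dag.add_version("edit-a", ["segmentation"])     # manual editing branch
dag.add_version("edit-b", ["segmentation"])     # another editing branch

placement = {v: dag.store_for(v) for v in dag.parents}
```

In this sketch the two bulk versions near the root land in the immutable store, and only the two editing leaves need mutable storage.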

The planned DVID store will combine in-memory ordered key-value indexing with version-aware append-only file storage, suitable for easy access to version-by-version deltas and for rsyncing data. The in-memory ordered key-value indexing will be implemented using a compact data structure like the Fast Succinct Trie.
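As a rough sketch of how these pieces might fit together, the Python below keeps one append-only log per version and an in-memory ordered index over each log. Everything here is an assumption made for illustration: the class names, the record layout, the single-parent version chain (no merges), and the lack of delete handling are not taken from the planned design. A sorted list searched with `bisect` stands in for a compact structure such as the Fast Succinct Trie, and an in-memory `io.BytesIO` stands in for a file on disk.

```python
import bisect
import io
import struct


class VersionLog:
    """Append-only log holding the delta written in one version."""

    def __init__(self):
        self.stream = io.BytesIO()  # stand-in for an append-only file
        self.keys = []              # sorted keys; stand-in for a succinct trie
        self.offsets = {}           # key -> (offset, length) of latest value

    def put(self, key, value):
        self.stream.seek(0, io.SEEK_END)
        # Hypothetical record layout: key length, value length, key, value.
        self.stream.write(struct.pack("<II", len(key), len(value)))
        self.stream.write(key)
        offset = self.stream.tell()
        self.stream.write(value)
        if key not in self.offsets:
            bisect.insort(self.keys, key)
        self.offsets[key] = (offset, len(value))

    def get(self, key):
        loc = self.offsets.get(key)
        if loc is None:
            return None
        offset, length = loc
        self.stream.seek(offset)
        return self.stream.read(length)

    def key_range(self, start, end):
        # Ordered scan over keys in [start, end).
        lo = bisect.bisect_left(self.keys, start)
        hi = bisect.bisect_left(self.keys, end)
        return self.keys[lo:hi]


class DeltaStore:
    """One log per version; reads fall back to ancestor versions."""

    def __init__(self):
        self.logs = {}
        self.parent = {}

    def new_version(self, version, parent=None):
        self.logs[version] = VersionLog()
        self.parent[version] = parent

    def put(self, version, key, value):
        self.logs[version].put(key, value)

    def get(self, version, key):
        # Walk toward the root until some version's delta holds the key.
        while version is not None:
            value = self.logs[version].get(key)
            if value is not None:
                return value
            version = self.parent[version]
        return None

    def delta(self, version):
        # Raw bytes of one version's log: the unit one would copy or rsync.
        return self.logs[version].stream.getvalue()


store = DeltaStore()
store.new_version("v1")
store.put("v1", b"block-a", b"raw")
store.put("v1", b"block-b", b"raw")
store.new_version("v2", parent="v1")
store.put("v2", b"block-b", b"edited")
```

Because each version's writes sit in their own log, the delta for `v2` is just that log's bytes, while a read at `v2` for an untouched key falls through to `v1`.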

The goal of the DAGStore is to provide an open-source, simple solution for versioned data storage that (1) exploits immutability for performance and size, and (2) maximizes the efficiency of version data transfer between DAGStores.
