diff --git a/README.md b/README.md
index 20e7bf116..c069a437b 100644
--- a/README.md
+++ b/README.md
@@ -9,7 +9,7 @@ scratch, including:
 
 * ACID-compliant transaction engine with MVCC-based snapshot isolation.
 
-* Pluggable storage engine with B+tree and log-structured backends.
+* Pluggable storage engine with BitCask and in-memory backends.
 
 * Iterator-based query engine with heuristic optimization and time-travel support.
 
diff --git a/config/toydb.yaml b/config/toydb.yaml
index 3a065bd36..801e9ebb2 100644
--- a/config/toydb.yaml
+++ b/config/toydb.yaml
@@ -25,7 +25,7 @@ sync: true
 storage_raft: hybrid
 
 # SQL key-value storage engine
-# - bitcask: uses BitCask, an append-only log-structure store.
-# - memory: (default) uses an in-memory B+tree. Durability is provided by the Raft log.
+# - bitcask (default): uses BitCask, an append-only log-structure store.
+# - memory: uses an in-memory B+tree. Durability is provided by the Raft log.
 # - stdmemory: uses the Rust standard library BTreeMap.
-storage_sql: memory
+storage_sql: bitcask
diff --git a/docs/architecture.md b/docs/architecture.md
index f9d3604b7..bd7195c68 100644
--- a/docs/architecture.md
+++ b/docs/architecture.md
@@ -129,33 +129,21 @@ encoding:
 * `sql::Value`: As above, with type prefix `0x00`=`Null`, `0x01`=`Boolean`, `0x02`=`Float`,
 `0x03`=`Integer`, `0x04`=`String`
 
-The default key/value store is
-[`storage::kv::Memory`](https://github.com/erikgrinaker/toydb/blob/master/src/storage/kv/memory.rs).
-This is an in-memory [B+tree](https://en.wikipedia.org/wiki/B%2B_tree), a search tree
-variant with multiple keys per node (to make use of cache locality) and values only in leaf nodes.
-As key/value pairs are added and removed, tree nodes are split, merged, and rotated to keep them
-balanced and at least half-full.
-
-Although key/value data is stored in memory, toyDB provides durability via the Raft log which
-is persisted to disk. On startup, the Raft log is replayed to populate the in-memory store.
+The default key/value store is a simple variant of
+[`storage::kv::BitCask`](https://github.com/erikgrinaker/toydb/blob/master/src/storage/kv/bitcask.rs),
+an append-only log-structured storage engine. All writes are appended to a log
+file, with an index mapping live keys to file positions maintained in memory.
+When the amount of garbage (replaced or deleted keys) in the file exceeds 20%, a
+new log file is written containing only live keys, replacing the old log file.
 
 #### Key/Value Tradeoffs
 
-**In-memory storage:** storing key/value data in memory has much better performance and is
-simpler to implement than on-disk storage, but requires that the data set fits in memory.
-Replaying the Raft log on startup can also take considerable time for large data sets. However,
-as toyDB datasets are expected to be small, this is mostly advantageous.
-
-**Byte serialization:** since the primary storage is in memory, (de)serializing key/value
-pairs adds significant unnecessary overhead. However, at the outset it was not clear that toyDB
-would use in-memory storage, and byte slices is a simple interface that can be used regardless
-of storage medium.
+**Keyset in memory:** BitCask requires the entire key set to fit in memory, and must also scan
+the log file on startup to construct the key index.
 
-**B+tree scans:** B+trees often have pointers between neighboring leaf nodes for more efficient
-range scans, but toyDB's implementation does not. This would complicate the implementation, and
-the performance benefits are usually not as great in memory where random access latency is low.
-However, this along with other implementation details cause range scans to be O(log n) rather
-than O(1) per step.
+**Compaction volume:** unlike an LSM tree, this single-file BitCask
+implementation requires rewriting the entire dataset during compactions, which
+can produce significant write amplification over time.
 
 **Key encoding:** does not make use of any compression, e.g. variable-length integers, preferring
 simplicity and correctness.
@@ -294,8 +282,7 @@ problem, and it avoid having to do additional (possibly random) disk IO, greatly
 performance.
 
 **Garbage collection:** there is no garbage collection of old log entries, so the log will grow
-without bound. However, this is a necessity since the the default toyDB configuration uses
-in-memory key/value storage by default and there is no other durable storage.
+without bound.
 
 ## Raft Consensus Engine
 
diff --git a/docs/examples.md b/docs/examples.md
index aeb77bbcf..c2d520e11 100644
--- a/docs/examples.md
+++ b/docs/examples.md
@@ -40,7 +40,7 @@ toydb> !status
 Server: toydb-e (leader toydb-d in term 1 with 5 nodes)
 Raft log: 1 committed, 0 applied, 0.000 MB (hybrid storage)
 Node logs: toydb-a:1 toydb-b:1 toydb-c:1 toydb-d:1 toydb-e:1
-SQL txns: 0 active, 0 total (memory storage)
+SQL txns: 0 active, 0 total (bitcask storage)
 ```
 
 The cluster is shut down by pressing Ctrl-C. Data is saved under `clusters/local/toydb-?/data/`,