
Local-First Sync Engine

em is designed to be local-first, allowing the user to work fully offline without giving up collaborative editing and multi-device use when online. As it turns out, this is a harder problem than one might expect! After failed attempts with Firebase and YJS, it's time to take a step back and evaluate the available options based on lessons learned and a clarification of requirements. There have been fundamental advancements in CRDTs and related technology since 2018, when the project started, and a new movement has been catalyzed to solve the challenges of local-first software.

This document describes the requirements for local-first functionality in em, and contains notes on several existing local-first sync engines. The goal is to identify the sync engine that can best support the desired functionality, and craft a plan to implement it.

Background

Read an introduction to the philosophy of local-first software.

Requirements

1. Replication

  • Full replica on every device
    • Not just a cache.
  • Partial replication
    • Share a subtree for collaborative editing.
  • Replication progress
    • Provide user with % completed.

2. Consistency

  • Maintain the same data across clients (eventual consistency).
  • Update multiple docs atomically (transactions).
    • Or have a way to repair the tree if some updates fail.
  • Conflict resolution: Unobtrusively handle conflicts if two nodes are edited concurrently.
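
To make the transaction requirement concrete: a single move touches three documents that must change together. Below is a minimal sketch assuming a hypothetical db.transaction API; none of these names come from a specific engine.

```ts
interface ThoughtDoc {
  id: string
  parentId: string | null
  children: string[]
}

// Hypothetical storage interfaces, for illustration only.
interface Tx {
  get<T>(id: string): Promise<T>
  put(doc: { id: string }): Promise<void>
}

interface Db {
  transaction(fn: (tx: Tx) => Promise<void>): Promise<void>
}

// Move a thought under a new parent: three documents updated atomically,
// preserving the invariant that child.parentId and parent.children agree.
async function move(db: Db, childId: string, newParentId: string): Promise<void> {
  await db.transaction(async tx => {
    const child = await tx.get<ThoughtDoc>(childId)
    const newParent = await tx.get<ThoughtDoc>(newParentId)

    // Detach from the old parent, if any.
    if (child.parentId) {
      const oldParent = await tx.get<ThoughtDoc>(child.parentId)
      oldParent.children = oldParent.children.filter(id => id !== childId)
      await tx.put(oldParent)
    }

    // Attach to the new parent, keeping both sides of the link consistent.
    newParent.children = [...newParent.children, childId]
    child.parentId = newParentId
    await tx.put(newParent)
    await tx.put(child)
  })
}
```

Without transactions, a repair pass would have to scan for documents where the two sides of the parent-child link disagree and reconcile them.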

3. Validity

  • Maintain doubly linked parent-child connection.

  • One parent per node.

  • No cycles.

  • Prevent subtrees becoming orphaned and inaccessible from the root node.

  • Preserve ancestors (see the Deleted parent example below)

    In this example, Client A and Client B are assumed to both be offline when they perform the addition and the deletion, respectively. The child created by Client A should never be deleted indirectly as a consequence of Client B deleting one of its ancestors. Otherwise, Client A could go to the trouble of adding an entire subtree only for it to be deleted accidentally by Client B.

    A consequence of this, I believe, is that a thought must be explicitly deleted, never deleted as a byproduct of its parent being deleted. If a user wants to delete a subtree of 10 thoughts, the system needs to issue 10 deletes. Thoughts are deleted explicitly by the system, but the UI still offers the convenience of deleting all known descendants. When the user deletes a thought, we can lazily iterate the subtree and delete each descendant in the background (sketched below).

    Note that a thought may be a leaf on one client, but a parent on a different client before they have synced up. So we can never know for sure if a thought is a leaf on all clients. There may always be descendants from other clients that just haven't synced yet.

    One idea I had was to automatically un-tombstone all ancestors whenever a thought is added or moved (see the sketch after the examples below). That ensures the thought stays connected to the tree in the face of concurrent deletes. I haven't thought through how to get this to converge, though.

    The downside of this approach is the re-appearance of a parent after its deletion. In order to be conservative with user data and avoid thoughts being deleted unintentionally by other clients, there will sometimes be cases where you have to re-delete a thought that you already deleted. This seems like a small price to pay for ensuring that your new thoughts are never deleted when you go back online.
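
A minimal sketch of the lazy background delete described above, with getChildren and deleteThought as hypothetical stand-ins for the real storage layer:

```ts
// Issue one explicit delete per descendant instead of cascading from the
// parent. Descendants created on other clients that have not synced yet
// do not appear in getChildren, so they are never deleted implicitly.
async function deleteSubtree(
  rootId: string,
  getChildren: (id: string) => Promise<string[]>,
  deleteThought: (id: string) => Promise<void>,
): Promise<void> {
  const queue: string[] = [rootId]
  while (queue.length > 0) {
    const id = queue.shift()!
    queue.push(...(await getChildren(id)))
    await deleteThought(id) // explicit, per-thought delete
  }
}
```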

Examples

Merged children
  • Client A: Adds child a to x.
  • Client B: Adds child b to x.
  • When synced, parent x needs to contain both a and b.
Divergent move
  • Client A moves x into a.
    • x.parentId = a
    • a.children = [x, ...]
  • Client B moves x into b.
    • x.parentId = b
    • b.children = [x, ...]
  • Either operation is valid, as long as the parentId and children remain consistent.
  • x should not end up in both a and b's children.
  • If x ends up in a, then x.parentId = a.
  • If x ends up in b, then x.parentId = b.
Deleted parent
  • Client A: Adds y as a child of x.
  • Client B: Deletes x.
  • When synced, x must be restored, otherwise y will be orphaned.
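
The un-tombstone idea from the Validity section could look something like the following sketch, which would restore x in this example. All names are hypothetical, and the unresolved convergence question is deliberately ignored.

```ts
interface ThoughtNode {
  id: string
  parentId: string | null
  deleted: boolean // tombstone flag rather than physical removal
}

// On every add or move, walk up the tree and clear tombstones so the
// thought remains reachable from the root despite concurrent deletes.
async function untombstoneAncestors(
  id: string,
  get: (id: string) => Promise<ThoughtNode | null>,
  put: (node: ThoughtNode) => Promise<void>,
): Promise<void> {
  let node = await get(id)
  while (node && node.parentId) {
    const parent = await get(node.parentId)
    if (!parent) break
    if (parent.deleted) {
      parent.deleted = false
      await put(parent) // the parent re-appears; the user may have to re-delete it
    }
    node = parent
  }
}
```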

4. Efficiency

  • getById
    • Efficiently access a document by id.
  • getByProp
    • Efficiently access a document by an indexed property.
  • delete
    • Efficiently delete a large subtree.
  • import
    • Efficiently import 100k+ nodes.
  • replicate
    • Efficiently replicate 100k+ nodes to a new device.
  • longevity
    • Should not slow down after 1M+ operations.

5. Reactive

  • Subscribe to remote changes over a websocket.
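
For illustration only (this is not any particular engine's API), a subscription might look like the following; the endpoint and message shape are assumptions.

```ts
type ChangeHandler = (change: { docId: string; doc: unknown }) => void

// Subscribe to remote changes pushed over a websocket.
function subscribe(url: string, onChange: ChangeHandler): () => void {
  const ws = new WebSocket(url)
  ws.onmessage = event => onChange(JSON.parse(event.data))
  return () => ws.close() // unsubscribe
}

// Usage
const unsubscribe = subscribe('wss://example.com/sync', change => {
  console.log('remote change to', change.docId)
})
```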

6. Offline storage

  • Web: IndexedDB
  • Mobile: SQLite (Capacitor app)
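
To ground getById and getByProp from the Efficiency requirement on the web target, here is a plain IndexedDB sketch with a secondary index; the database, store, and index names are illustrative.

```ts
// Open a database with an object store keyed by id and a secondary index
// on parentId, so both lookups avoid a full scan.
function openDb(): Promise<IDBDatabase> {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open('em', 1)
    req.onupgradeneeded = () => {
      const store = req.result.createObjectStore('thoughts', { keyPath: 'id' })
      store.createIndex('parentId', 'parentId')
    }
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })
}

// getById: primary-key lookup.
function getById(db: IDBDatabase, id: string): Promise<unknown> {
  return new Promise((resolve, reject) => {
    const req = db.transaction('thoughts').objectStore('thoughts').get(id)
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })
}

// getByProp: lookup through the parentId index.
function getByParent(db: IDBDatabase, parentId: string): Promise<unknown[]> {
  return new Promise((resolve, reject) => {
    const req = db
      .transaction('thoughts')
      .objectStore('thoughts')
      .index('parentId')
      .getAll(parentId)
    req.onsuccess = () => resolve(req.result)
    req.onerror = () => reject(req.error)
  })
}
```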

7. Auth

  • Per document permissions

8. Full text search

Sync Engines

Some handy lists of sync engines:

✓ Yes/Maybe

These sync engines are active candidates for use in em. Some of them have not been researched at all.

  • CozoDB
    • Graph db
    • Embeddable
  • Ditto
    • Control replication with queries.
    • No built-in synced event or progress percentage.
  • DXOS
  • Electric-next
  • Fireproof
  • Jazz
  • Liveblocks
  • Partykit
  • PowerSync
    • Partial replication via sync buckets, defined at compile-time on the server
    • E2E Encryption
      • For end-to-end encryption, the encrypted data can be synced using PowerSync. The data can then either be encrypted and decrypted directly in memory by the application, or a separate local-only table can be used to persist the decrypted data, allowing it to be queried directly. (See the encryption sketch after this list.)
  • RxDB
    • No transactions
    • Conflict resolution through revisions
    • SQLite plugin
  • SyncProxy
  • Tinybase
  • Triplit
  • WatermelonDB
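
As a sketch of the encrypt-in-memory approach quoted under PowerSync above: the application encrypts with the standard Web Crypto API and only hands opaque bytes to the sync layer. Nothing here is PowerSync-specific, and key management (e.g. deriving the key from a passphrase) is out of scope.

```ts
// Encrypt a thought with AES-GCM before syncing. Only ciphertext leaves
// the device; decryption happens in memory (or into a local-only table).
async function encryptForSync(
  key: CryptoKey,
  doc: object,
): Promise<{ iv: Uint8Array; ciphertext: ArrayBuffer }> {
  const iv = crypto.getRandomValues(new Uint8Array(12))
  const plaintext = new TextEncoder().encode(JSON.stringify(doc))
  const ciphertext = await crypto.subtle.encrypt({ name: 'AES-GCM', iv }, key, plaintext)
  return { iv, ciphertext }
}

async function decryptFromSync(
  key: CryptoKey,
  iv: Uint8Array,
  ciphertext: ArrayBuffer,
): Promise<object> {
  const plaintext = await crypto.subtle.decrypt({ name: 'AES-GCM', iv }, key, ciphertext)
  return JSON.parse(new TextDecoder().decode(plaintext))
}
```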

✗ No

These sync engines have been considered and ruled out for reasons described below.

  • BlockSuite
    • Built on YJS
  • ElectricSQL
    • Basically deprecated as they transition to electric-next
    • Sync required before query
      • Once the initial data has synced, queries can run against it.
      • But shapes can be subscribed and unsubscribed on the fly.
    • Dynamic partial replication
      • There is no support for where clauses to filter the initial target rows or select clauses to filter the include tree. As a result, current calls to db.tablename.sync({...}) will "over sync" additional data onto the device.
    • Capacity
      • thousands of rows are definitely viable, and we have that working with our own internal development tests. Hundreds of thousands or millions may cause issues right now, but are something we do want to support.
    • Size
  • Evolu
    • Too alpha
  • Gun.js
    • No full replica
    • Known syncing issues
    • Known codebase quality issues
  • Instant
    • No full replica
    • More offline-cached than offline-first
  • Levelgraph
  • Loro
    • Entire Doc loaded into memory
    • For applications requiring directory-like data manipulation, Loro utilizes the algorithm from A Highly-Available Move Operation for Replicated Trees, simplifying moving and reorganizing hierarchical data structures.
    • Alpha software
    • Similar to YJS
  • Replicache
    • To be replaced by ZeroSync, which is cache-based
    • Downloads everything
  • SurrealDB
  • Verdant
    • No partial replication
    • Too alpha
    • No encryption
  • YJS
    • Entire DB loaded into memory
  • ZeroSync
    • No full replica (cache-based)