Skip to content

Latest commit

 

History

History
106 lines (84 loc) · 3.99 KB

README.md

File metadata and controls

106 lines (84 loc) · 3.99 KB

RecordLite

RecordLite is a library (and executable) that declaratively maintains SQLite tables and views of semi-structured data (henceforth known as records). RecordLite is based on a hidden gem in Backtrace's sqlite_protobuf library.

TLDR: RecordLite stores semi-structured records into SQLite table where one column is the raw payload (JSON or Protobuf) and define views with virtual columns from the raw column via extraction functions, e.g. json_extract and protobuf_extract. Tell RecordLite what you want the view and its indexes to look like, and RecordLite spits out idempotent DDL statements to make that happen, for any initial state and with minimal churn.

Installation

To install the recordlite executable via go get.

# go install github.com/fsaintjacques/recordlite/cmd/recordlite@latest

To install recordlite as a go module dependency

# go get github.com/fsaintjacques/recordlite/cmd/recordlite@latest

About

Why

Schema management is a hard practical problem. By using semi-structured data and storing it as a single BLOB column, one does not need to modify the "true" table's schema, only the dynamic views.

How

For a given table of records, RecordLite generates a companion view that exposes virtual columns of fields of interested defined by the user. The columns can be optionally indexed if they're often projected and/or used for filtering. The column are defined with a function extracting the data from the raw column (either JSON or Protobuf), note that this would also work for any type of expression.

Since the view does not own the data, it is safe to delete/add/update columns without affecting the underlying table. In other words, it is relatively cheap to modify the view since it will not trigger a massive scan + write loop. OTOH, updating any index will require recomputing the index.

Examples

$ # Define a schema
$ cat schema.json
{
  "name": "records",
  "columns": [
    {"name": "status", "expr": "json_extract(raw, '$.status')", "with_index": true},
    {"name": "color", "expr": "json_extract(raw, '$.attrs.color')", "with_index": true}
  ]
}

$ # Create the table and views definitions
$ recordlite schema.json | sqlite3 records.db

$ # Simulate a process appending to the records
$ cat << EOF | awk '{print "INSERT INTO records(raw) VALUES('"'"'" $0 "'"'"');"}' | sqlite3 records.db
> {"status":"ok", "attrs": {"color": "red", "size": "big"}}
> {"status":"failed", "attrs": {"color":"blue", "size": "small"}}
> EOF

$ sqlite3 --box records.db
sqlite> SELECT id, status, color from records;
┌────┬────────┬───────┐
│ id │ status │ color │
├────┼────────┼───────┤
│ 1  │ ok     │ red   │
│ 2  │ failed │ blue  │
└────┴────────┴───────┘

$ # Let's modify the schema to index attrs.size
$ cat schema.json
{
  "name": "records",
  "columns": [
    {"name": "status", "expr": "json_extract(raw, '$.status')", "with_index": true},
    {"name": "color", "expr": "json_extract(raw, '$.attrs.color')", "with_index": true},
    {"name": "size", "expr": "json_extract(raw, '$.attrs.size')", "with_index": true}
  ]
}

$ # Update the views definition
$ recordlite schema.json | sqlite3 records.db

$ sqlite3 --box records.db
sqlite> SELECT id, status, color, size FROM records;
┌────┬────────┬───────┬───────┐
│ id │ status │ color │ size  │
├────┼────────┼───────┼───────┤
│ 1  │ ok     │ red   │ big   │
│ 2  │ failed │ blue  │ small │
└────┴────────┴───────┴───────┘