Jens Alfke edited this page Apr 24, 2013 · 2 revisions

Couchbase Sync Gateway Document Schema

Data Documents

Data documents are the documents stored via the CouchDB API. They're stored directly as Couchbase documents. The document IDs are unaltered; we just restrict them from starting with an underscore, so that we can reserve such IDs for other types of internal documents.

Internally, each data document carries an extra top-level property named _sync. It is never exposed to clients; the gateway uses it only for its own bookkeeping. Its structure is defined by the type syncData declared in db/document.go. In JSON form in the database, it looks like this:

"_sync": {
    "rev": "1-cafebabe...",
    "deleted": false,
    "sequence": 1234,
    "history": { ... revision tree ... },
    "channels": { ... channel map ... },
    "access": { ... access map ... }
}
  • rev is the current revision ID. (If there are conflicts, this is the default or "winning" ID.)
  • deleted is true if the current revision is deleted.
  • sequence is the current revision's sequence number in the database.
  • history is the revision tree, which contains information about all revisions of the document.
  • channels is the channel map, which tracks which channels the document belongs to and used to belong to.
  • access is the access map, which tracks which users are given access to which channels by this document.
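As a rough illustration of the layout above, here is a minimal Go sketch that pulls the _sync property out of a raw document body. SyncData and parseSync are hypothetical names, and the field types are assumptions; the authoritative definition is the syncData type in db/document.go.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// SyncData is a hypothetical mirror of the "_sync" property described above.
// The real type is syncData in db/document.go; the field types are assumptions.
type SyncData struct {
	Rev      string                     `json:"rev"`
	Deleted  bool                       `json:"deleted,omitempty"`
	Sequence uint64                     `json:"sequence"`
	History  json.RawMessage            `json:"history"`  // revision tree, left undecoded here
	Channels map[string]json.RawMessage `json:"channels"` // channel map, values left undecoded
	Access   map[string][]string        `json:"access"`   // access map: user ID -> channel IDs
}

// parseSync extracts the "_sync" property from a raw document body.
func parseSync(doc []byte) (*SyncData, error) {
	var wrapper struct {
		Sync *SyncData `json:"_sync"`
	}
	if err := json.Unmarshal(doc, &wrapper); err != nil {
		return nil, err
	}
	if wrapper.Sync == nil {
		return nil, fmt.Errorf("document has no _sync property")
	}
	return wrapper.Sync, nil
}

func main() {
	doc := []byte(`{"title": "hello", "_sync": {"rev": "1-cafebabe", "deleted": false,
		"sequence": 1234, "history": {}, "channels": {"CBS": null},
		"access": {"alice": ["ABC"]}}}`)
	sync, err := parseSync(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(sync.Rev, sync.Sequence, sync.Access["alice"])
}
```

The history and channels values are kept as raw JSON in this sketch because their formats are described separately in the sections that follow.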

Revision Trees

The revision tree is defined in db/revtree.go. To save space, its JSON serialization differs from its in-memory representation: it's stored as a set of arrays:

{
    "revs": ["1-cafebabe", "2-f00ba555", "3-deadbeef"],
    "parents": [-1, 0, 1],
    "bodies": ["{....}", "{....}", "{}"],
    "deleted": [2],
    "deleted": [2]

The revs, parents and bodies arrays are all parallel: the same index in each one describes the same revision. revs[i] is the revision ID of the i'th revision; parents[i] is the index of its parent; bodies[i] is its JSON body (as a string) or null if the body isn't available.

The deleted array is different: it's a list of indexes into the above arrays, indicating which of those revisions are deleted. (In the above example, the revision 3-deadbeef is deleted but not the other two.)
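To make the parallel-array encoding concrete, the following Go sketch expands it into a map keyed by revision ID. RevInfo and decodeRevTree are hypothetical names, not the types in db/revtree.go, and treating a negative parent index as "no parent" is an assumption based on the -1 in the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// encodedRevTree mirrors the serialized parallel-array form shown above.
type encodedRevTree struct {
	Revs    []string  `json:"revs"`
	Parents []int     `json:"parents"`
	Bodies  []*string `json:"bodies"` // a nil entry means the body was null
	Deleted []int     `json:"deleted"`
}

// RevInfo is a hypothetical in-memory node for one revision.
type RevInfo struct {
	ID      string
	Parent  string  // "" for a root revision
	Body    *string // nil if the body isn't available
	Deleted bool
}

// decodeRevTree expands the parallel arrays into a map keyed by revision ID.
func decodeRevTree(data []byte) (map[string]*RevInfo, error) {
	var enc encodedRevTree
	if err := json.Unmarshal(data, &enc); err != nil {
		return nil, err
	}
	n := len(enc.Revs)
	if len(enc.Parents) != n || len(enc.Bodies) != n {
		return nil, fmt.Errorf("revs, parents and bodies must have equal length")
	}
	tree := make(map[string]*RevInfo, n)
	for i, id := range enc.Revs {
		info := &RevInfo{ID: id, Body: enc.Bodies[i]}
		// Assumption: a negative index (-1 in the example) marks a root.
		if p := enc.Parents[i]; p >= 0 {
			if p >= n {
				return nil, fmt.Errorf("parent index %d out of range", p)
			}
			info.Parent = enc.Revs[p]
		}
		tree[id] = info
	}
	// "deleted" holds indexes into the arrays above, not revision IDs.
	for _, i := range enc.Deleted {
		if i < 0 || i >= n {
			return nil, fmt.Errorf("deleted index %d out of range", i)
		}
		tree[enc.Revs[i]].Deleted = true
	}
	return tree, nil
}

func main() {
	raw := `{"revs": ["1-cafebabe", "2-f00ba555", "3-deadbeef"],
		"parents": [-1, 0, 1], "bodies": ["{}", "{}", "{}"], "deleted": [2]}`
	tree, err := decodeRevTree([]byte(raw))
	if err != nil {
		panic(err)
	}
	rev := tree["3-deadbeef"]
	fmt.Println(rev.ID, "parent:", rev.Parent, "deleted:", rev.Deleted)
}
```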

Channel Maps

A channel map (defined as ChannelMap in db/ChannelMap.go) is mostly a set of channel IDs. But it's also used to track which channels a document used to belong to, and when it stopped belonging. For example:

{
    "CBS": null,
    "ABC": null,
    "HBO": {"rev": "2-f00ba555", "seq": 1099}
}

The keys are channel IDs. The value for a channel is null if the document currently belongs to the channel; otherwise it's an object that contains the ID and sequence number of the revision that removed the document from this channel.
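A small Go sketch of how such a map might be read: a null value decodes to a nil pointer, which marks current membership. The ChannelMap and ChannelRemoval types below are hypothetical stand-ins for illustration, not the gateway's own definitions.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// ChannelRemoval records the revision that removed the document from a channel.
type ChannelRemoval struct {
	Rev string `json:"rev"`
	Seq uint64 `json:"seq"`
}

// ChannelMap is a hypothetical stand-in for the gateway's channel map.
// A nil value (JSON null) means the document currently belongs to the channel.
type ChannelMap map[string]*ChannelRemoval

// parseChannelMap decodes the JSON form shown above.
func parseChannelMap(data []byte) (ChannelMap, error) {
	var m ChannelMap
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}

// Current returns the channels the document belongs to now, sorted by name.
func (m ChannelMap) Current() []string {
	var names []string
	for name, removal := range m {
		if removal == nil {
			names = append(names, name)
		}
	}
	sort.Strings(names)
	return names
}

func main() {
	raw := `{"CBS": null, "ABC": null, "HBO": {"rev": "2-f00ba555", "seq": 1099}}`
	m, err := parseChannelMap([]byte(raw))
	if err != nil {
		panic(err)
	}
	fmt.Println("current:", m.Current())
	fmt.Println("left HBO at sequence", m["HBO"].Seq)
}
```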

Access Maps

The access map (defined as AccessMap in channels/channelmapper.go) is conceptually a set of (user, channel) pairs, whose meaning is that the document gives this user access to this channel. It's stored in JSON as:

{
    "alice": ["ABC", "NBC"],
    "bob": ["Cinemax"]
}

The keys are user IDs and the values are arrays of channel IDs.
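The relationship between the stored form and the conceptual set of (user, channel) pairs can be sketched in Go as follows. AccessMap here is a local stand-in type, and Grant and Grants are hypothetical names introduced only for this example.

```go
package main

import (
	"encoding/json"
	"fmt"
	"sort"
)

// AccessMap is a local stand-in for the stored form: user ID -> channel IDs.
type AccessMap map[string][]string

// Grant is one (user, channel) pair: the document gives User access to Channel.
type Grant struct {
	User    string
	Channel string
}

// parseAccessMap decodes the JSON form shown above.
func parseAccessMap(data []byte) (AccessMap, error) {
	var m AccessMap
	if err := json.Unmarshal(data, &m); err != nil {
		return nil, err
	}
	return m, nil
}

// Grants flattens the map into its conceptual set of pairs,
// sorted by user and then by channel so the result is deterministic.
func (m AccessMap) Grants() []Grant {
	var grants []Grant
	for user, channels := range m {
		for _, ch := range channels {
			grants = append(grants, Grant{User: user, Channel: ch})
		}
	}
	sort.Slice(grants, func(i, j int) bool {
		if grants[i].User != grants[j].User {
			return grants[i].User < grants[j].User
		}
		return grants[i].Channel < grants[j].Channel
	})
	return grants
}

func main() {
	m, err := parseAccessMap([]byte(`{"alice": ["ABC", "NBC"], "bob": ["Cinemax"]}`))
	if err != nil {
		panic(err)
	}
	for _, g := range m.Grants() {
		fmt.Printf("%s may read %s\n", g.User, g.Channel)
	}
}
```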

Attachment Documents

CouchDB-style attachments are stored as individual documents in the database. The document ID of an attachment is of the form _sync:att:sha1- followed by the base64 encoding of the SHA-1 digest of the attachment's data. (See db/attachment.go.) The body of the document is nothing more than the raw binary data of the attachment.

(Note that this is a content-addressable store. A single attachment document may be associated with multiple documents as long as its data is the same, even if the metadata given it in those documents differs.)
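As a sketch of the content-addressing scheme, the Go function below derives an attachment document ID from the attachment's bytes. attachmentKey is a hypothetical name, and the use of the standard (padded) base64 alphabet is an assumption; db/attachment.go is the authority on the exact encoding.

```go
package main

import (
	"crypto/sha1"
	"encoding/base64"
	"fmt"
)

// attachmentKey derives the document ID under which an attachment is stored.
// Assumption: the digest is encoded with the standard, padded base64 alphabet.
func attachmentKey(data []byte) string {
	digest := sha1.Sum(data)
	return "_sync:att:sha1-" + base64.StdEncoding.EncodeToString(digest[:])
}

func main() {
	a := attachmentKey([]byte("hello, attachment"))
	b := attachmentKey([]byte("hello, attachment"))
	fmt.Println(a)
	// Identical data always maps to the same key, so it is stored only once.
	fmt.Println(a == b)
}
```

Because the key depends only on the data, two documents that attach the same bytes under different names or content types end up pointing at the same attachment document.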

A data document refers to its attachments via the standard digest properties in its _attachments CouchDB metadata field.

User/Role Documents

Users and roles are stored in documents with ID prefixes user: and role: respectively. Their common data in JSON form looks like:

{
    "name": "peon",
    "admin_channels": ["CBS"],
    "all_channels": ["CBS", "NBC", "ABC"]
}
  • admin_channels is a set of channels the user/role is explicitly added to by an administrator. (This property can be set through the admin REST API.)
  • all_channels is a derived property: the union of admin_channels and every channel granted to this user/role by any document's access map. Its value may be set to null, which indicates that the old value is out of date and must be recomputed and stored back into the document.
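The derivation of all_channels can be sketched as a plain set union. The Go function below is only an illustration: computeAllChannels is a hypothetical name, the access maps are passed in as an in-memory slice rather than queried from the database, and it assumes access map keys match the user/role name directly.

```go
package main

import (
	"fmt"
	"sort"
)

// computeAllChannels returns the union of the admin-assigned channels and
// every channel granted to name by the supplied access maps, sorted by name.
// Assumption: access map keys are compared to name as-is.
func computeAllChannels(name string, adminChannels []string, accessMaps []map[string][]string) []string {
	set := make(map[string]struct{})
	for _, ch := range adminChannels {
		set[ch] = struct{}{}
	}
	for _, access := range accessMaps {
		for _, ch := range access[name] {
			set[ch] = struct{}{}
		}
	}
	all := make([]string, 0, len(set))
	for ch := range set {
		all = append(all, ch)
	}
	sort.Strings(all)
	return all
}

func main() {
	accessMaps := []map[string][]string{
		{"peon": {"NBC"}},
		{"peon": {"ABC", "CBS"}, "alice": {"HBO"}},
	}
	fmt.Println(computeAllChannels("peon", []string{"CBS"}, accessMaps))
}
```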

A user has additional fields:

    "email": "[email protected]",
    "disabled": false,
    "passwordhash": {...},
    "roles": ["peon", "admin"]
  • email is of course the user's email address.
  • disabled can be set to true to prevent a user from accessing the database.
  • passwordhash is an opaque structure that contains the hashed salted password.
  • roles is a list of roles that the user belongs to.

Local Documents

CouchDB has a concept of "local" documents -- any document with an ID starting with _local/ is stored separately (outside the normal b-tree), has no revision history, is not indexed by any views, and will never be replicated. Local documents are a somewhat obscure feature, but the CouchDB replication protocol uses them to store checkpoints, so it's necessary to support them.

A local document is stored in a document whose name is _sync:local: followed by the document ID (without the _local/ prefix). It contains the document's JSON, plus a special _rev property that stores the current revision ID.
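The ID mapping just described can be sketched in Go as follows. localDocKey is a hypothetical helper written for this example; the actual implementation lives in db/special_docs.go.

```go
package main

import (
	"fmt"
	"strings"
)

// localDocKey maps a CouchDB local document ID such as "_local/checkpoint"
// to the name of the document that stores it. The second result is false
// if the ID does not carry the "_local/" prefix.
func localDocKey(docID string) (string, bool) {
	const prefix = "_local/"
	if !strings.HasPrefix(docID, prefix) {
		return "", false
	}
	return "_sync:local:" + strings.TrimPrefix(docID, prefix), true
}

func main() {
	if key, ok := localDocKey("_local/checkpoint"); ok {
		fmt.Println(key)
	}
}
```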

Local documents are implemented in db/special_docs.go.
