-
Notifications
You must be signed in to change notification settings - Fork 138
Rev Tree Storage on Couchbase Server
Revision Trees, or "Rev Trees" for short, are required to support semantics surrounding conflicts and conflict resolution. The best explanation of the problem is in the Conflicts chapter of the CouchDB guide.
-
Rev Trees are stored in the special
_sync
metadata field, in the Couchbase Server document itself (as opposed to in a metadata field) -
Sync Gateway enforces the Rev Tree semantics
-
Internally, Sync Gateway stores the Rev Tree in memory via this structure, however when serialized to JSON it uses a more space-efficient structure.
When the document contains conflicting revisions, and has multiple branches off of a single revision, then the body of the leaf revision of each branch is stored in the document.
There is a slight caveat to that, in that non-leaf node revision bodies are temporarily stored in separate documents, with a default expiration TTL of 5 minutes. This is described below in more detail.
Create a document via Sync Gateway REST API, which results in the following Couchbase Document with a _sync
metadata field:
{
"_sync": {
"rev": "1-51ba9d966e99179007b295b601b0e013",
"sequence": 64756,
"recent_sequences": [
64756
],
"history": {
"revs": [
"1-51ba9d966e99179007b295b601b0e013"
],
"parents": [
-1
],
"channels": [
[
"NBC"
]
]
},
"channels": {
"NBC": null
},
"time_saved": "2015-06-18T19:17:40.469072739Z"
},
"channels": [
"NBC"
],
"type": "test_doc"
}
Update the doc -- add a new revision which adds the doc to another channel.
{
"_sync": {
"rev": "2-e2c395c6006f14e16d0fdd1884c3aedf",
"sequence": 64757,
"recent_sequences": [
64756,
64757
],
"history": {
"revs": [
"1-51ba9d966e99179007b295b601b0e013",
"2-e2c395c6006f14e16d0fdd1884c3aedf"
],
"parents": [
-1,
0
],
"channels": [
[
"NBC"
],
[
"ABC",
"NBC"
]
]
},
"channels": {
"ABC": null,
"NBC": null
},
"time_saved": "2015-06-18T19:18:31.930967967Z"
},
"channels": [
"NBC",
"ABC"
],
"type": "test_doc"
}
At this point, the revision body of revision 1-51ba9d966e99179007b295b601b0e013 is no longer needed by the system for anything, since it is a non-leaf node and the only bodies needed for conflict resolution are those of leaf nodes, as mentioned above. However, it is kept around temporarily (configurable -- 5 minutes by default), in case applications need to retrieve bodies of previous revisions for whatever reason.
The way it is kept around temporarily is by putting the revision body into a separate document on Couchbase Server, with a doc ID of _sync:rev:<doc-id>:<length-of-rev-id>:<prev-rev-id>
and a default expiration TTL of five minutes. In this particular case, the doc ID would be:
_sync:rev:b2193f56d5e7abc232ad9084bdb9b6b0:34:1-51ba9d966e99179007b295b601b0e013
and the document contents would be:
{
"channels": [
"NBC"
],
"type": "test_doc"
}
This can be done via a POST to the _bulk_docs
endpoint, with "new_edits": false
in the JSON body of the request.
Note that:
- There are now two different 2- revisions in the history/revs section of the document (the winner being 2-e2)
- The revision body of the losing branch (2-44) is stored in the history/bodymap section
- The channels that the revision belongs to is tracked in the history/channels array (mapped to revision by matching index of entry in history/revs)
{
"_sync": {
"rev": "2-e2c395c6006f14e16d0fdd1884c3aedf",
"new_rev": "2-44ba9d966e99179007b295b601b0e013",
"flags": 28,
"sequence": 64759,
"recent_sequences": [
64756,
64757,
64759
],
"history": {
"revs": [
"1-51ba9d966e99179007b295b601b0e013",
"2-e2c395c6006f14e16d0fdd1884c3aedf",
"2-44ba9d966e99179007b295b601b0e013"
],
"parents": [
-1,
0,
0
],
"bodymap": {
"2": "{\"channels\":[\"CBS\"],\"type\":\"test_doc_updated\"}"
},
"channels": [
[
"NBC"
],
[
"ABC",
"NBC"
],
[
"CBS"
]
]
},
"channels": {
"ABC": null,
"NBC": null
},
"time_saved": "2015-06-18T20:51:42.248325853Z"
},
"channels": [
"NBC",
"ABC"
],
"type": "test_doc"
}
And, adding another branch to the rev tree, the document becomes as is shown below.
Note that:
- Since there are now two non-winner branches, there are two corresponding leaf-node revision bodies stored under history/bodymap.
{
"_sync": {
"rev": "2-e2c395c6006f14e16d0fdd1884c3aedf",
"new_rev": "2-33ba9d966e99179007b295b601b0e013",
"flags": 28,
"sequence": 64760,
"recent_sequences": [
64756,
64757,
64759,
64760
],
"history": {
"revs": [
"2-e2c395c6006f14e16d0fdd1884c3aedf",
"2-44ba9d966e99179007b295b601b0e013",
"2-33ba9d966e99179007b295b601b0e013",
"1-51ba9d966e99179007b295b601b0e013"
],
"parents": [
3,
3,
3,
-1
],
"bodymap": {
"1": "{\"channels\":[\"CBS\"],\"type\":\"test_doc_updated\"}",
"2": "{\"channels\":[\"PBS\"],\"type\":\"test_doc_updated_second_conflict\"}"
},
"channels": [
[
"ABC",
"NBC"
],
[
"CBS"
],
[
"PBS"
],
[
"NBC"
]
]
},
"channels": {
"ABC": null,
"NBC": null
},
"time_saved": "2015-06-18T20:54:01.006421523Z"
},
"channels": [
"NBC",
"ABC"
],
"type": "test_doc"
}
Suppose that you have a revision history with a single branch, which has 1000 revisions. Pruning is applied, with a MaxRevTreeDepth of 100, leaving only the last 100 revisions in the branch (900-1000). Then, there is an attempt to create a new revision off of revision 1, which has been pruned away. (note: this can be done via the _bulk_docs endpoint, with new_edits set to false)
What happens?
It creats a new "root revision branch" if it couldn't find a match for the anything in the incoming revision history in the existing history. (TODO: check if we have any unit tests to prove this)
-
Due to conflicts that limit the prunability (see Scenario 2 above). This results in storing excessive ancestor revision metadata (but the number of stored revision bodies are not affected by this)
-
Excessive number of conflicts (wide trees), will result in many revision bodies being stored, because:
-
As pruning only prunes revision metadata, and not bodies.
-
Leaf revision bodies for all branches must be kept indefinitely, and cannot be compacted away.
-