Channel Logs
THIS DOCUMENT IS OBSOLETE -- THIS ARCHITECTURE WAS REPLACED IN SPRING 2014 BY A NEWER ONE (IN-MEMORY CHANNEL CACHES.)
A channel-log is a (Couchbase) document that stores the recent history of a channel. It contains, conceptually, the recent _changes feed entries for that channel: a list of {docid, revid, sequence} tuples. Its timeline needs to extend at least back to the sequence "checkpoint" (the point before which all sequences have been persisted to disk and show up in views), and it will probably go back farther as an optimization.
A changes feed for a channel is generated primarily from the channel-log document. If older entries are needed -- especially in the case where a new client needs to start at sequence 1 -- the gateway runs a view query on the "changes" view as it does today, and merges its output with the channel-log (because the channel-log may contain revisions that haven't been persisted yet.)
The channel-log document is updated by the same gateway process that added the associated revision, and it is authoritative for the recent past (changes more recent than the checkpoint sequence). The view query, however, is authoritative for any changes older than the checkpoint sequence.
The current definition of a channel-log in Go (from channels/change_log.go) is:
type LogEntry struct {
	Sequence uint64 `json:"seq"`           // Sequence number
	DocID    string `json:"doc,omitempty"` // Empty if this entry has been replaced
	RevID    string `json:"rev,omitempty"` // Empty if this entry has been replaced
	Deleted  bool   `json:"del,omitempty"` // True for a deletion tombstone revision
	Removed  bool   `json:"rmv,omitempty"` // True for a channel-removal tombstone revision
	Hidden   bool   `json:"hid,omitempty"` // True for a losing rev of a conflict
}

type ChangeLog struct {
	Since   uint64     // Sequence this is valid after
	Entries []LogEntry // Entries in order they were added (not sequence order!)
}
When a gateway node updates a document, it iterates over all the channels that document is in, and appends a new entry to each channel-log. (Actually it's not purely an append, since old entries may have to be expired from the beginning. Updates of course need to use CAS.)
The rows in the channel-log are ordered by the order in which sync gateway processes appended them, not by sequence number. (The two orders may not coincide, and in the case of chunky sequences ("epochs") sequence numbers may not even be unique.) This is important because we might serve sequence 46 to a client, and then have sequence 45 appended to the channel-log. So we need to render the provisional sequences in append order, not sequence order.
In the case where a document has multiple changes in quick succession, we can't remove the parent revisions' log entries, because we may need to find their sequence numbers later. But we can remove their document and revision IDs, which take up most of the room.
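A minimal sketch of that space-saving step, under the assumption that the parent entry is located by document ID and parent revision ID. The `Update` method name and its signature are illustrative, not taken from the gateway source.

```go
package main

import "fmt"

// Trimmed copies of the types shown above.
type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

type ChangeLog struct {
	Since   uint64
	Entries []LogEntry
}

// Update appends newEntry. If the parent revision is still in the log, its
// DocID and RevID are blanked, but the entry (and its Sequence) is kept.
func (cl *ChangeLog) Update(newEntry LogEntry, parentRevID string) {
	if parentRevID != "" {
		for i := range cl.Entries {
			e := &cl.Entries[i]
			if e.DocID == newEntry.DocID && e.RevID == parentRevID {
				e.DocID, e.RevID = "", ""
				break
			}
		}
	}
	cl.Entries = append(cl.Entries, newEntry)
}

func main() {
	cl := ChangeLog{}
	cl.Update(LogEntry{Sequence: 10, DocID: "doc1", RevID: "1-a"}, "")
	cl.Update(LogEntry{Sequence: 11, DocID: "doc1", RevID: "2-b"}, "1-a")
	// The first entry keeps sequence 10 but no longer names a doc or rev.
	fmt.Printf("%+v\n", cl.Entries)
}
```

With `omitempty` on the JSON tags, a blanked entry serializes as little more than its sequence number.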
As a log grows, the oldest (first) entries will be removed to limit its length. This causes its Since property to change: it's updated to the sequence number of the last entry removed. That way the changes feed can tell whether the log is authoritative for the desired sequence range, or whether it needs to backfill it by also querying the changes view.
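The truncation rule can be sketched as follows. `TruncateTo` and its `maxLen` parameter are names chosen for this illustration; the only behavior taken from the text is that the oldest entries go first and `Since` becomes the sequence of the last entry removed.

```go
package main

import "fmt"

// Trimmed copies of the types shown above.
type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

type ChangeLog struct {
	Since   uint64
	Entries []LogEntry
}

// TruncateTo drops the oldest entries so that at most maxLen remain and
// returns how many were removed. Since is set to the sequence number of
// the last entry removed.
func (cl *ChangeLog) TruncateTo(maxLen int) int {
	if maxLen < 0 {
		maxLen = 0
	}
	removed := len(cl.Entries) - maxLen
	if removed <= 0 {
		return 0
	}
	cl.Since = cl.Entries[removed-1].Sequence
	// Copy the survivors so the removed entries can be garbage-collected.
	cl.Entries = append([]LogEntry(nil), cl.Entries[removed:]...)
	return removed
}

func main() {
	cl := ChangeLog{Entries: []LogEntry{
		{Sequence: 5}, {Sequence: 7}, {Sequence: 6}, {Sequence: 8},
	}}
	n := cl.TruncateTo(2)
	fmt.Println(n, cl.Since, len(cl.Entries))
}
```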
The sequences appearing in the public API of the _changes feed are no longer the same as the internal sequence numbers. Instead, the sequence IDs are strings encoding a mapping from channel name to sequence number (the Go type is channels.TimedSet). An example sequence ID string:
abc:16,cnn:8,pbs:20
This encodes that the latest change seen on the abc channel is sequence #16, the latest on the cnn channel is #8, and the latest on pbs is #20 (with no changes sent on any other channels).
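To make the encoding concrete, here is a small sketch that parses and re-serializes such a string. It uses a plain `map[string]uint64` rather than the real `channels.TimedSet`, and the function names `parseSequenceID` and `formatSequenceID` are made up for this example. Sorting the channel names on output is a choice made here for determinism, not something the format is known to require.

```go
package main

import (
	"fmt"
	"sort"
	"strconv"
	"strings"
)

// parseSequenceID turns "abc:16,cnn:8,pbs:20" into a channel-to-sequence map.
func parseSequenceID(s string) (map[string]uint64, error) {
	result := map[string]uint64{}
	if s == "" {
		return result, nil
	}
	for _, pair := range strings.Split(s, ",") {
		i := strings.LastIndex(pair, ":")
		if i <= 0 {
			return nil, fmt.Errorf("malformed entry %q", pair)
		}
		seq, err := strconv.ParseUint(pair[i+1:], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad sequence in %q: %v", pair, err)
		}
		result[pair[:i]] = seq
	}
	return result, nil
}

// formatSequenceID is the inverse, with channels in sorted order.
func formatSequenceID(m map[string]uint64) string {
	names := make([]string, 0, len(m))
	for name := range m {
		names = append(names, name)
	}
	sort.Strings(names)
	parts := make([]string, len(names))
	for i, name := range names {
		parts[i] = name + ":" + strconv.FormatUint(m[name], 10)
	}
	return strings.Join(parts, ",")
}

func main() {
	m, err := parseSequenceID("abc:16,cnn:8,pbs:20")
	if err != nil {
		panic(err)
	}
	fmt.Println(m["abc"], m["cnn"], m["pbs"])
	fmt.Println(formatSequenceID(m))
}
```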
These sequence IDs are sent by the client to the changes feed (in the ?since= parameter) and returned to the client in the last_seq JSON property. As always, they are opaque to the client.
The input since change-ID is parsed into a TimedSet of channel-to-sequence-number mappings. For each channel requested, its last-seen sequence number is looked up from this map, and the feed for that channel is generated (in parallel, using a goroutine).
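The per-channel fan-out might look roughly like this. The `channelFeedFunc` callback and `generateFeeds` are hypothetical, feeds are reduced to bare sequence numbers, and one goroutine per channel is an assumption about how the parallelism is arranged.

```go
package main

import (
	"fmt"
	"sync"
)

// channelFeedFunc is a hypothetical callback that produces one channel's
// feed (here just sequence numbers) for changes after since.
type channelFeedFunc func(channel string, since uint64) []uint64

// generateFeeds runs gen for every requested channel concurrently and
// collects the results by channel name.
func generateFeeds(channels []string, since map[string]uint64, gen channelFeedFunc) map[string][]uint64 {
	results := make([][]uint64, len(channels))
	var wg sync.WaitGroup
	for i, ch := range channels {
		wg.Add(1)
		go func(i int, ch string) {
			defer wg.Done()
			// A channel missing from the map has last-seen sequence 0.
			// Each goroutine writes only its own slot, so no lock is needed.
			results[i] = gen(ch, since[ch])
		}(i, ch)
	}
	wg.Wait()

	feeds := make(map[string][]uint64, len(channels))
	for i, ch := range channels {
		feeds[ch] = results[i]
	}
	return feeds
}

func main() {
	since := map[string]uint64{"abc": 16, "cnn": 8}
	// Dummy generator: pretend each channel has two newer changes.
	gen := func(channel string, s uint64) []uint64 {
		return []uint64{s + 1, s + 2}
	}
	feeds := generateFeeds([]string{"abc", "cnn", "pbs"}, since, gen)
	fmt.Println(feeds["abc"], feeds["cnn"], feeds["pbs"])
}
```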
To generate a single channel's feed, we fetch the channel-log document and look up the since sequence in it. If found, all the following entries go into the feed.
Each channel log has its own Since property that specifies the sequence after which it's authoritative. As a special case, if the sequence we're looking for is not found in the log but it's greater than the log's Since property, we treat that sequence as being before the first entry in the log, i.e. we return the entire log. This prevents unnecessary view queries (q.v.) for sparse channels: if we know a channel has no revisions with sequences before 100, then we shouldn't query a view just because we got asked to start at sequence 1.
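The lookup and its special case can be sketched as below. `entriesSince` is a name invented for this example; the boolean result signals whether the log alone could answer the request.

```go
package main

import "fmt"

// Trimmed copies of the types shown above.
type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

type ChangeLog struct {
	Since   uint64
	Entries []LogEntry
}

// entriesSince returns the log entries to send for a feed that starts
// after since. ok is false when the log is missing or not authoritative
// for that range, meaning the changes view has to be queried.
func entriesSince(cl *ChangeLog, since uint64) (entries []LogEntry, ok bool) {
	if cl == nil {
		return nil, false
	}
	// Normal case: since is in the log, so send everything appended after it.
	for i, e := range cl.Entries {
		if e.Sequence == since {
			return cl.Entries[i+1:], true
		}
	}
	// Special case: not found, but the log is authoritative after its own
	// Since, so treat the request as starting before the first entry.
	if since > cl.Since {
		return cl.Entries, true
	}
	return nil, false
}

func main() {
	// A sparse channel: nothing before sequence 100, never truncated.
	cl := &ChangeLog{Since: 0, Entries: []LogEntry{{Sequence: 100}, {Sequence: 120}}}
	entries, ok := entriesSince(cl, 1)
	fmt.Println(len(entries), ok)
}
```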
Otherwise, if the since value is not greater than the log's Since property (or the log is missing), we need to query the changes view starting from that sequence. We write results from that view (in increasing sequence number) to the feed until they pass the Since value of the log; after that we switch over to the log, since it might have newer revisions that aren't in the view yet.
If the log was missing, we use the latest entries from the view to create a new one and save it.
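A sketch of the backfill path follows, assuming a hypothetical `viewQuery` callback in place of the real "changes" view query. Creating and saving a new log when none exists is omitted.

```go
package main

import "fmt"

// Trimmed copies of the types shown above.
type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

type ChangeLog struct {
	Since   uint64
	Entries []LogEntry
}

// viewQuery is a hypothetical stand-in for querying the "changes" view: it
// returns entries with sequence > since, in increasing sequence order.
type viewQuery func(since uint64) []LogEntry

// backfilledFeed takes view rows up to the log's Since, then switches to the
// log's own entries (in append order). With no log, the view is used alone.
func backfilledFeed(cl *ChangeLog, since uint64, query viewQuery) []LogEntry {
	var feed []LogEntry
	for _, e := range query(since) {
		if cl != nil && e.Sequence > cl.Since {
			break // Past the log's Since: the log is authoritative from here.
		}
		feed = append(feed, e)
	}
	if cl != nil {
		feed = append(feed, cl.Entries...)
	}
	return feed
}

func main() {
	// Pretend the view has persisted sequences 1 through 6.
	view := func(since uint64) []LogEntry {
		var rows []LogEntry
		for seq := since + 1; seq <= 6; seq++ {
			rows = append(rows, LogEntry{Sequence: seq})
		}
		return rows
	}
	// The log already holds sequence 7, which the view has not indexed yet.
	cl := &ChangeLog{Since: 4, Entries: []LogEntry{{Sequence: 5}, {Sequence: 7}, {Sequence: 6}}}
	for _, e := range backfilledFeed(cl, 2, view) {
		fmt.Print(e.Sequence, " ")
	}
	fmt.Println()
}
```

Stopping the view rows at the log's `Since` avoids sending the same revision twice, once from the view and once from the log.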
The feeds from the individual channels are then merged together into one by repeatedly taking the lowest-sequence-numbered available revision (merging the channel info if the same revision appears on multiple channels) and writing it to the output.
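The merge step can be sketched as follows. The `Change` type and `mergeFeeds` are illustrative names. The sketch assumes that a revision carries the same sequence number on every channel it appears in, and it only ever compares the heads of the feeds, as the paragraph above describes.

```go
package main

import (
	"fmt"
	"sort"
)

// Trimmed copy of the entry type shown above.
type LogEntry struct {
	Sequence uint64
	DocID    string
	RevID    string
}

// Change is a hypothetical merged output row: one revision plus every
// channel whose feed contained it.
type Change struct {
	Sequence uint64
	DocID    string
	RevID    string
	Channels []string
}

// mergeFeeds repeatedly takes the lowest-sequence entry among the heads of
// the per-channel feeds, merging channel names when heads coincide.
func mergeFeeds(feeds map[string][]LogEntry) []Change {
	names := make([]string, 0, len(feeds))
	for name := range feeds {
		names = append(names, name)
	}
	sort.Strings(names) // Fixed iteration order keeps the output deterministic.

	pos := make(map[string]int, len(names)) // Next unread index per feed.
	var out []Change
	for {
		found := false
		var lowest uint64
		for _, name := range names {
			if i := pos[name]; i < len(feeds[name]) {
				if seq := feeds[name][i].Sequence; !found || seq < lowest {
					lowest, found = seq, true
				}
			}
		}
		if !found {
			return out // Every feed is exhausted.
		}
		var change Change
		for _, name := range names {
			if i := pos[name]; i < len(feeds[name]) && feeds[name][i].Sequence == lowest {
				e := feeds[name][i]
				change.Sequence, change.DocID, change.RevID = e.Sequence, e.DocID, e.RevID
				change.Channels = append(change.Channels, name)
				pos[name] = i + 1
			}
		}
		out = append(out, change)
	}
}

func main() {
	feeds := map[string][]LogEntry{
		"abc": {{Sequence: 1, DocID: "d1", RevID: "1-a"}, {Sequence: 3, DocID: "d2", RevID: "1-c"}},
		"cnn": {{Sequence: 2, DocID: "d3", RevID: "1-b"}, {Sequence: 3, DocID: "d2", RevID: "1-c"}},
	}
	for _, c := range mergeFeeds(feeds) {
		fmt.Println(c.Sequence, c.DocID, c.Channels)
	}
}
```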
The longpoll and continuous modes of the _changes feed may require that the handler wait for changes to happen.
Gateway nodes use the TAP feed to observe changes in channel logs. When a handler is waiting for new changes and detects that the log document of one of the relevant channels has been updated, it generates the feed again, then reads and sends the new entries.