Sync from etcd-io/website@f84d66c by PCIT
khs1994 committed Mar 28, 2024
1 parent c59406b commit 555ce65
Showing 4 changed files with 95 additions and 62 deletions.
2 changes: 1 addition & 1 deletion SUMMARY.md
@@ -26,7 +26,7 @@
* [Golang Modules](dev-internal/modules.md)
- Learning
* [Learning](learning/_index.md)
* [KV API Guarantees](learning/api_guarantees.md)
* [Etcd API Guarantees](learning/api_guarantees.md)
* [Etcd API](learning/api.md)
* [Data Model](learning/data_model.md)
* [Etcd V 3 Authentication Design](learning/design-auth-v3.md)
6 changes: 2 additions & 4 deletions learning/api.md
@@ -359,10 +359,7 @@ message Event {

Watches are long-running requests and use gRPC streams to stream event data. A watch stream is bi-directional; the client writes to the stream to establish watches and reads to receive watch events. A single watch stream can multiplex many distinct watches by tagging events with per-watch identifiers. This multiplexing helps reduce the memory footprint and connection overhead on the core etcd cluster.

Watches make three guarantees about events:
* Ordered - events are ordered by revision; an event will never appear on a watch if it precedes an event in time that has already been posted.
* Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b.
* Atomic - a list of events is guaranteed to encompass complete revisions; updates in the same revision over multiple keys will not be split over several lists of events.
For the guarantees made about watch events, see [etcd API guarantees][watch-api-guarantees].

A client creates a watch by sending a `WatchCreateRequest` over a stream returned by `Watch`:

@@ -484,6 +481,7 @@ message LeaseKeepAliveResponse {
* ID - the lease that was refreshed with a new TTL.
* TTL - the new time-to-live, in seconds, that the lease has remaining.

[watch-api-guarantees]: ../api_guarantees/#watch-apis
[elections]: https://github.com/etcd-io/etcd/blob/master/client/v3/concurrency/election.go
[grpc-api]: ../../dev-guide/api_reference_v3/
[grpc-service]: https://github.com/etcd-io/etcd/blob/master/api/etcdserverpb/rpc.proto
147 changes: 91 additions & 56 deletions learning/api_guarantees.md
@@ -1,75 +1,65 @@
---
title: KV API guarantees
title: etcd API guarantees
weight: 2750
description: KV API guarantees made by etcd
description: API guarantees made by etcd
---

etcd is a consistent and durable key value store with [mini-transaction][txn]
support. The key value store is exposed through the KV APIs. etcd tries to
ensure the strongest consistency and durability guarantees for a distributed
system. This specification enumerates the KV API guarantees made by etcd.
etcd is a consistent and durable key value store.
The key value store is exposed through [gRPC Services].
etcd ensures the strongest consistency and durability guarantees for a distributed system.
This specification enumerates the API guarantees made by etcd.

### APIs to consider

* Read APIs
* range
* watch
* Write APIs
* put
* delete
* Combination (read-modify-write) APIs
* txn
* KV APIs
* [Range](../api/#range)
* [Put](../api/#put)
* [Delete](../api/#delete-range)
* [Transaction](../api/#transaction)
* Watch APIs
* [Watch](../api/#watch-api)
* Lease APIs
* grant
* revoke
* put (attaching a lease to a key)
* [Grant](../api/#obtaining-leases)
* [Revoke]
* [Keep alive](../api/#keep-alives)

### etcd specific definitions
The KV API allows direct reading and manipulation of the key value store.
The Watch API allows subscribing to key value store changes.
The Lease API allows assigning a time to live to a key.

#### Operation completed
Both the KV and Watch APIs give access not only to the latest versions of keys
but also to previous versions within a continuous history window, which is
limited by compaction.

An etcd operation is considered complete when it is committed through consensus,
and therefore “executed” -- permanently stored -- by the etcd storage engine.
The client knows an operation is completed when it receives a response from the
etcd server. Note that the client may be uncertain about the status of an
operation if it times out, or there is a network disruption between the client
and the etcd member. etcd may also abort operations when there is a leader
election. etcd does not send `abort` responses to clients’ outstanding requests
in this event.
A KV API call takes effect immediately, while the Watch API delivers events with some unbounded delay.
In a correctly working etcd cluster, you should expect watch events to appear with a delay of around 10ms after the change happens.
However, there is no upper bound on this delay, and events in an unhealthy cluster might never arrive.

#### Revision
## KV APIs

An etcd operation that modifies the key value store is assigned a single
increasing revision. A transaction operation might modify the key value store
multiple times, but only one revision is assigned. The revision attribute of a
key value pair that was modified by the operation has the same value as the
revision of the operation. The revision can be used as a logical clock for the key
value store. A key value pair that has a larger revision is modified after a key
value pair with a smaller revision. Two key value pairs that have the same
revision are modified by an operation "concurrently".
etcd ensures durability and strict serializability for all KV API calls.
Strict serializability is the strongest isolation guarantee of distributed transactional database systems.

### Guarantees provided
### Durability

#### Atomicity
Any completed operations are durable. All accessible data is also durable data.
A read will never return data that has not been made durable.

All API requests are atomic; an operation either completes entirely or not at
all. For watch requests, all events generated by one operation will be in one
watch response. Watch never observes partial events for a single operation.
### Strict serializability

#### Durability
KV Service operations are atomic and occur in a total order, consistent with
real-time order of those operations. Total order is implied through [revision].
Read more about [strict serializability].

Any completed operations are durable. All accessible data is also durable data.
A read will never return data that has not been made durable.
Strict serializability implies other weaker guarantees that might be easier to understand:

#### Isolation level and consistency of replicas
#### Atomicity

etcd ensures [strict serializability][strict_serializability], which is the
strongest isolation guarantee of distributed transactional database systems.
Read operations will never observe any intermediate data.
All API requests are atomic; an operation either completes entirely or not at
all. For watch requests, all events generated by one operation will be in one
watch response. Watch never observes partial events for a single operation.

etcd ensures [linearizability][linearizability] as consistency of replicas
basically. As described below, exceptions are watch operations and read
operations which explicitly specifies serializable option.
#### Linearizability

From the perspective of the client, linearizability provides useful properties which
make reasoning easier. This is a clean description quoted from
@@ -85,18 +75,37 @@ most current value. Without linearizability guarantee, the returned value,
current at *t2* when the read began, might be "stale" by *t3* because a
concurrent write might happen between *t2* and *t3*.

etcd does not ensure linearizability for watch operations. Users are expected
to verify the revision of watch responses to ensure correct ordering.

etcd ensures linearizability for all other operations by default.
Linearizability comes with a cost, however, because linearized requests must go
through the Raft consensus process. To obtain lower latencies and higher
throughput for read requests, clients can configure a request’s consistency
mode to `serializable`, which may access stale data with respect to quorum, but
removes the performance penalty of linearized accesses' reliance on live consensus.

## Watch APIs

Watches make guarantees about events:
* Ordered - events are ordered by revision.
An event will never appear on a watch if it precedes an event in time that
has already been posted.
* Unique - an event will never appear on a watch twice.
* Reliable - a sequence of events will never drop any subsequence of events
within the available history window. If there are events ordered in time as
a < b < c, then if the watch receives events a and c, it is guaranteed to
receive b as long as b is in the available history window.
* Atomic - a list of events is guaranteed to encompass complete revisions.
Updates in the same revision over multiple keys will not be split over several
lists of events.
* Resumable - A broken watch can be resumed by establishing a new watch starting
after the last revision received in a watch event before the break, so long as
the revision is in the history window.
* Bookmarkable - Progress notification events guarantee that all events up to a
revision have been already delivered.
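
The Ordered, Atomic and Resumable guarantees above suggest a simple piece of client-side bookkeeping. The following is a minimal sketch in plain Python, not etcd client code: `WatchCursor`, `observe` and `resume_revision` are hypothetical names, events are modeled as bare `(revision, key)` tuples, and each batch stands in for one list of events delivered on a watch. It also assumes that a watch's start revision is inclusive, so resuming "after the last revision received" means starting at that revision plus one.

```python
from dataclasses import dataclass
from typing import List, Tuple

# (mod revision, key): a simplified stand-in for a real watch event.
Event = Tuple[int, str]


@dataclass
class WatchCursor:
    """Remembers the last revision seen on a watch (hypothetical helper)."""

    last_revision: int = 0

    def observe(self, batch: List[Event]) -> None:
        """Record one batch of events and sanity-check its revisions."""
        if not batch:
            return
        revisions = [revision for revision, _key in batch]
        # Ordered: revisions never go backwards within a batch.
        if revisions != sorted(revisions):
            raise ValueError("events within a batch are out of revision order")
        # Atomic and Unique: a revision is never split across batches or
        # replayed, so a new batch must start strictly after the last one.
        if revisions[0] <= self.last_revision:
            raise ValueError("batch repeats or precedes a revision already seen")
        self.last_revision = revisions[-1]

    def resume_revision(self) -> int:
        """Start revision for a replacement watch after a break."""
        return self.last_revision + 1


cursor = WatchCursor()
cursor.observe([(5, "a"), (5, "b")])  # two keys changed in one revision
cursor.observe([(6, "a")])
print(cursor.resume_revision())  # 7
```

The sketch does not model progress notifications or compaction; if the resume revision has already been compacted out of the history window, the watch cannot be resumed from that point.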

### Granting, attaching and revoking leases
etcd does not ensure linearizability for watch operations. Users are expected
to verify the revision of watch events to ensure correct ordering with other operations.

## Lease APIs

etcd provides [a lease mechanism][lease]. The primary use case of a lease is
implementing distributed coordination mechanisms like distributed locks. The
@@ -106,9 +115,35 @@ expired by the wall clock time to live (TTL). However, users need to be aware
of [the important properties of the APIs and usage][why] for implementing
correct distributed coordination mechanisms.

## etcd specific definitions

### Operation completed

An etcd operation is considered complete when it is committed through consensus,
and therefore “executed” -- permanently stored -- by the etcd storage engine.
The client knows an operation is completed when it receives a response from the
etcd server. Note that the client may be uncertain about the status of an
operation if it times out, or there is a network disruption between the client
and the etcd member. etcd may also abort operations when there is a leader
election. etcd does not send `abort` responses to clients’ outstanding requests
in this event.

### Revision

An etcd operation that modifies the key value store is assigned a single
increasing revision. A transaction operation might modify the key value store
multiple times, but only one revision is assigned. The revision attribute of a
key value pair that was modified by the operation has the same value as the
revision of the operation. The revision can be used as a logical clock for the key
value store. A key value pair that has a larger revision is modified after a key
value pair with a smaller revision. Two key value pairs that have the same
revision are modified by an operation "concurrently".
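
The revision rules above can be illustrated with a toy in-memory model. This is only a sketch under stated assumptions, not how etcd stores data: `ToyStore`, `put`, `txn` and `mod_revision` are hypothetical names, the starting revision of 0 is arbitrary, and consensus, history and deletes are left out entirely.

```python
from typing import Dict, Tuple


class ToyStore:
    """Toy key value store that assigns one revision per modifying operation."""

    def __init__(self) -> None:
        self.revision = 0  # arbitrary starting point for this sketch
        self._data: Dict[str, Tuple[str, int]] = {}  # key -> (value, mod revision)

    def txn(self, writes: Dict[str, str]) -> int:
        """Apply several writes as one operation and return its revision."""
        if not writes:
            # Nothing is modified, so no new revision is assigned.
            return self.revision
        self.revision += 1
        for key, value in writes.items():
            # Every key touched by the operation gets the same revision.
            self._data[key] = (value, self.revision)
        return self.revision

    def put(self, key: str, value: str) -> int:
        """A single put is an operation that modifies one key."""
        return self.txn({key: value})

    def mod_revision(self, key: str) -> int:
        """Revision of the operation that last modified the key."""
        return self._data[key][1]


store = ToyStore()
store.put("a", "1")
store.txn({"b": "2", "c": "3"})
# "b" and "c" share a revision; "a" was modified earlier.
print(store.mod_revision("a"), store.mod_revision("b"), store.mod_revision("c"))
```

Comparing `mod_revision` values is what lets the revision act as a logical clock: a larger value means the key was modified later, and equal values mean the keys were modified by the same operation.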

[grpc Services]: ../api/#grpc-services
[lease]: https://web.stanford.edu/class/cs240/readings/89-leases.pdf
[linearizability]: https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
[serializable_isolation]: https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable
[strict_serializability]: http://jepsen.io/consistency/models/strict-serializable
[strict serializability]: http://jepsen.io/consistency/models/strict-serializable
[txn]: ../api/#transaction
[why]: ../why/#notes-on-the-usage-of-lock-and-lease
[revision]: #revision
2 changes: 1 addition & 1 deletion learning/why.md
@@ -115,7 +115,7 @@ Note that in the case of etcd keys, it can be locked efficiently because of the
[etcd-etcdctl-elect]: https://github.com/etcd-io/etcd/blob/master/etcdctl/README.md#elect-options-election-name-proposal
[etcd-etcdctl-lock]: https://github.com/etcd-io/etcd/blob/master/etcdctl/README.md#lock-options-lockname-command-arg1-arg2-
[etcd-json]: ../../dev-guide/api_grpc_gateway/
[etcd-linread]: ../api_guarantees/#isolation-level-and-consistency-of-replicas
[etcd-linread]: ../api_guarantees/#linearizability
[etcd-mvcc]: ../data_model/
[etcd-rbac]: ../../op-guide/authentication/rbac
[etcd-recipe]: https://godoc.org/github.com/etcd-io/etcd/client/v3/experimental/recipes