From 555ce6595e8f24487c7ba8083f6db3a459b1891b Mon Sep 17 00:00:00 2001
From: khs1994
Date: Thu, 28 Mar 2024 08:48:07 +0800
Subject: [PATCH] Sync from etcd-io/website@f84d66c by PCIT

---
 SUMMARY.md                 |   2 +-
 learning/api.md            |   6 +-
 learning/api_guarantees.md | 147 +++++++++++++++++++++++--------------
 learning/why.md            |   2 +-
 4 files changed, 95 insertions(+), 62 deletions(-)

diff --git a/SUMMARY.md b/SUMMARY.md
index 3d18ad7..42e9eb7 100644
--- a/SUMMARY.md
+++ b/SUMMARY.md
@@ -26,7 +26,7 @@
   * [Golang Modules](dev-internal/modules.md)
 - Learning
   * [Learning](learning/_index.md)
-  * [KV API Guarantees](learning/api_guarantees.md)
+  * [Etcd API Guarantees](learning/api_guarantees.md)
   * [Etcd API](learning/api.md)
   * [Data Model](learning/data_model.md)
   * [Etcd V 3 Authentication Design](learning/design-auth-v3.md)
diff --git a/learning/api.md b/learning/api.md
index eb9bf9f..743066a 100644
--- a/learning/api.md
+++ b/learning/api.md
@@ -359,10 +359,7 @@ message Event {
 
 Watches are long-running requests and use gRPC streams to stream event data. A watch stream is bi-directional; the client writes to the stream to establish watches and reads to receive watch events. A single watch stream can multiplex many distinct watches by tagging events with per-watch identifiers. This multiplexing helps reducing the memory footprint and connection overhead on the core etcd cluster.
 
-Watches make three guarantees about events:
-* Ordered - events are ordered by revision; an event will never appear on a watch if it precedes an event in time that has already been posted.
-* Reliable - a sequence of events will never drop any subsequence of events; if there are events ordered in time as a < b < c, then if the watch receives events a and c, it is guaranteed to receive b.
-* Atomic - a list of events is guaranteed to encompass complete revisions; updates in the same revision over multiple keys will not be split over several lists of events.
+To read about guarantees made about watch events, please read [etcd api guarantees][watch-api-guarantees].
 
 A client creates a watch by sending a `WatchCreateRequest` over a stream returned by `Watch`:
 
@@ -484,6 +481,7 @@ message LeaseKeepAliveResponse {
 * ID - the lease that was refreshed with a new TTL.
 * TTL - the new time-to-live, in seconds, that the lease has remaining.
 
++[watch-api-guarantees]: ../api_guarantees/#watch-apis
 [elections]: https://github.com/etcd-io/etcd/blob/master/client/v3/concurrency/election.go
 [grpc-api]: ../../dev-guide/api_reference_v3/
 [grpc-service]: https://github.com/etcd-io/etcd/blob/master/api/etcdserverpb/rpc.proto
diff --git a/learning/api_guarantees.md b/learning/api_guarantees.md
index 30c5ab6..545d304 100644
--- a/learning/api_guarantees.md
+++ b/learning/api_guarantees.md
@@ -1,75 +1,65 @@
 ---
-title: KV API guarantees
+title: etcd API guarantees
 weight: 2750
-description: KV API guarantees made by etcd
+description: API guarantees made by etcd
 ---
 
-etcd is a consistent and durable key value store with [mini-transaction][txn]
-support. The key value store is exposed through the KV APIs. etcd tries to
-ensure the strongest consistency and durability guarantees for a distributed
-system. This specification enumerates the KV API guarantees made by etcd.
+etcd is a consistent and durable key value store.
+The key value store is exposed through [gRPC Services].
+etcd ensures the strongest consistency and durability guarantees for a distributed system.
+This specification enumerates the API guarantees made by etcd.
 
 ### APIs to consider
 
-* Read APIs
-  * range
-  * watch
-* Write APIs
-  * put
-  * delete
-* Combination (read-modify-write) APIs
-  * txn
+* KV APIs
+  * [Range](../api/#range)
+  * [Put](../api/#put)
+  * [Delete](../api/#delete-range)
+  * [Transaction](../api/#transaction)
+* Watch APIs
+  * [Watch](../api/#watch-api)
 * Lease APIs
-  * grant
-  * revoke
-  * put (attaching a lease to a key)
+  * [Grant](../api/#obtaining-leases)
+  * [Revoke]
+  * [Keep alive](../api/#keep-alives)
 
-### etcd specific definitions
+KV API allows for direct reading and manipulation of key value store.
+Watch API allows subscribing to key value store changes.
+Lease API allows assigning a time to live to a key.
 
-#### Operation completed
+Both KV and Watch APIs allow access to not only the latest versions of keys, but
+also previous versions are accessible within a continuous history window, limited
+by a compaction operation.
 
-An etcd operation is considered complete when it is committed through consensus,
-and therefore “executed” -- permanently stored -- by the etcd storage engine.
-The client knows an operation is completed when it receives a response from the
-etcd server. Note that the client may be uncertain about the status of an
-operation if it times out, or there is a network disruption between the client
-and the etcd member. etcd may also abort operations when there is a leader
-election. etcd does not send `abort` responses to clients’ outstanding requests
-in this event.
+Calling KV API will take an immediate effect, while Watch API will return with some unbounded delay.
+In correctly working etcd cluster you should expect to see watch events to appear with 10ms delay after them happening.
+However, there is no limit and events in unhealthy clusters might never arrive.
 
-#### Revision
+## KV APIs
 
-An etcd operation that modifies the key value store is assigned a single
-increasing revision. A transaction operation might modify the key value store
-multiple times, but only one revision is assigned. The revision attribute of a
-key value pair that was modified by the operation has the same value as the
-revision of the operation. The revision can be used as a logical clock for key
-value store. A key value pair that has a larger revision is modified after a key
-value pair with a smaller revision. Two key value pairs that have the same
-revision are modified by an operation "concurrently".
+etcd ensures durability and strict serializability for all KV api calls.
+Those are the strongest isolation guarantee of distributed transactional database systems.
 
-### Guarantees provided
+### Durability
 
-#### Atomicity
+Any completed operations are durable. All accessible data is also durable data.
+A read will never return data that has not been made durable.
 
-All API requests are atomic; an operation either completes entirely or not at
-all. For watch requests, all events generated by one operation will be in one
-watch response. Watch never observes partial events for a single operation.
+### Strict serializability
 
-#### Durability
+KV Service operations are atomic and occur in a total order, consistent with
+real-time order of those operations. Total order is implied through [revision].
+Read more about [strict serializability].
 
-Any completed operations are durable. All accessible data is also durable data.
-A read will never return data that has not been made durable.
+Strict serializability implies other weaker guarantees that might be easier to understand:
 
-#### Isolation level and consistency of replicas
+#### Atomicity
 
-etcd ensures [strict serializability][strict_serializability], which is the
-strongest isolation guarantee of distributed transactional database systems.
-Read operations will never observe any intermediate data.
+All API requests are atomic; an operation either completes entirely or not at
+all. For watch requests, all events generated by one operation will be in one
+watch response. Watch never observes partial events for a single operation.
 
-etcd ensures [linearizability][linearizability] as consistency of replicas
-basically. As described below, exceptions are watch operations and read
-operations which explicitly specifies serializable option.
+#### Linearizability
 
 From the perspective of client, linearizability provides useful properties which
 make reasoning easily. This is a clean description quoted from
@@ -85,9 +75,6 @@ most current value. Without linearizability guarantee, the returned value,
 current at *t2* when the read began, might be "stale" by *t3* because a
 concurrent write might happen between *t2* and *t3*.
 
-etcd does not ensure linearizability for watch operations. Users are expected
-to verify the revision of watch responses to ensure correct ordering.
-
 etcd ensures linearizability for all other operations by default.
 Linearizability comes with a cost, however, because linearized requests must go
 through the Raft consensus process. To obtain lower latencies and higher
@@ -95,8 +82,30 @@ throughput for read requests, clients can configure a request’s consistency
 mode to `serializable`, which may access stale data with respect to quorum, but
 removes the performance penalty of linearized accesses' reliance on live consensus.
 
+## Watch APIs
+
+Watches make guarantees about events:
+* Ordered - events are ordered by revision.
+  An event will never appear on a watch if it precedes an event in time that
+  has already been posted.
+* Unique - an event will never appear on a watch twice.
+* Reliable - a sequence of events will never drop any subsequence of events
+  within the available history window. If there are events ordered in time as
+  a < b < c, then if the watch receives events a and c, it is guaranteed to
+  receive b as long b is in the available history window.
+* Atomic - a list of events is guaranteed to encompass complete revisions.
+  Updates in the same revision over multiple keys will not be split over several
+  lists of events.
+* Resumable - A broken watch can be resumed by establishing a new watch starting
+  after the last revision received in a watch event before the break, so long as
+  the revision is in the history window.
+* Bookmarkable - Progress notification events guarantee that all events up to a
+  revision have been already delivered.
 
-### Granting, attaching and revoking leases
+etcd does not ensure linearizability for watch operations. Users are expected
+to verify the revision of watch events to ensure correct ordering with other operations.
+
+## Lease APIs
 
 etcd provides [a lease mechanism][lease]. The primary use case of a lease is
 implementing distributed coordination mechanisms like distributed locks. The
@@ -106,9 +115,35 @@ expired by the wall clock time to live (TTL). However, users need to be aware
 about [the important properties of the APIs and usage][why] for implementing
 correct distributed coordination mechanisms.
 
+## etcd specific definitions
+
+### Operation completed
+
+An etcd operation is considered complete when it is committed through consensus,
+and therefore “executed” -- permanently stored -- by the etcd storage engine.
+The client knows an operation is completed when it receives a response from the
+etcd server. Note that the client may be uncertain about the status of an
+operation if it times out, or there is a network disruption between the client
+and the etcd member. etcd may also abort operations when there is a leader
+election. etcd does not send `abort` responses to clients’ outstanding requests
+in this event.
+
+### Revision
+
+An etcd operation that modifies the key value store is assigned a single
+increasing revision. A transaction operation might modify the key value store
+multiple times, but only one revision is assigned. The revision attribute of a
+key value pair that was modified by the operation has the same value as the
+revision of the operation. The revision can be used as a logical clock for key
+value store. A key value pair that has a larger revision is modified after a key
+value pair with a smaller revision. Two key value pairs that have the same
+revision are modified by an operation "concurrently".
+
+[grpc Services]: ../api/#grpc-services
 [lease]: https://web.stanford.edu/class/cs240/readings/89-leases.pdf
 [linearizability]: https://cs.brown.edu/~mph/HerlihyW90/p463-herlihy.pdf
 [serializable_isolation]: https://en.wikipedia.org/wiki/Isolation_(database_systems)#Serializable
-[strict_serializability]: http://jepsen.io/consistency/models/strict-serializable
+[strict serializability]: http://jepsen.io/consistency/models/strict-serializable
 [txn]: ../api/#transaction
 [why]: ../why/#notes-on-the-usage-of-lock-and-lease
+[revision]: #revision
diff --git a/learning/why.md b/learning/why.md
index 0e932c8..91aec04 100644
--- a/learning/why.md
+++ b/learning/why.md
@@ -115,7 +115,7 @@ Note that in the case of etcd keys, it can be locked efficiently because of the
 [etcd-etcdctl-elect]: https://github.com/etcd-io/etcd/blob/master/etcdctl/README.md#elect-options-election-name-proposal
 [etcd-etcdctl-lock]: https://github.com/etcd-io/etcd/blob/master/etcdctl/README.md#lock-options-lockname-command-arg1-arg2-
 [etcd-json]: ../../dev-guide/api_grpc_gateway/
-[etcd-linread]: ../api_guarantees/#isolation-level-and-consistency-of-replicas
+[etcd-linread]: ../api_guarantees/#linearizability
 [etcd-mvcc]: ../data_model/
 [etcd-rbac]: ../../op-guide/authentication/rbac
 [etcd-recipe]: https://godoc.org/github.com/etcd-io/etcd/client/v3/experimental/recipes