Commit

M "src/list/raxos/Abstract Paxos Diagram.canvas"
drmingdrmer committed Dec 25, 2024
1 parent 11347dd commit 3ed1102
Showing 29 changed files with 78 additions and 92 deletions.
51 changes: 31 additions & 20 deletions src/list/raxos/Abstract Paxos Diagram.canvas
@@ -8,14 +8,9 @@
{"id":"707de0dee3c23555","type":"file","file":"consensus-essence/src/list/raxos/example-T-in-single-threaded.md","x":1580,"y":820,"width":400,"height":400,"color":"5"},
{"id":"b553044d1b09bc14","type":"file","file":"consensus-essence/src/list/raxos/def-API-read-write.md","x":1184,"y":1432,"width":400,"height":400},
{"id":"11003bbfc4ff8328","type":"file","file":"consensus-essence/src/list/raxos/def-T-History-PartialOrd.md","x":420,"y":1320,"width":465,"height":238},
{"id":"9c32627fede27385","type":"file","file":"consensus-essence/src/list/raxos/example-RW-single-threaded.md","x":1780,"y":1960,"width":400,"height":400,"color":"5"},
{"id":"6f61f61fbd369e9a","type":"file","file":"consensus-essence/src/list/raxos/def-Past-Future.md","x":720,"y":2200,"width":400,"height":400},
{"id":"ebae8b958fa8ef58","type":"file","file":"consensus-essence/src/list/raxos/example-Past-Future-distributed.md","x":1120,"y":3260,"width":400,"height":400,"color":"5"},
{"id":"23b7153aa3b72e6d","type":"file","file":"consensus-essence/src/list/raxos/example-Past-Future-single-threaded.md","x":1280,"y":2720,"width":400,"height":400,"color":"5"},
{"id":"23b19f5ffc793d95","type":"file","file":"consensus-essence/src/list/raxos/def-Apply.md","x":1002,"y":3,"width":400,"height":400},
{"id":"6d2f40704a61e31b","type":"file","file":"consensus-essence/src/list/raxos/def-Sub-History.md","x":-160,"y":1120,"width":400,"height":400},
{"id":"09a5660b835acde1","type":"file","file":"consensus-essence/src/list/raxos/def-Mergeable-History.md","x":-160,"y":1760,"width":400,"height":400},
{"id":"24e816cf21a582e0","type":"file","file":"consensus-essence/src/list/raxos/def-RW-Necessity.md","x":1184,"y":2060,"width":400,"height":400},
{"id":"cc1534fa18215622","type":"file","file":"consensus-essence/src/list/raxos/prop-Merge-Read.md","x":-160,"y":2980,"width":400,"height":400},
{"id":"11ad89cd35a2b1f6","type":"file","file":"consensus-essence/src/list/raxos/def-Distributed-HA.md","x":360,"y":1720,"width":400,"height":400},
{"id":"056ed53ecdd08e9a","type":"file","file":"consensus-essence/src/list/raxos/def-Distributed-Copies.md","x":120,"y":2200,"width":400,"height":400},
@@ -25,21 +20,31 @@
{"id":"97c0f9ab39142bd8","type":"file","file":"consensus-essence/src/list/raxos/prop-Write-Forbid-Smaller.md","x":-464,"y":4224,"width":400,"height":400},
{"id":"2de15b9927901fdf","type":"file","file":"consensus-essence/src/list/raxos/def-Read-Quorum-Set.md","x":640,"y":3778,"width":400,"height":400},
{"id":"75b048956a6d5a4f","type":"file","file":"consensus-essence/src/list/raxos/example-Quorum-Set.md","x":1480,"y":4260,"width":400,"height":400,"color":"5"},
{"id":"64f0cd59fe29aca7","type":"file","file":"consensus-essence/src/list/raxos/def-Write-Quorum-Set.md","x":640,"y":4360,"width":400,"height":400},
{"id":"8e2e4bdc350b67dd","type":"file","file":"consensus-essence/src/list/raxos/example-Quorum-Set-xy.md","x":1360,"y":4760,"width":400,"height":400,"color":"5"},
{"id":"eefaf5fc7ce43eba","type":"file","file":"consensus-essence/src/list/raxos/def-Observable-History.md","x":200,"y":4460,"width":400,"height":400},
{"id":"2aa96775efaf3504","x":0,"y":5100,"width":400,"height":400,"type":"file","file":"consensus-essence/src/list/raxos/def-T-Committed.md"},
{"id":"b33b510032aaea23","x":0,"y":5660,"width":400,"height":400,"type":"file","file":"consensus-essence/src/list/raxos/def-Committed.md"},
{"id":"4425d02ad7c436fc","x":1207,"y":5220,"width":400,"height":400,"color":"3","type":"file","file":"consensus-essence/src/list/raxos/desc-Availability.md"},
{"id":"e17d353216ad61b2","x":-464,"y":6120,"width":400,"height":400,"color":"6","type":"file","file":"consensus-essence/src/list/raxos/protocol-Write-Forbid-Smaller.md"},
{"id":"4fe24a286d42d5c1","x":0,"y":6400,"width":400,"height":400,"color":"6","type":"file","file":"consensus-essence/src/list/raxos/protocol-Write-After-Read.md"},
{"id":"64f51b278ec2e315","x":-720,"y":7800,"width":400,"height":400,"color":"6","type":"file","file":"consensus-essence/src/list/raxos/protocol-All.md"},
{"id":"04445468a553ca25","x":0,"y":7200,"width":400,"height":400,"color":"6","type":"file","file":"consensus-essence/src/list/raxos/protocol-Write-P2.md"},
{"id":"99fd7ad8ee77b8b7","x":-80,"y":8320,"width":400,"height":400,"color":"5","type":"file","file":"consensus-essence/src/list/raxos/example-Classic-Paxos.md"},
{"id":"d712fa28e2c9ebae","x":534,"y":8457,"width":400,"height":400,"type":"file","file":"consensus-essence/src/list/raxos/example-Raft.md"},
{"id":"a74a6a5cfa5647bc","x":-492,"y":9029,"width":400,"height":400,"type":"file","file":"consensus-essence/src/list/raxos/2d-consensus.md"},
{"id":"5d01e9e3355926b6","x":-297,"y":9810,"width":400,"height":390,"color":"5","type":"file","file":"consensus-essence/src/list/raxos/example-2d-consensus.md"},
{"id":"877084719c0840fa","x":-323,"y":10613,"width":400,"height":400,"type":"file","file":"consensus-essence/src/list/raxos/def-2d-consensus-Apply.md"}
{"id":"2aa96775efaf3504","type":"file","file":"consensus-essence/src/list/raxos/def-T-Committed.md","x":0,"y":5100,"width":400,"height":400},
{"id":"b33b510032aaea23","type":"file","file":"consensus-essence/src/list/raxos/def-Committed.md","x":0,"y":5660,"width":400,"height":400},
{"id":"4425d02ad7c436fc","type":"file","file":"consensus-essence/src/list/raxos/desc-Availability.md","x":1207,"y":5220,"width":400,"height":400,"color":"3"},
{"id":"4fe24a286d42d5c1","type":"file","file":"consensus-essence/src/list/raxos/protocol-Write-After-Read.md","x":0,"y":6400,"width":400,"height":400,"color":"6"},
{"id":"64f51b278ec2e315","type":"file","file":"consensus-essence/src/list/raxos/protocol-All.md","x":-720,"y":7800,"width":400,"height":400,"color":"6"},
{"id":"04445468a553ca25","type":"file","file":"consensus-essence/src/list/raxos/protocol-Write-P2.md","x":0,"y":7200,"width":400,"height":400,"color":"6"},
{"id":"99fd7ad8ee77b8b7","type":"file","file":"consensus-essence/src/list/raxos/example-Classic-Paxos.md","x":-80,"y":8320,"width":400,"height":400,"color":"5"},
{"id":"d712fa28e2c9ebae","type":"file","file":"consensus-essence/src/list/raxos/example-Raft.md","x":534,"y":8457,"width":400,"height":400,"color":"5"},
{"id":"a74a6a5cfa5647bc","type":"file","file":"consensus-essence/src/list/raxos/2d-consensus.md","x":-492,"y":9029,"width":400,"height":400},
{"id":"5d01e9e3355926b6","type":"file","file":"consensus-essence/src/list/raxos/example-2d-consensus.md","x":-297,"y":9810,"width":400,"height":390,"color":"5"},
{"id":"877084719c0840fa","type":"file","file":"consensus-essence/src/list/raxos/def-2d-consensus-Apply.md","x":-323,"y":10613,"width":400,"height":400},
{"id":"24e816cf21a582e0","type":"file","file":"consensus-essence/src/list/raxos/def-RW-Necessity.md","x":1184,"y":2120,"width":400,"height":400},
{"id":"6f61f61fbd369e9a","type":"file","file":"consensus-essence/src/list/raxos/def-Past-Future.md","x":1720,"y":2120,"width":400,"height":400},
{"id":"9c32627fede27385","type":"file","file":"consensus-essence/src/list/raxos/example-RW-single-threaded.md","x":2200,"y":1832,"width":400,"height":400,"color":"5"},
{"id":"23b7153aa3b72e6d","type":"file","file":"consensus-essence/src/list/raxos/example-Past-Future-single-threaded.md","x":2200,"y":2760,"width":400,"height":400,"color":"5"},
{"id":"ebae8b958fa8ef58","type":"file","file":"consensus-essence/src/list/raxos/example-Past-Future-distributed.md","x":2200,"y":3320,"width":400,"height":400,"color":"5"},
{"id":"64f0cd59fe29aca7","type":"file","file":"consensus-essence/src/list/raxos/def-Write-Quorum-Set.md","x":640,"y":4460,"width":400,"height":400},
{"id":"eefaf5fc7ce43eba","type":"file","file":"consensus-essence/src/list/raxos/def-Observable-History.md","x":160,"y":4460,"width":400,"height":400},
{"id":"f001ccd718f70347","x":640,"y":5160,"width":400,"height":400,"color":"3","type":"file","file":"consensus-essence/src/list/raxos/desc-History Read Set.md"},
{"id":"a838cacb9e50352d","x":236,"y":9019,"width":400,"height":400,"color":"5","type":"file","file":"consensus-essence/src/list/raxos/example-crdt.md"},
{"id":"c55a71983b5140b2","x":722,"y":9578,"width":400,"height":400,"color":"5","type":"file","file":"consensus-essence/src/list/raxos/example-2d-non-transitive-time.md"},
{"id":"6aca35d4bc6e348f","x":236,"y":10120,"width":400,"height":400,"color":"5","type":"file","file":"consensus-essence/src/list/raxos/exmaple-crdt-define.md"},
{"id":"e0499dbb83899c24","x":236,"y":10640,"width":400,"height":400,"color":"5","type":"file","file":"consensus-essence/src/list/raxos/example-crdt-implementation.md"},
{"id":"e17d353216ad61b2","type":"file","file":"consensus-essence/src/list/raxos/protocol-Write-Forbid-Smaller.md","x":-520,"y":6280,"width":400,"height":400,"color":"6"}
],
"edges":[
{"id":"ec72236e30d4ad00","fromNode":"6d2f40704a61e31b","fromSide":"bottom","toNode":"09a5660b835acde1","toSide":"top"},
@@ -87,6 +92,12 @@
{"id":"1d3e17e5ee6152bf","fromNode":"64f51b278ec2e315","fromSide":"bottom","toNode":"d712fa28e2c9ebae","toSide":"top"},
{"id":"f6c7441b6af0d177","fromNode":"64f51b278ec2e315","fromSide":"bottom","toNode":"a74a6a5cfa5647bc","toSide":"top"},
{"id":"f9c577da2f0570dd","fromNode":"a74a6a5cfa5647bc","fromSide":"bottom","toNode":"5d01e9e3355926b6","toSide":"top"},
{"id":"fe61a48ba3c61f1e","fromNode":"5d01e9e3355926b6","fromSide":"bottom","toNode":"877084719c0840fa","toSide":"top"}
{"id":"fe61a48ba3c61f1e","fromNode":"5d01e9e3355926b6","fromSide":"bottom","toNode":"877084719c0840fa","toSide":"top"},
{"id":"065998b773a58476","fromNode":"eefaf5fc7ce43eba","fromSide":"bottom","toNode":"f001ccd718f70347","toSide":"top"},
{"id":"ea0a374d1e541e54","fromNode":"64f51b278ec2e315","fromSide":"bottom","toNode":"a838cacb9e50352d","toSide":"top"},
{"id":"e56beb199a811c22","fromNode":"a838cacb9e50352d","fromSide":"bottom","toNode":"c55a71983b5140b2","toSide":"top"},
{"id":"b6b5fdb98e3ed393","fromNode":"a838cacb9e50352d","fromSide":"bottom","toNode":"6aca35d4bc6e348f","toSide":"top"},
{"id":"51d037ce1e9e19b8","fromNode":"c55a71983b5140b2","fromSide":"bottom","toNode":"6aca35d4bc6e348f","toSide":"top"},
{"id":"37c8ac0a99460eca","fromNode":"6aca35d4bc6e348f","fromSide":"bottom","toNode":"e0499dbb83899c24","toSide":"top"}
]
}
Binary file added src/list/raxos/assets/crdt-storage.excalidraw.png
Binary file added src/list/raxos/assets/crdt-time.excalidraw.png
Binary file added src/list/raxos/assets/crdt.excalidraw.png
File renamed without changes
File renamed without changes
File renamed without changes
File renamed without changes
Binary file removed src/list/raxos/crdt.excalidraw.png
Binary file not shown.
6 changes: 4 additions & 2 deletions src/list/raxos/def-API-read-write.md
@@ -2,8 +2,10 @@

In this system built from virtual time `Time`, events `Event`, and `History`, both reads and writes are defined as actions that happen at a specified time:
```rust
fn read(t: Time) -> History({T});
fn write(h: History({T}));
impl Node {
fn read(&self, t: Time) -> History({T});
fn write(&mut self, h: History({T}));
}
```

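Here, `read` reads the system's history at time `t`: it should return every Event that happened before time `t`. The largest time in the History that `read` returns may be smaller than the argument `t`.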
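
The single-node API above can be sketched roughly as follows. This is a minimal illustration, not the author's implementation: it assumes `Time` is a plain integer, that a `History` is a time-ordered map of event payloads, that "before `t`" includes `t` itself, and that `write` merges without overwriting stored events. The `max_time` helper and the `stored` field are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Assumption: virtual time is a plain integer; the source leaves `Time` abstract.
type Time = u64;

/// Assumption: a History is a time-ordered map from Time to an event payload.
#[derive(Clone, Debug, Default, PartialEq)]
struct History {
    events: BTreeMap<Time, String>,
}

impl History {
    /// The largest time in this History, or None if it is empty.
    fn max_time(&self) -> Option<Time> {
        self.events.keys().next_back().copied()
    }
}

#[derive(Default)]
struct Node {
    stored: History,
}

impl Node {
    /// Return every stored event at or before `t` (inclusive bound is an assumption).
    fn read(&self, t: Time) -> History {
        History {
            events: self
                .stored
                .events
                .range(..=t)
                .map(|(k, v)| (*k, v.clone()))
                .collect(),
        }
    }

    /// Merge `h` into local storage; events already stored are left untouched.
    fn write(&mut self, h: History) {
        for (t, e) in h.events {
            self.stored.events.entry(t).or_insert(e);
        }
    }
}
```

If a node stores events at times 1 and 3, `read(2)` returns a History whose largest time is 1, which is the case the text describes: the returned History can end earlier than the requested `t`.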
10 changes: 5 additions & 5 deletions src/list/raxos/def-Observable-History.md
@@ -1,18 +1,18 @@
**def-Observable**
## History that is always visible to the system

For a given time `pt`,
a History is considered **visible at time `pt`**
if any `read_quorum` in that time's `read_quorum_set` can read this History.
For a given time `T`,
a History is considered **visible at time `T`**
if every `read_quorum` in that time's `read_quorum_set` can read this History.
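
A minimal sketch of this definition, under stated assumptions: a History is modeled as a set of event names, and a quorum can read a History when at least one of its nodes stores all of that History's events. The names `quorum_can_read` and `is_visible` are hypothetical and do not come from the source.

```rust
use std::collections::{BTreeMap, BTreeSet};

type NodeId = u32;
/// Assumption: a History is modeled as a set of event names.
type History = BTreeSet<&'static str>;
type Quorum = BTreeSet<NodeId>;

/// A quorum can read `h` if at least one of its nodes stores all events of `h`.
fn quorum_can_read(state: &BTreeMap<NodeId, History>, q: &Quorum, h: &History) -> bool {
    q.iter()
        .filter_map(|id| state.get(id))
        .any(|stored| h.is_subset(stored))
}

/// `h` is visible iff every read_quorum in the read_quorum_set can read it.
fn is_visible(
    state: &BTreeMap<NodeId, History>,
    read_quorum_set: &[Quorum],
    h: &History,
) -> bool {
    read_quorum_set.iter().all(|q| quorum_can_read(state, q, h))
}
```

With an assumed state where node 1 stores E1, E2, E3, node 2 stores E1, E2, and node 3 is empty, the History of E1 and E2 is visible under the majority set `{{1,2},{1,3},{2,3}}`, while the History of E1, E2, E3 is not, because `{2,3}` cannot read it. Shrinking the set to `{{1,2},{1,3}}` makes it visible without changing the state.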

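Note that **visibility** depends on both the system state and the definition of `read_quorum_set`.
For example, for `History = {E1, E2}` in the figure below,
For example, for `History = {E1->E2}` in the figure below,
if `read_quorum_set` is defined as majority read/write, i.e. `{{1,2},{1,3},{2,3}}`,
then it is **visible**:

![](history-visible-12.excalidraw.png)

But for `History = {E1, E2, E3}`,
But for `History = {E1->E2->E3}`,
- if `read_quorum_set` is still defined as majority read/write, i.e. `{{1,2},{1,3},{2,3}}`,
then it is not **visible**, because `History = {E1,E2,E3}` cannot be read through `{2,3}`,
- but if `read_quorum_set` is changed to `{{1,2},{1,3}}`, then even with the system state unchanged,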
20 changes: 19 additions & 1 deletion src/list/raxos/def-RW-Distributed.md
@@ -2,7 +2,25 @@
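# Reads and writes across distributed replicas


A write operation writes a history, i.e. `fn write(h: History({T}), nodes: Vec<Node>)`, and the content written to every node is the same.
Distributed reads and writes are both built on the single-node read/write API. Start with the distributed write, which is fairly simple: it only needs to write the History to every node, as the code below shows. The distributed read is slightly more involved: it also reads a History from each replica, then derives from those results a single result to return to the caller.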

```rust
impl {
fn write(&self, h: History({T})) {
for node in self.get_write_quorum() {
node.write(h);
}
}
}
```
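
The distributed read described above could be sketched in the same spirit. This is only an illustration under assumptions: a History is a set of `(time, event)` pairs, the per-node results are combined by taking their union, and the free function `read_quorum` is a hypothetical name rather than part of the author's API.

```rust
use std::collections::BTreeSet;

type Time = u64;
/// Assumption: a History is a set of (time, event name) pairs.
type History = BTreeSet<(Time, &'static str)>;

struct Node {
    stored: History,
}

impl Node {
    /// Single-node read: every stored event at or before `t`.
    fn read(&self, t: Time) -> History {
        self.stored
            .iter()
            .filter(|(et, _)| *et <= t)
            .cloned()
            .collect()
    }
}

/// Distributed read: read each node of one read quorum, then union the results.
/// Merging by union is an assumption of this sketch.
fn read_quorum(read_quorum: &[&Node], t: Time) -> History {
    let mut merged = History::new();
    for node in read_quorum {
        merged.extend(node.read(t));
    }
    merged
}
```

The caller only sees the merged History; which nodes form a usable read quorum is a separate question that the quorum-set definitions address.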







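A write operation writes a history, i.e. `fn write(h: History({T}), nodes: &[Node])`, and the content written to every node is the same.

Intuitively, reads and writes in a distributed environment become reads and writes against multiple nodes:
```rust
// …
```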
18 changes: 12 additions & 6 deletions src/list/raxos/def-Read-Quorum-Set.md
@@ -1,24 +1,30 @@
## Read Quorum Set

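Every system, whether single-node or distributed,
explicitly or implicitly defines which `node_set`s are valid for `read()`; only when reading with such a `node_set` can the system provide TODO:
explicitly or implicitly defines which `node_set`s may be used for `read()`; the system provides some guarantee only when a read uses such a `node_set`, and gives no guarantee for results read through any other node set:

- For example, in a single-node system the only `node_set` usable by `read()` is the node itself, `{{1}}`;
obviously, reading with an empty `node_set` is meaningless.

- In a simple 3-node system with no restrictions,
the `node_set`s usable by `read()` are all non-empty node sets: `{{1}, {2}, {3}, {1,2}, {2,3}, {1,3}, {1,2,3}}`
But note that the result of `read()` in such a system generally carries no high-availability guarantee, because it implies that writes must reach every node before a valid read can see them.
But note that the result of `read()` in such a system generally carries no high-availability guarantee, because it implies that a write must reach every node before a valid read can see it.

- In a majority read/write 3-node system (n=3, w=2, r=2),
- In a [majority read/write][多数派读写] 3-node system (n=3, w=2, r=2),
the `node_set`s usable by `read()` are those containing at least 2 nodes: `{{1,2}, {2,3}, {1,3}, {1,2,3}}`,

If the `node_set` used by a read operation is one that the system defines for reading,
the read is considered valid. The system provides guarantees only for valid reads;
for an invalid read,
the system can guarantee nothing about the result (undefined behavior).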
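
The node-set examples above can be reproduced with a small sketch. It assumes the quorum rule is purely size-based (at least `r` nodes), which matches the single-node, unrestricted, and majority examples but is not the only possible definition. The function names `read_quorum_set` and `is_valid_read` are hypothetical.

```rust
use std::collections::BTreeSet;

type NodeId = u32;
type NodeSet = BTreeSet<NodeId>;

/// Build a read_quorum_set: every subset of `nodes` with at least `r` members.
/// Assumption: the rule is size-based, as in the n=3, r=2 example.
fn read_quorum_set(nodes: &[NodeId], r: usize) -> BTreeSet<NodeSet> {
    let mut result = BTreeSet::new();
    // Enumerate non-empty subsets with a bitmask; only suitable for small clusters.
    for mask in 1u32..(1u32 << nodes.len()) {
        let subset: NodeSet = nodes
            .iter()
            .enumerate()
            .filter(|(i, _)| (mask & (1u32 << *i)) != 0)
            .map(|(_, id)| *id)
            .collect();
        if subset.len() >= r {
            result.insert(subset);
        }
    }
    result
}

/// A read is valid only if its node_set is a member of the read_quorum_set.
fn is_valid_read(quorum_set: &BTreeSet<NodeSet>, node_set: &NodeSet) -> bool {
    quorum_set.contains(node_set)
}
```

For nodes 1, 2, 3, `r = 1` yields the seven non-empty node sets and `r = 2` yields the four sets with at least two nodes, so a read through `{3}` alone is invalid in the majority configuration.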

**def-Read-Quorum-Set** **def-Read-Quorum**:
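At a given time `pt`, the set of `node_set`s that may validly be used for `read()` is the system's `read_quorum_set` at time `pt`,
**def-Read-Quorum-Set**
**def-Read-Quorum**:

At a given time `T`, the set of `node_set`s that may validly be used for `read()` is the system's `read_quorum_set(T)` at time `T`. Note that `read_quorum_set(T)` may differ for different times `T`, for example during a [Raft][] configuration change.

An element of `read_quorum_set` is a set of nodes, called a `read_quorum`.
A `read_quorum` is a node set (`node_set`); a `read_quorum_set` is a set of node sets.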


[多数派读写]: http://a
[Raft]: http://a
Binary file removed src/list/raxos/history-visible-12.excalidraw.png
Binary file not shown.
65 changes: 7 additions & 58 deletions src/list/raxos/raxos.md
@@ -34,48 +34,7 @@ TODO: remove concept History quorum



## History Read Set

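For
`fn read(pt: PTime, nodes: Vec<Node>) -> History`,
and for any subset H of the History that was read, we call this `node_set` a ReadSet of that `History`, meaning the `History` can be read through this `node_set`.

For a particular state of the system, a ReadSet of a History is defined as a set of nodes through which that History can be read.

For example, in the figure below:

- `read({1})` returns: `[History{E1, E2, E3}]`,
- `read({1, 2})` returns: `[History{E1, E2, E3}, History{E1, E2}]`,
- `read({3})` returns empty, `ø`.

For example, in the 3-node system below, `History{E1,E2,E3}` has 4 ReadSets, that is, node sets that can read it; they are all the node sets that include node-1:

`{1}, {1,2}, {1,2,3}, {1,3}`

For example, `read({1})` includes `History{E1,E2,E3}` in its result,
and `read({1,3})` also includes `History{E1,E2,E3}` in its result.

But `read({3})` does not return `History{E1,E2,E3}`, so `{3}` is not a ReadSet of `History{E1,E2,E3}`.

`History{E1,E2}` has 6 ReadSets; every non-empty node set except `{3}` is one of them:
`{1}, {2}, {1,2}, {1,2,3}, {1,3}, {2,3}`,

For example, `read({2,3})` includes `History{E1,E2}` in its result.

![](history-read-set.excalidraw.png)

We can also simplify the returned result by taking the union of the elements in `Vec<History>`.
For example, in the figure above, `read({1, 2})` can be seen as returning a single History: `History{E1, E2, E3}`

In the figure below, we can regard `read({1, 2})` as returning a tree-shaped History: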

```
.->E4
E1->E2->E3
```

![](history-read-set-union.excalidraw.png)

[[desc-History Read Set]]

[[def-Read-Quorum-Set]]

@@ -237,6 +196,8 @@ node-1 locally merged its existing History with the newly written History, local storage

![](linear-rw.excalidraw.png)

protocol-write-prepare

## Write Prepare

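Now write guarantees that it does not overwrite existing History, and that it has written to the `write_quorum_set`,
@@ -305,30 +266,18 @@ node-1 locally merged its existing History with the newly written History, local storage
I give this example to show that the types of Time's dimensions cannot be assumed to be the same:
addition is not supported, but multiplication can be. This leads to an interesting conclusion later.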

[[example-crdt]]

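# Example application of multi-dimensional partially ordered time
[[example-2d-non-transitive-time]]

Suppose there is a key-value storage system in which each key represents a different dimension. Our time is then a multi-dimensional vector, and comparing times means comparing multi-dimensional vectors.

Start with a simple scenario: suppose the system has only 2 keys,
x and y.
The system's virtual time `T` is defined as a vector whose value is i on each dimension corresponding to a key the operation touches.
For example, the T of `let x = 2` is `{x:i}`,
the T of `let y = 3` is `{y:i}`;
the T of `let x = x + y` is `{x:i, y:i}`
The order on T is defined by comparing the shared components:
for example, `{x:1} < {x:2,y:2} < {y:3}`, but `{x:1}` and `{y:2}` are unordered, because they share no component.
This virtual time defines which Events can execute without affecting each other, which must have a definite order, and what that order is (defined by `i`).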
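
The comparison rule above can be sketched as follows. This is a minimal illustration, assuming a time is a map from key name to counter. The source only specifies the case of a single shared component; treating times whose shared components disagree as unordered is an assumption of this sketch, and `compare` is a hypothetical name.

```rust
use std::cmp::Ordering;
use std::collections::BTreeMap;

/// A virtual time: one counter per key (dimension) the operation touches.
type Time = BTreeMap<&'static str, u64>;

/// Compare two times on their shared dimensions only.
/// Returns None when there is no shared dimension, or when the shared
/// dimensions disagree (the second rule is an assumption of this sketch).
fn compare(a: &Time, b: &Time) -> Option<Ordering> {
    let mut result: Option<Ordering> = None;
    for (key, va) in a {
        if let Some(vb) = b.get(key) {
            let ord = va.cmp(vb);
            match result {
                // First shared dimension decides the candidate ordering.
                None => result = Some(ord),
                // Later shared dimensions must agree with it.
                Some(prev) if prev == ord => {}
                Some(_) => return None,
            }
        }
    }
    result
}
```

Under this rule `{x:1} < {x:2,y:2}` and `{x:2,y:2} < {y:3}` both hold, yet `{x:1}` and `{y:3}` share no component and stay unordered, so the relation is not transitive.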
[[exmaple-crdt-define]]

[[example-crdt-implementation]]

![](crdt.excalidraw.png)

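Applying our Abstract Paxos to this system builds a conflict-adaptive Multi Master consensus protocol.

For example

Note that our time corresponds to the structure of a DAG: the `{y:3}` vertex in `{x:2,y:2} < {y:3}` and a standalone `{y:3}` vertex do not represent the same time.

If T is not **transitive**, then phase 1 must block, on a read-quorum-set, every T in the History to be written, not just the largest T, because T alone cannot prevent earlier vertices from being written.
In phase 2, likewise, each T of the History to be written must be checked individually against the T values that have already been blocked, not only the Maximal T of that History.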
