typos

godaai · Feb 15, 2024 · 0afeac6 · 0afeac6
1 parent a882103
commit 0afeac6
Show file tree

Hide file tree

Showing 10 changed files with 66 additions and 87 deletions.
diff --git a/_static/custom.css b/_static/custom.css
@@ -0,0 +1,3 @@
+html[data-theme="light"] {
+ --sbt-color-announcement: rgb(125, 125, 125);
+}
diff --git a/ch-ray-core/ray-intro.md b/ch-ray-core/ray-intro.md
@@ -17,12 +17,12 @@ Ray Core 提供的 API 将 Python 任务横向扩展到集群上，最关键的
 * 行动者（Actor）：面向类（Class）的接口，用于定义一个类，该类可以在集群中分布式地执行。
 * 对象（Object）：分布式的对象，对象不可变（Immutable），用于在 Task 和 Actor 之间传递数据。
 
+上层的各类生态均基于 Ray Core 的这些底层 API，结合各类人工智能应用编写而成。
+
 ```{figure} ../img/ch-ray-core/ray-apis.svg
 ---
 width: 800px
 name: ray-core-apis
 ---
 Ray Core 核心 API
-```
-
-上层的各类生态均基于 Ray Core 的这些底层 API，结合各类人工智能应用编写而成。
+```
diff --git a/ch-ray-core/remote-class.ipynb b/ch-ray-core/remote-class.ipynb
@@ -12,7 +12,7 @@
  "\n",
  "{numref}`remote-function` 展示了如何将一个无状态的函数扩展到 Ray 集群上进行分布式计算，但实际的场景中，我们经常需要进行有状态的计算。最简单的有状态计算包括维护一个计数器，每遇到某种条件，计数器加一。这类有状态的计算对于给定的输入，不一定得到确定的输出。单机场景我们可以使用 Python 的类（Class）来实现，计数器可作为类的成员变量。Ray 可以将 Python 类拓展到集群上，即远程类（Remote Class），又被称为行动者（Actor）。Actor 的名字来自 Actor 编程模型 {cite}`hewitt1973Universal` ，这是一个典型的分布式计算编程模型，被广泛应用在大数据和人工智能领域，但 Actor 编程模型比较抽象，我们先从计数器的案例来入手。\n",
  "\n",
- "## 案例1：分布式计数器\n"
+ "## 案例1：分布式计数器"
  ]
  },
  {
@@ -38,7 +38,7 @@
  {
  "data": {
  "application/vnd.jupyter.widget-view+json": {
- "model_id": "c873d22011234552a5da4bb3bfc9f76f",
+ "model_id": "05fad1c7643f4f028f2ee9cb6097f749",
  "version_major": 2,
  "version_minor": 0
  },
@@ -62,24 +62,20 @@
  " <table class=\"jp-RenderedHTMLCommon\" style=\"border-collapse: collapse;color: var(--jp-ui-font-color1);font-size: var(--jp-ui-font-size1);\">\n",
  " <tr>\n",
  " <td style=\"text-align: left\"><b>Python version:</b></td>\n",
- " <td style=\"text-align: left\"><b>3.10.13</b></td>\n",
+ " <td style=\"text-align: left\"><b>3.11.7</b></td>\n",
  " </tr>\n",
  " <tr>\n",
  " <td style=\"text-align: left\"><b>Ray version:</b></td>\n",
- " <td style=\"text-align: left\"><b>2.7.0</b></td>\n",
+ " <td style=\"text-align: left\"><b>2.9.0</b></td>\n",
  " </tr>\n",
- " <tr>\n",
- " <td style=\"text-align: left\"><b>Dashboard:</b></td>\n",
- " <td style=\"text-align: left\"><b><a href=\"http://127.0.0.1:8265\" target=\"_blank\">http://127.0.0.1:8265</a></b></td>\n",
- "</tr>\n",
- "\n",
+ " \n",
  "</table>\n",
  "\n",
  " </div>\n",
  "</div>\n"
  ],
  "text/plain": [
- "RayContext(dashboard_url='127.0.0.1:8265', python_version='3.10.13', ray_version='2.7.0', ray_commit='b4bba4717f5ba04ee25580fe8f88eed63ef0c5dc', protocol_version=None)"
+ "RayContext(dashboard_url='', python_version='3.11.7', ray_version='2.9.0', ray_commit='9be5a16e3ccad0710bba08d0f75e9ff774ae6880', protocol_version=None)"
  ]
  },
  "execution_count": 1,
@@ -104,7 +100,7 @@
  "origin_pos": 2
  },
  "source": [
- "Ray 的 Remote Class 也使用 [`ray.remote()`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.remote.html) 来装饰。\n"
+ "Ray 的 Remote Class 也使用 [`ray.remote()`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.remote.html) 来装饰。"
  ]
  },
  {
@@ -145,7 +141,7 @@
  "origin_pos": 4
  },
  "source": [
- "使用 Ray 创建一个类名为 `Counter` 的 Remote Class，需要在类名 `Counter` 后面加上 `remote()`。这样创建的类就是一个分布式的 Actor。\n"
+ "初始化一个实例，在类名 `Counter` 后面加上 `remote()`，即创建一个分布式的 Actor。"
  ]
  },
  {
@@ -176,7 +172,7 @@
  "origin_pos": 6
  },
  "source": [
- "接下来我们要使用 `Counter` 类的计数功能：`increment()` 函数，我们也要在函数后面添加 `remote()` ，即 `对象实例.函数名.remote()`。\n"
+ "接下来我们要使用 `Counter` 类的计数功能：`increment()` 函数，我们也要在函数后面添加 `remote()` ，即 `对象实例.函数名.remote()`。"
  ]
  },
  {
@@ -216,7 +212,7 @@
  "origin_pos": 8
  },
  "source": [
- "我们可以用同一个类创建不同的 Actor 实例，不同 Actor 之间的成员函数调用可以被并行化执行，但同一个 Actor 的成员函数调用是顺序执行的。\n"
+ "我们可以用同一个类创建不同的 Actor 实例，不同 Actor 之间的成员函数调用可以被并行化执行，但同一个 Actor 的成员函数调用是顺序执行的。"
  ]
  },
  {
@@ -236,13 +232,6 @@
  ]
  },
  "outputs": [
- {
- "name": "stderr",
- "output_type": "stream",
- "text": [
- "\u001b[2m\u001b[33m(raylet)\u001b[0m [2023-10-31 17:16:11,991 E 64054 59177832] (raylet) file_system_monitor.cc:111: /tmp/ray/session_2023-10-31_17-15-57_239394_63766 is over 95% full, available space: 49435729920; capacity: 1000240963584. Object creation will fail if spilling is required.\n"
- ]
- },
  {
  "name": "stdout",
  "output_type": "stream",
@@ -329,7 +318,7 @@
  "\n",
  "## 案例2：排行榜\n",
  "\n",
- "接下来我们基于 Actor 实现一个更加复杂的案例：成绩排行榜。这个排行榜的状态是一个键值对，名为 `self.board`，键是名字（`name`），是一个 `str` 类型，值是分数（`score`），是一个 `float` 类型。\n"
+ "接下来我们基于 Actor 实现一个更加复杂的案例：成绩排行榜。这个排行榜的状态是一个键值对，名为 `self.board`，键是名字（`name`），是一个 `str` 类型，值是分数（`score`），是一个 `float` 类型。"
  ]
  },
  {
@@ -397,7 +386,7 @@
  "* `add()`：添加一条新记录，同时对输入进行解析，如果 `score` 不能转换成 `float` 会抛出异常；并对已有记录排序。\n",
  "* `pop()`：删除最大值的那条记录，如果 `self.board` 为空，会抛出异常。\n",
  "\n",
- "使用 `.remote()` 函数来创建这个 Remote Class 对应的 Actor 实例。\n"
+ "使用 `.remote()` 函数来创建这个 Remote Class 对应的 Actor 实例。"
  ]
  },
  {
@@ -431,7 +420,7 @@
  "source": [
  "这里的 `ranking` 是一个 Actor 的引用（Actor Handle），有点像 `ObjectRef`，我们用 `ranking` 这个 Actor Handle 来管理这个 Actor。一旦 Actor Handle 被销毁，对应的 Actor 以及其状态也被销毁。\n",
  "\n",
- "我们可以创建多个 Actor 实例，每个实例管理自己的状态。还可以用 [`ActorClass.options`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.actor.ActorClass.options.html) 给这些 Actor 实例设置一些选项，起名字，设置 CPU、GPU 计算资源等。\n"
+ "我们可以创建多个 Actor 实例，每个实例管理自己的状态。还可以用 [`ActorClass.options`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.actor.ActorClass.options.html) 给这些 Actor 实例设置一些选项，起名字，设置 CPU、GPU 计算资源等。"
  ]
  },
  {
@@ -468,7 +457,7 @@
  "origin_pos": 18
  },
  "source": [
- "有了名字之后，就可以通过 [`ray.get_actor()`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.get_actor.html) 来获取 Actor Handle，\n"
+ "有了名字之后，就可以通过 [`ray.get_actor()`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.get_actor.html) 来获取 Actor Handle，"
  ]
  },
  {
@@ -500,7 +489,7 @@
  "origin_pos": 20
  },
  "source": [
- "向 `ranking` 排行榜内添加新记录，即调用 `add()` 函数。调用类成员函数，都要记得加上 `.remote()` ，否则会报错。\n"
+ "向 `ranking` 排行榜内添加新记录，即调用 `add()` 函数。调用类成员函数，都要记得加上 `.remote()` ，否则会报错。"
  ]
  },
  {
@@ -575,7 +564,7 @@
  "origin_pos": 23
  },
  "source": [
- "在上面的案例中，有些调用会引发异常，比如插入一个字符串，Ray 通常会处理异常并打印出来，但是为了保险起见，你也可以在调用这些 Remote Class 的成员方法时手动做好 `try/except` 的异常捕获：\n"
+ "在上面的案例中，有些调用会引发异常，比如插入一个字符串，Ray 通常会处理异常并打印出来，但是为了保险起见，你也可以在调用这些 Remote Class 的成员方法时手动做好 `try/except` 的异常捕获："
  ]
  },
  {
@@ -588,25 +577,11 @@
  "name": "stdout",
  "output_type": "stream",
  "text": [
- "\u001b[2m\u001b[36m(Ranking pid=53871)\u001b[0m The data type of score should be float but we receive <class 'str'>.\n",
- "\u001b[36mray::Ranking.pop()\u001b[39m (pid=53871, ip=127.0.0.1, actor_id=b765fed53d3f5536b2056cd901000000, repr=<__main__.Ranking object at 0x7fcb041eb730>)\n",
- " File \"/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_53185/1570506600.py\", line 29, in pop\n",
- "Exception: The board is empty.\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\u001b[2m\u001b[36m(Ranking pid=64481)\u001b[0m The data type of score should be float but we receive <class 'str'>.\n"
- ]
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "\u001b[36mray::Ranking.pop()\u001b[39m (pid=64481, ip=127.0.0.1, actor_id=b0dd04160e75747cf6ebc7de01000000, repr=<__main__.Ranking object at 0x7fb6f9f92c20>)\n",
- " File \"/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_63766/1570506600.py\", line 29, in pop\n",
+ "\u001b[36m(Ranking pid=94276)\u001b[0m The data type of score should be float but we receive <class 'str'>.\n",
+ "\u001b[36mray::Ranking.pop()\u001b[39m (pid=94276, ip=127.0.0.1, actor_id=ad9e8acb97292b765b95c42501000000, repr=<__main__.Ranking object at 0x10348d690>)\n",
+ " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+ " ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^\n",
+ " File \"/var/folders/4n/v40br47s46ggrjm9bdm64lwh0000gn/T/ipykernel_94239/1570506600.py\", line 29, in pop\n",
  "Exception: The board is empty.\n"
  ]
  }
@@ -627,7 +602,7 @@
  "source": [
  "## 案例3：Actor Pool\n",
  "\n",
- "实践上，经常创建一个 Actor 资源池（Actor Pool），[Actor Pool](https://docs.ray.io/en/latest/ray-core/api/doc/ray.util.ActorPool.html) 有点像 `multiprocessing.Pool`，Actor Pool 中有包含多个 Actor，每个 Actor 功能一样，而且可以分式地在多个计算节点上运行。"
+ "实践上，经常创建一个 Actor 资源池（Actor Pool），[`ActorPool`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.util.ActorPool.html) 有点像 `multiprocessing.Pool`，Actor Pool 中有包含多个 Actor，每个 Actor 功能一样，而且可以分式地在多个计算节点上运行。"
  ]
  },
  {
@@ -672,9 +647,9 @@
  "origin_pos": 25
  },
  "source": [
- "如果我们想调用加入到 ActorPool 中的 Actor，可以使用 [`map(fn, values)`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.util.ActorPool.map.html) 和 [`submit(fn, value)`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.util.ActorPool.submit.html) 方法。这两个方法非常相似，所接收的参数是一个函数 `fn` 和参数 `value` 或者参数列表 `values`。`map()` 的 `values` 是一个列表，让函数并行地分发给多个 Actor 去处理；`submit()` 的 `value` 是单个值，每次从 ActorPool 中选择一个 Actor 去执行。`fn` 是一个 Lambda 表达式，或者说是一个匿名函数。这个 Lambda 表达式有两个参数：`actor` 和 `value`，`actor` 就是我们定义的单个 Actor 的函数调用，`value` 是这个函数的参数。\n",
+ "如果我们想调用加入到 `ActorPool` 中的 Actor，可以使用 [`map(fn, values)`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.util.ActorPool.map.html) 和 [`submit(fn, value)`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.util.ActorPool.submit.html) 方法。这两个方法非常相似，所接收的参数是一个函数 `fn` 和参数 `value` 或者参数列表 `values`。`map()` 的 `values` 是一个列表，让函数并行地分发给多个 Actor 去处理；`submit()` 的 `value` 是单个值，每次从 `ActorPool` 中选择一个 Actor 去执行。`fn` 是一个 Lambda 表达式，或者说是一个匿名函数。这个 Lambda 表达式有两个参数：`actor` 和 `value`，`actor` 就是我们定义的单个 Actor 的函数调用，`value` 是这个函数的参数。\n",
  "\n",
- "函数的第一个参数是 ActorPool 中的 Actor，第二个参数是函数的参数。\n"
+ "函数的第一个参数是 `ActorPool` 中的 Actor，第二个参数是函数的参数。"
  ]
  },
  {
@@ -708,7 +683,7 @@
  "origin_pos": 27
  },
  "source": [
- "`map()` 和 `submit()` 将计算任务提交到了 ActorPool 中，ActorPool 并不是直接返回结果，而是异步地分发给后台不同的 Actor 去执行。需要使用 [`get_next()`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.util.ActorPool.get_next.html) 阻塞地返回结果。\n"
+ "`map()` 和 `submit()` 将计算任务提交到了 `ActorPool` 中，`ActorPool` 并不是直接返回结果，而是异步地分发给后台不同的 Actor 去执行。需要使用 [`get_next()`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.util.ActorPool.get_next.html) 阻塞地返回结果。"
  ]
  },
  {
@@ -756,7 +731,7 @@
  "source": [
  "当然，如果已经把所有结果都取回，仍然再去 `get_next()`，将会抛出异常。\n",
  "\n",
- "在这里，`value` 只能是单个对象，不能是参数列表，如果想传入多个参数，可以把参数包裹成元组。比如 `add()` 方法对两个操作数做计算，我们把两个操作数包裹为一个元组，实现 `add()` 函数时使用 `(a, b) = operands` 解析这个元组。\n"
+ "在这里，`submit()` 的 `value` 参数只能是单个对象，不能是参数列表，如果想传入多个参数，可以把参数包裹成元组。比如 `add()` 方法对两个操作数做计算，我们把两个操作数包裹为一个元组，实现 `add()` 函数时使用 `(a, b) = operands` 解析这个元组。"
  ]
  },
  {
@@ -839,7 +814,7 @@
  "name": "python",
  "nbconvert_exporter": "python",
  "pygments_lexer": "ipython3",
- "version": "3.10.13"
+ "version": "3.11.7"
  },
  "required_libs": []
  },

diff --git a/ch-ray-core/remote-object.ipynb b/ch-ray-core/remote-object.ipynb
@@ -12,7 +12,7 @@
  "\n",
  "Ray 分布式计算中涉及共享数据可被放在分布式对象存储（Distributed Ojbect Store）中，被放置在分布式对象存储中的数据被称为远程对象（Remote Object）中。我们可以使用 [`ray.get()`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.get.html) 和 [`ray.put()`](https://docs.ray.io/en/latest/ray-core/api/doc/ray.put.html) 读写这些 Remote Object。与内存中的 Python 对象实例不同，Remote Object 是不可原地直接更改的（Immutable）。\n",
  "\n",
- "## ray.put() 与 ray.get()\n"
+ "## `ray.put()` 与 `ray.get()`\n"
  ]
  },
  {
@@ -119,8 +119,8 @@
  "width: 800px\n",
  "name: put-get-object-store\n",
  "---\n",
- "RAY 分布式对象存储示意图\n",
- "```\n"
+ "Ray 分布式对象存储示意图\n",
+ "```"
  ]
  },
  {
@@ -156,7 +156,7 @@
  " return torch.randn(size=(size), dtype=torch.float)\n",
  "\n",
  "torch.manual_seed(42)\n",
- "# 创建 16个 个机张量，每个张量大小为 (X, 8, 8)\n",
+ "# 创建 16 个张量，每个张量大小为 (X, 8, 8)\n",
  "tensor_obj_ref_list = [ray.put(create_rand_tensor((i, 8, 8))) for i in range(1, 16)]\n",
  "tensor_obj_ref_list[0], len(tensor_obj_ref_list)"
  ]
@@ -227,7 +227,7 @@
  "origin_pos": 6
  },
  "source": [
- "或者把存放 `ObjectRefIDs` 列表的所有对象都拉取过来：\n"
+ "或者把 `ObjectRefID` 列表的所有对象都拉取过来："
  ]
  },
  {
@@ -286,9 +286,9 @@
  "origin_pos": 8
  },
  "source": [
- "## 案例1：对数据进行转换\n",
+ "## Example 1: Transforming Data\n",
  "\n",
- "Remote Object 的数据是不可原地更改的，比如下面的操作在单机的内存上可以，但是在 Remote Object 上，不可以直接在原地对 Remote Object 做更改。\n"
+ "The data of remote objects is immutable. For example, the following operation is common in the local memory but cannot be directly applied to a remote object."
  ]
  },
  {
@@ -339,7 +339,7 @@
  "origin_pos": 10
  },
  "source": [
- "如果我们想使用新数据，应该使用 Remote Function 或者 Remote Class 对 Remote Object 进行转换操作，生成新的 Remote Object。\n"
+ "如果我们想使用新数据，应该使用 Remote Function 或者 Remote Class 对 Remote Object 进行转换操作，生成新的 Remote Object。"
  ]
  },
  {
@@ -392,7 +392,7 @@
  "\n",
  "### 直接传递\n",
  "\n",
- "直接在 Task 或者 Actor 的函数调用时将 `RefObjectID` 作为参数传递进去。在下面这个例子中，`x_obj_ref` 是一个 `RefObjectID` ，`echo()` 这个 Remote Function 将自动从 `x_obj_ref` 获取 `x` 的值。这个自动获取值的过程被称为自动反引用（De-referenced）。\n"
+ "直接在 Task 或者 Actor 的函数调用时将 `RefObjectID` 作为参数传递进去。在下面这个例子中，`x_obj_ref` 是一个 `RefObjectID` ，`echo()` 这个 Remote Function 将自动从 `x_obj_ref` 获取 `x` 的值。这个自动获取值的过程被称为自动反引用（De-referenced）。"
  ]
  },
  {
@@ -517,7 +517,7 @@
  "如果 `RefObjectID` 被包裹在一个复杂的数据结构中，Ray 并不会自动获取 `RefObjectID` 对应的值，即 De-referenced 并不是自动的。复杂数据结构包括：\n",
  "\n",
  "* `RefObjectID` 被包裹在一个 `dict` 中，比如：`.remote({\"obj\": x_obj_ref})`\n",
- "* `RefObjectID` 被包裹在一个 `list` 中，比如：`.remote([x_obj_ref])`\n"
+ "* `RefObjectID` 被包裹在一个 `list` 中，比如：`.remote([x_obj_ref])`"
  ]
  },
  {
@@ -639,7 +639,7 @@
  "\n",
  "Ray 集群的每个计算节点都有一个基于共享内存的对象存储， Remote Object 的数据会存储在集群某个或者某些计算节点的对象存储中，所有计算节点的共享内存共同组成了分布式对象存储。\n",
  "\n",
- "当某个 Remote Object 的数据量较小时（<= 100 KB），它会被存储在计算节点进程内存中；当数据量较大时，它会被存储在分布式的共享内存中；当集群的共享内存的空间不够时，数据会被外溢（Spill）到持久化的存储上，比如硬盘或者S3。\n"
+ "当某个 Remote Object 的数据量较小时（<= 100 KB），它会被存储在计算节点进程内存中；当数据量较大时，它会被存储在分布式的共享内存中；当集群的共享内存的空间不够时，数据会被外溢（Spill）到持久化的存储上，比如硬盘或者S3。"
  ]
  },
  {