Add agent servers to support running any type of agent in distribution mode (#135)
modelscope · Apr 30, 2024 · c431015
1 parent 9d23e5b
commit c431015

Showing 30 changed files with 1,682 additions and 340 deletions.
diff --git a/README.md b/README.md
@@ -126,6 +126,7 @@ the following libraries.
   - [Distributed Conversation](./examples/distributed_basic)
   - [Distributed Debate](./examples/distributed_debate)
   - [Distributed Parallel Search](./examples/distributed_search)
+  - [Distributed Large Scale Simulation](./examples/distributed_simulation)
 
 More models, services and examples are coming soon!
 

diff --git a/README_ZH.md b/README_ZH.md
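@@ -115,6 +115,7 @@ AgentScope supports quickly deploying local model services with the following libraries.
   - [Distributed Conversation](./examples/distributed_basic)
   - [Distributed Debate](./examples/distributed_debate)
   - [Distributed Parallel Search](./examples/distributed_search)
+  - [Distributed Large Scale Simulation](./examples/distributed_simulation)
 
 More model APIs, services, and examples are coming soon!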
 

diff --git a/docs/sphinx_doc/en/source/conf.py b/docs/sphinx_doc/en/source/conf.py
@@ -49,6 +49,11 @@
 
 autodoc_member_order = "bysource"
 
+autodoc_default_options = {
+    "members": True,
+    "special-members": "__init__",
+}
+
 # Add any paths that contain templates here, relative to this directory.
 templates_path = ["_templates"]
 

diff --git a/docs/sphinx_doc/en/source/tutorial/201-agent.md b/docs/sphinx_doc/en/source/tutorial/201-agent.md
@@ -16,6 +16,8 @@ Each AgentBase derivative is composed of several key characteristics:
 
 * `sys_prompt` & `engine`: The system prompt acts as predefined instructions that guide the agent in its interactions; and the `engine` is used to dynamically generate a suitable prompt. For more details about them, we defer to [Prompt Engine](206-prompt).
 
+* `to_dist`: Used to create a distributed version of the agent, to support efficient collaboration among multiple agents. Note that `to_dist` is a reserved field and will be automatically added to the initialization function of any subclass of `AgentBase`. For more details about `to_dist`, please refer to [Distribution](208-distribute).
+
 In addition to these attributes, `AgentBase` endows agents with pivotal methods such as `observe` and `reply`:
 
 * `observe()`: Through this method, an agent can take note of *message* without immediately replying, allowing it to update its memory based on the observed *message*.

diff --git a/docs/sphinx_doc/en/source/tutorial/207-monitor.md b/docs/sphinx_doc/en/source/tutorial/207-monitor.md
@@ -35,8 +35,10 @@ Get a monitor instance from `MonitorFactory` to begin monitoring, and note that
 monitor = MonitorFactory.get_monitor()
 ```
 
-> Currently the above code returns a `SqliteMonitor` instance, which is initialized in `agentscope.init`.
-> The `SqliteMonitor` class is the default implementation of `MonitorBase` class, which is based on Sqlite3.
+Currently the above code returns a `SqliteMonitor` instance, which is initialized in `agentscope.init`.
+The `SqliteMonitor` class is the default implementation of `MonitorBase` class, which is based on Sqlite3.
+
+If you don't want to use the monitor, set `use_monitor=False` in `agentscope.init` to disable it. In this case, the `MonitorFactory.get_monitor` method returns an instance of `DummyMonitor`, which has the same interface as the `SqliteMonitor` class but does nothing inside.
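The disabled-monitor behaviour described above resembles the null object pattern. The following is a minimal, self-contained sketch of that idea, not AgentScope's actual code: `MonitorSketch`, `RecordingMonitor`, `NoOpMonitor`, and `get_monitor` are hypothetical names invented for this illustration, and the three methods are assumed for the example rather than taken from `MonitorBase`.

```python
from abc import ABC, abstractmethod
from typing import Optional


class MonitorSketch(ABC):
    """Hypothetical monitor interface (an assumption, not AgentScope's MonitorBase)."""

    @abstractmethod
    def register(self, name: str, quota: float) -> bool:
        """Start tracking a metric."""

    @abstractmethod
    def add(self, name: str, value: float) -> bool:
        """Increase a metric by `value`."""

    @abstractmethod
    def get_value(self, name: str) -> Optional[float]:
        """Return the current value of a metric."""


class RecordingMonitor(MonitorSketch):
    """Keeps metrics in memory; stands in for a real, storage-backed monitor."""

    def __init__(self) -> None:
        self._values = {}

    def register(self, name: str, quota: float) -> bool:
        if name in self._values:
            return False
        self._values[name] = 0.0
        return True

    def add(self, name: str, value: float) -> bool:
        self._values[name] += value
        return True

    def get_value(self, name: str) -> Optional[float]:
        return self._values[name]


class NoOpMonitor(MonitorSketch):
    """Same interface, but every call succeeds and records nothing."""

    def register(self, name: str, quota: float) -> bool:
        return True

    def add(self, name: str, value: float) -> bool:
        return True

    def get_value(self, name: str) -> Optional[float]:
        return None


def get_monitor(use_monitor: bool = True) -> MonitorSketch:
    """Hypothetical factory: callers never need to check whether monitoring is on."""
    return RecordingMonitor() if use_monitor else NoOpMonitor()
```

Because both classes share one interface, calling code can stay free of `if monitor is not None` checks whether or not monitoring is enabled.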
 
 ### Basic Usage
 

diff --git a/docs/sphinx_doc/en/source/tutorial/208-distribute.md b/docs/sphinx_doc/en/source/tutorial/208-distribute.md
@@ -12,78 +12,166 @@ This tutorial will introduce the implementation and usage of AgentScope distribu
 
 ## Usage
 
-In AgentScope, the process that runs the application flow is called the "main process", and all agents will run in separate processes.
-According to the different relationships between the main process and the agent process, AgentScope supports two distributed modes: Master-Slave and Peer-to-Peer mode.
-In the Master-Slave mode, developers can start all agent processes from the main process, while in the Peer-to-Peer mode, the agent process is independent of the main process and developers need to start the agent service on the corresponding machine.
+In AgentScope, the process that runs the application flow is called the **main process**, and each agent can run in a separate process called the **agent server process**.
+Depending on the relationship between the main process and the agent server process, AgentScope supports two modes for each agent: **Child Process** mode and **Independent Process** mode.
 
-The above concepts may seem complex, but don't worry, for application developers, they only have minor differences when creating agents. Below we introduce how to create distributed agents.
+- In Child Process mode, agent server processes are automatically started as sub-processes of the main process.
+- In Independent Process mode, the agent server process is independent of the main process, and developers need to start it on the corresponding machine.
 
-### Step 1: Create a Distributed Agent
+These concepts may seem complex, but application developers only need to convert their existing agents to distributed versions.
 
-First, the developer's agent must inherit the `agentscope.agents.AgentBase` class. `AgentBase` provides the `to_dist` method to convert the agent into its distributed version. `to_dist` mainly relies on the following parameters to implement the distributed deployment of the agent:
+### Step 1: Convert your agent to its distributed version
 
-- `host`: the hostname or IP address of the machine where the agent runs, defaults to `localhost`.
-- `port`: the port of this agent's RPC server, defaults to `80`.
-- `launch_server`: whether to launch an RPC server locally, defaults to `True`.
+Any agent in AgentScope can be converted to its distributed version by calling its {func}`to_dist<agentscope.agents.AgentBase.to_dist>` method.
+Note that your agent must inherit from the {class}`agentscope.agents.AgentBase<agentscope.agents.AgentBase>` class, because `AgentBase` provides the `to_dist` method.
 
 Suppose there are two agent classes `AgentA` and `AgentB`, both of which inherit from `AgentBase`.
 
-#### Master-Slave Mode
+```python
+a = AgentA(
+    name="A"
+    # ...
+)
+b = AgentB(
+    name="B"
+    # ...
+)
+```
+
+Next we will introduce the conversion details of both modes.
 
-In the Master-Slave mode, since all agent processes depend on the main process, all processes actually run on the same machine.
-We can start all agent processes from the main process, that is, the default parameters `launch_server=True` and `host="localhost"`, and we can omit the `port` parameter. AgentScope will automatically find an available local port for the agent process.
+#### Child Process Mode
+
+To use this mode, you only need to call each agent's `to_dist()` method without any input parameter. AgentScope will automatically start all agent server processes from the main process.
 
 ```python
+# Child Process mode
 a = AgentA(
     name="A"
     # ...
 ).to_dist()
+b = AgentB(
+    name="B"
+    # ...
+).to_dist()
 ```
 
-#### Peer-to-Peer Mode
+#### Independent Process Mode
 
-In the Peer-to-Peer mode, we need to start the service of the corresponding agent on the target machine first. For example, deploy an instance of `AgentA` on the machine with IP `a.b.c.d`, and its corresponding port is 12001. Run the following code on this target machine:
+In Independent Process mode, the agent server process must first be started on the target machine.
+For example, start two agent server processes on two different machines with IP addresses `ip_a` and `ip_b` (called `Machine1` and `Machine2`, respectively).
+You can run the following code on `Machine1`:
 
 ```python
-from agentscope.agents import RpcAgentServerLauncher
+# import some packages
 
+agentscope.init(
+    ...
+)
 # Create an agent service process
-server_a = RpcAgentServerLauncher(
-    agent_class=AgentA,
-    agent_kwargs={
-        "name": "A"
-        ...
-    },
-    host="a.b.c.d",
-    port=12001,
+server = RpcAgentServerLauncher(
+    host="ip_a",
+    port=12001,  # choose an available port
 )
 
 # Start the service
-server_a.launch()
-server_a.wait_until_terminate()
+server.launch()
+server.wait_until_terminate()
 ```
 
-Then, we can connect to the agent service in the main process with the following code. At this time, the object `a` created in the main process can be used as a local proxy for the agent, allowing developers to write the application flow in a centralized way in the main process.
+And run the following code on `Machine2`:
+
+```python
+# import some packages
+
+agentscope.init(
+    ...
+)
+# Create an agent service process
+server = RpcAgentServerLauncher(
+    host="ip_b",
+    port=12002,  # choose an available port
+)
+
+# Start the service
+server.launch()
+server.wait_until_terminate()
+```
+
+Then, you can connect to the agent servers from the main process with the following code.
 
 ```python
 a = AgentA(
     name="A",
     # ...
 ).to_dist(
-    host="a.b.c.d",
+    host="ip_a",
     port=12001,
-    launch_server=False,
+)
+b = AgentB(
+    name="B",
+    # ...
+).to_dist(
+    host="ip_b",
+    port=12002,
+)
+```
+
+The code above deploys `AgentA` on the agent server process of `Machine1` and `AgentB` on the agent server process of `Machine2`.
+Developers then only need to write the application flow in a centralized way in the main process.
+
+#### Advanced Usage of `to_dist`
+
+All the examples above convert initialized agents into their distributed versions through the {func}`to_dist<agentscope.agents.AgentBase.to_dist>` method, which is equivalent to initializing the agent twice: once in the main process and once in the agent server process.
+For agents whose initialization is time-consuming, this is inefficient. Therefore, AgentScope also provides a way to convert an agent into its distributed version while initializing it: pass the `to_dist` parameter to the agent's initialization function.
+
+In Child Process Mode, just pass `to_dist=True` to the Agent's initialization function.
+
+```python
+# Child Process mode
+a = AgentA(
+    name="A",
+    # ...
+    to_dist=True
+)
+b = AgentB(
+    name="B",
+    # ...
+    to_dist=True
+)
+```
+
+In Independent Process mode, you need to encapsulate the parameters of the `to_dist()` method in a {class}`DistConf<agentscope.agents.DistConf>` instance and pass it to the `to_dist` field, for example:
+
+```python
+a = AgentA(
+    name="A",
+    # ...
+    to_dist=DistConf(
+        host="ip_a",
+        port=12001,
+    ),
+)
+b = AgentB(
+    name="B",
+    # ...
+    to_dist=DistConf(
+        host="ip_b",
+        port=12002,
+    ),
 )
 ```
 
+Compared with calling the `to_dist()` method on an initialized agent, this approach initializes the agent only once, in the agent server process.
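One way a reserved `to_dist` keyword could be accepted by every subclass, without each subclass declaring it, is to intercept construction in a metaclass. The sketch below illustrates that idea only; it is not taken from the AgentScope source, and `DistConfSketch`, `AgentProxySketch`, `AgentMetaSketch`, `AgentBaseSketch`, and `SlowAgentSketch` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DistConfSketch:
    """Hypothetical stand-in for DistConf: where the agent server lives."""

    host: str = "localhost"
    port: Optional[int] = None  # None: a free port would be chosen later


class AgentProxySketch:
    """Remembers how to build the agent remotely; the real __init__ never runs here."""

    def __init__(self, agent_cls, init_args, init_kwargs, conf):
        self.agent_cls = agent_cls
        self.init_args = init_args
        self.init_kwargs = init_kwargs
        self.conf = conf


class AgentMetaSketch(type):
    """Adds a `to_dist` keyword to the constructor of every class that uses it."""

    def __call__(cls, *args, to_dist=False, **kwargs):
        if not to_dist:
            # Ordinary local construction: run __new__ and __init__ as usual.
            return super().__call__(*args, **kwargs)
        # to_dist=True means default settings; a config object carries host and port.
        conf = to_dist if isinstance(to_dist, DistConfSketch) else DistConfSketch()
        return AgentProxySketch(cls, args, kwargs, conf)


class AgentBaseSketch(metaclass=AgentMetaSketch):
    def __init__(self, name: str) -> None:
        self.name = name


class SlowAgentSketch(AgentBaseSketch):
    init_calls = 0  # counts how often the (pretend) expensive __init__ runs locally

    def __init__(self, name: str) -> None:
        super().__init__(name)
        SlowAgentSketch.init_calls += 1


local = SlowAgentSketch(name="A")
remote = SlowAgentSketch(name="B", to_dist=DistConfSketch(host="ip_b", port=12002))
```

Here `local` is a normal instance, while `remote` is a proxy that only records the class, the arguments, and the target address, so the expensive `__init__` has run once in this process rather than twice.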
+
 ### Step 2: Orchestrate Distributed Application Flow
 
 In AgentScope, the orchestration of distributed application flow is exactly the same as non-distributed programs, and developers can write the entire application flow in a centralized way.
 At the same time, AgentScope allows the use of a mixture of locally and distributed deployed agents, and developers do not need to distinguish which agents are local and which are distributed.
 
 The following is the complete code for two agents to communicate with each other in different modes. It can be seen that AgentScope supports zero-cost migration of distributed application flow from centralized to distributed.
 
-- All agents are centralized:
+- All agents are centralized
 
 ```python
 # Create agent objects
@@ -104,7 +192,9 @@ while x is None or x.content == "exit":
     x = b(x)
 ```
 
-- Agents are deployed in a distributed manner (Master-Slave mode):
+- Agents are deployed in a distributed manner
+  - `AgentA` in Child Process mode
+  - `AgentB` in Independent Process Mode
 
 ```python
 # Create agent objects
@@ -116,7 +206,10 @@ a = AgentA(
 b = AgentB(
     name="B",
     # ...
-).to_dist()
+).to_dist(
+    host="ip_b",
+    port=12002,
+)
 
 # Application flow orchestration
 x = None
@@ -148,9 +241,20 @@ By implementing each Agent as an Actor, an Agent will automatically wait for its
 
 #### PlaceHolder
 
-Meanwhile, to support centralized application orchestration, AgentScope introduces the concept of Placeholder. A Placeholder is a special message that contains the address and port number of the agent that generated the Placeholder, which is used to indicate that the input message of the Agent is not ready yet.
-When the input message of the Agent is ready, the Placeholder will be replaced by the real message, and then the actual `reply` method will be executed.
+Meanwhile, to support centralized application orchestration, AgentScope introduces the concept of {class}`Placeholder<agentscope.message.PlaceholderMessage>`.
+A placeholder is a special message that contains the address and port number of the agent that generated it, and it indicates that the output message of that agent is not ready yet.
+When the `reply` method of a distributed agent is called, a placeholder is returned immediately without blocking the main process.
+A placeholder has exactly the same interface as a message, so the orchestration flow can be written in a centralized way.
+When values are read from a placeholder, it sends a request to the source agent to fetch the real values.
+Because a placeholder is itself a message, it can be sent to other agents, which then fetch the real values themselves; this avoids sending the real values multiple times.
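The placeholder behaviour can be approximated in a single process with futures. This is a rough analogy under stated assumptions, not AgentScope's RPC implementation: `PlaceholderSketch` and `DistAgentSketch` are hypothetical names, threads stand in for agent server processes, and a `Future` stands in for the request sent to the source agent.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class PlaceholderSketch:
    """Looks like a message, but fetches its content only when it is read."""

    def __init__(self, future: Future) -> None:
        self._future = future

    @property
    def content(self):
        # Blocks until the producing agent has finished its reply.
        return self._future.result()


class DistAgentSketch:
    """Calling the agent returns a placeholder immediately; work runs in a worker thread."""

    def __init__(self, name: str, delay: float = 0.2) -> None:
        self.name = name
        self.delay = delay
        self._pool = ThreadPoolExecutor(max_workers=1)

    def _reply(self, x):
        # The input placeholder is resolved here, inside the worker,
        # so the caller's thread is never blocked by upstream agents.
        history = [] if x is None else x.content
        time.sleep(self.delay)  # pretend to do slow work
        return history + [self.name]

    def __call__(self, x=None) -> PlaceholderSketch:
        return PlaceholderSketch(self._pool.submit(self._reply, x))


a = DistAgentSketch("A")
b = DistAgentSketch("B")

x = a()   # returns at once
y = b(x)  # also returns at once; B waits for A inside its own worker
final = y.content  # the main flow blocks only here, when the value is needed
```

The orchestration lines (`x = a()`, `y = b(x)`) read like ordinary sequential code, yet neither call waits for the agents; the wait happens only when `content` is accessed.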
 
 About more detailed technical implementation solutions, please refer to our [paper](https://arxiv.org/abs/2402.14034).
 
+#### Agent Server
+
+In AgentScope, the agent server provides a running platform for various types of agents.
+Multiple agents can run in the same agent server, each holding its own memory and other local state, but they share the same computation resources.
+As long as the code is not modified, an agent server can provide services for multiple main processes.
+This means that when running multiple applications, you only need to start the agent server once, and it can be reused afterwards.
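The idea that one server hosts many agents with independent state can be sketched as a registry keyed by agent ID. This is an illustrative assumption about the general shape of such a server, not the AgentScope implementation; `AgentServerSketch`, `EchoAgentSketch`, `create_agent`, and `call` are hypothetical names, and networking is omitted.

```python
import uuid


class EchoAgentSketch:
    """Toy agent with its own memory."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.memory = []

    def reply(self, message: str) -> str:
        self.memory.append(message)
        return f"{self.name} has seen {len(self.memory)} message(s)"


class AgentServerSketch:
    """Hosts many agents in one process and routes each call by agent ID."""

    def __init__(self) -> None:
        self._agents = {}

    def create_agent(self, agent_cls, **init_kwargs) -> str:
        # Any agent class can be hosted; the server is not tied to one type.
        agent_id = uuid.uuid4().hex
        self._agents[agent_id] = agent_cls(**init_kwargs)
        return agent_id

    def call(self, agent_id: str, message: str) -> str:
        return self._agents[agent_id].reply(message)


server = AgentServerSketch()
id_a = server.create_agent(EchoAgentSketch, name="A")
id_b = server.create_agent(EchoAgentSketch, name="B")
server.call(id_a, "hello")
```

Each agent keeps its own `memory`, while all of them share the server's process and therefore its compute resources.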
+
 [[Back to the top]](#208-distribute-en)
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/201-agent.md b/docs/sphinx_doc/zh_CN/source/tutorial/201-agent.md
@@ -17,6 +17,8 @@
 
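 * `sys_prompt` and `engine`: The system prompt serves as predefined instructions that guide the agent's behavior in its interactions, while the `engine` is used to dynamically generate a suitable prompt. For more details, see the [Prompt Engine section](206-prompt).
 
+* `to_dist`: Used to create a distributed version of the agent to support efficient collaboration among multiple agents. Note that `to_dist` is a reserved field that is automatically added to the initialization function of every subclass of `AgentBase`. For more details about `to_dist`, see the [Distribution section](208-distribute).
+
 In addition to these attributes, `AgentBase` also provides agents with some key methods, such as `observe` and `reply`:
 
 * `observe()`: Through this method, an agent can take note of a message without replying immediately, allowing it to update its memory based on the observed message.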

diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/207-monitor.md b/docs/sphinx_doc/zh_CN/source/tutorial/207-monitor.md
@@ -35,8 +35,10 @@
 monitor = MonitorFactory.get_monitor()
 ```
 
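-> Currently the above code returns a `SqliteMonitor` instance, which is initialized in `agentscope.init`.
-> The `SqliteMonitor` class is the default implementation of the `MonitorBase` class, based on Sqlite3.
+The code above currently returns a `SqliteMonitor` instance, which is initialized in `agentscope.init`.
+`SqliteMonitor` is a Sqlite3-based implementation of `MonitorBase` and is the current default monitor.
+
+If you don't need the monitor, you can disable the monitor component by passing `use_monitor=False` to `agentscope.init`. In this case, `MonitorFactory.get_monitor` returns a `DummyMonitor` instance, which exposes exactly the same interface as `SqliteMonitor` but performs no operations internally.
 
 ### Basic Usage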