Agent servers support running any type of agent #135

Merged
78 commits
8d5a475
agent server support multiple session
pan-x-c Mar 21, 2024
5e88e1f
support delete session
pan-x-c Mar 21, 2024
f6ec1a2
support to clone multiple instances with different session_id
pan-x-c Mar 25, 2024
d4b9334
fix distribtued examples
pan-x-c Mar 25, 2024
e299788
reorganize distributed examples
pan-x-c Mar 26, 2024
aad935d
support to init session agent instance with user provided args
pan-x-c Mar 26, 2024
44ceeb8
update default max_pool_size
pan-x-c Mar 27, 2024
2842884
opt placeholder _get method
pan-x-c Mar 27, 2024
131051e
Merge branch 'main' into feature/pxc/session_for_rpc_agent
pan-x-c Mar 27, 2024
2170428
reuse channel stub
pan-x-c Mar 28, 2024
89e9afc
opt placeholder
pan-x-c Mar 28, 2024
285e3ac
update docs for rpc client
pan-x-c Mar 28, 2024
c659748
merge main
pan-x-c Apr 1, 2024
e94d70d
rename session_id to agent_id
pan-x-c Apr 1, 2024
a9d0b11
rename session_id to agent_id
pan-x-c Apr 1, 2024
758a276
Merge branch 'main' into feature/pxc/session_for_rpc_agent
pan-x-c Apr 2, 2024
fff3e23
fix agent_id
pan-x-c Apr 2, 2024
d2199b0
update test for agent_id
pan-x-c Apr 2, 2024
9355e24
opt agent instance init
pan-x-c Apr 2, 2024
cd81bd2
fix format
pan-x-c Apr 2, 2024
d5dc287
change rpc agent server into AgentPlatform and support any type of agent
pan-x-c Apr 2, 2024
550f194
ensure to_dist keep agent_id unchanged
pan-x-c Apr 3, 2024
981439f
fix pre-commit
pan-x-c Apr 3, 2024
2c583a4
Merge branch 'feature/pxc/session_for_rpc_agent' into feature/pxc/rpc…
pan-x-c Apr 3, 2024
56e75b0
keep RpcAgentServerLauncher
pan-x-c Apr 3, 2024
d0d9282
merge main
pan-x-c Apr 8, 2024
9852e78
merge main
pan-x-c Apr 9, 2024
230ea4a
create agent in sub thread
pan-x-c Apr 9, 2024
901d60d
async server
pan-x-c Apr 10, 2024
06e4631
fix precommit
pan-x-c Apr 10, 2024
6831150
fix precommit
pan-x-c Apr 10, 2024
2fc7d50
fix rpc error
pan-x-c Apr 10, 2024
711c4f2
update distribute tutorial
pan-x-c Apr 10, 2024
23dd575
support agent nesting
pan-x-c Apr 12, 2024
ebef290
fix precommit
pan-x-c Apr 12, 2024
9fd7fd7
update test
pan-x-c Apr 12, 2024
c7dd1e3
update to_dist interface
pan-x-c Apr 15, 2024
df525cd
support agent nesting
pan-x-c Apr 15, 2024
d16792b
update tutorial
pan-x-c Apr 15, 2024
e8db3aa
add simulation example
pan-x-c Apr 16, 2024
5f5e75f
add README.md for simulation
pan-x-c Apr 16, 2024
384730f
add simulation output example
pan-x-c Apr 16, 2024
4cabb82
update tutorial
pan-x-c Apr 16, 2024
a90e402
Merge branch 'main' into feature/pxc/rpc_server_support_any_agent
pan-x-c Apr 16, 2024
3835a57
serialize use dill
pan-x-c Apr 17, 2024
22df7de
fix dist docs
pan-x-c Apr 18, 2024
e429ac0
add deprecated warning
pan-x-c Apr 18, 2024
927b5b2
update dist doc
pan-x-c Apr 18, 2024
a9570a2
update distributed simulation readme
pan-x-c Apr 18, 2024
fabd0b6
update to_dist interface
pan-x-c Apr 19, 2024
8cb5f68
update to_dist
pan-x-c Apr 22, 2024
d6d9982
Merge branch 'main' into feature/pxc/rpc_server_support_any_agent
pan-x-c Apr 22, 2024
a9009e9
update agent registry
pan-x-c Apr 22, 2024
59e3c12
update tutorial
pan-x-c Apr 22, 2024
89915c0
update docstring
pan-x-c Apr 22, 2024
67020ba
update docstring
pan-x-c Apr 26, 2024
fd21daa
update dist doc
pan-x-c Apr 26, 2024
7be5bc4
raise error when fail to pass openai safety system (#171)
pan-x-c Apr 23, 2024
857fc6a
Checking reset before user input in studio (#154)
qbc2016 Apr 23, 2024
6de52ac
Add execute_shell_command, list_directory_content, and get_current_di…
garyzhang99 Apr 23, 2024
55bfeac
fix mac ver (#174)
rayrayraykk Apr 24, 2024
6657c3f
Add Compiler & Refactor workflow dag runner code (#164)
rayrayraykk Apr 25, 2024
384c4d2
Make workflow compiler return code string (#178)
rayrayraykk Apr 25, 2024
e3415b9
Add displaying cycle dots before agent reply in AS studio (#179)
qbc2016 Apr 26, 2024
f0c1ef9
[DOC] Generate .rst for each .py file and add __init__ method into do…
pan-x-c Apr 26, 2024
e047a8d
update dist doc
pan-x-c Apr 26, 2024
7dffa5b
fix conflict
pan-x-c Apr 26, 2024
4304fcc
use distconf
pan-x-c Apr 26, 2024
7da9427
update tutorial for dist
pan-x-c Apr 26, 2024
e305970
add notes for to_dist field
pan-x-c Apr 26, 2024
677ba1b
add notes for to_dist field
pan-x-c Apr 26, 2024
3b89074
add a switch to disable monitor
pan-x-c Apr 29, 2024
bdbbed1
update monitor tutorial
pan-x-c Apr 29, 2024
ac4a924
fix to_dist=True
pan-x-c Apr 29, 2024
850cc9b
update custom agents
pan-x-c Apr 29, 2024
22d550e
update vllm setup script
pan-x-c Apr 29, 2024
51b1239
disable monitor in simulation
pan-x-c Apr 29, 2024
dfa6c6f
keep runtime id in simulation
Apr 30, 2024
1 change: 1 addition & 0 deletions README.md
@@ -124,6 +124,7 @@ the following libraries.
- [Distributed Conversation](./examples/distributed_basic)
- [Distributed Debate](./examples/distributed_debate)
- [Distributed Parallel Search](./examples/distributed_search)
- [Distributed Large Scale Simulation](./examples/distributed_simulation)

More models, services and examples are coming soon!

1 change: 1 addition & 0 deletions README_ZH.md
@@ -113,6 +113,7 @@ AgentScope支持使用以下库快速部署本地模型服务。
- [分布式对话](./examples/distributed_basic)
- [分布式辩论](./examples/distributed_debate)
- [分布式并行搜索](./examples/distributed_search)
- [分布式大规模仿真](./examples/distributed_simulation)

更多模型API、服务和示例即将推出!

5 changes: 5 additions & 0 deletions docs/sphinx_doc/en/source/conf.py
@@ -49,6 +49,11 @@

autodoc_member_order = "bysource"

autodoc_default_options = {
    "members": True,
    "special-members": "__init__",
}

# Add any paths that contain templates here, relative to this directory.
templates_path = ["_templates"]

170 changes: 137 additions & 33 deletions docs/sphinx_doc/en/source/tutorial/208-distribute.md
@@ -12,78 +12,166 @@ This tutorial will introduce the implementation and usage of AgentScope distribu

## Usage

In AgentScope, the process that runs the application flow is called the "main process", and all agents will run in separate processes.
According to the different relationships between the main process and the agent process, AgentScope supports two distributed modes: Master-Slave and Peer-to-Peer mode.
In the Master-Slave mode, developers can start all agent processes from the main process, while in the Peer-to-Peer mode, the agent process is independent of the main process and developers need to start the agent service on the corresponding machine.
In AgentScope, the process that runs the application flow is called the **main process**, and each agent can run in a separate process called the **agent server process**.
Depending on the relationship between the main process and the agent server process, AgentScope supports two modes for each agent: **Child Process** mode and **Independent Process** mode.

The above concepts may seem complex, but don't worry, for application developers, they only have minor differences when creating agents. Below we introduce how to create distributed agents.
- In Child Process mode, agent server processes are started automatically as sub-processes of the main process.
- In Independent Process mode, the agent server process is independent of the main process, and developers need to start it on the corresponding machine themselves.

### Step 1: Create a Distributed Agent
These concepts may seem complex, but application developers only need to convert their existing agents to their distributed versions.

First, the developer's agent must inherit the `agentscope.agents.AgentBase` class. `AgentBase` provides the `to_dist` method to convert the agent into its distributed version. `to_dist` mainly relies on the following parameters to implement the distributed deployment of the agent:
### Step 1: Convert your agent to its distributed version

- `host`: the hostname or IP address of the machine where the agent runs, defaults to `localhost`.
- `port`: the port of this agent's RPC server, defaults to `80`.
- `launch_server`: whether to launch an RPC server locally, defaults to `True`.
Any agent in AgentScope can be converted to its distributed version by calling its {func}`to_dist<agentscope.agents.AgentBase.to_dist>` method.
Note that your agent must inherit from the {class}`agentscope.agents.AgentBase<agentscope.agents.AgentBase>` class, because the `to_dist` method is provided by `AgentBase`.

Suppose there are two agent classes `AgentA` and `AgentB`, both of which inherit from `AgentBase`.

#### Master-Slave Mode
```python
a = AgentA(
    name="A"
    # ...
)
b = AgentB(
    name="B"
    # ...
)
```

Next we will introduce the conversion details of both modes.

In the Master-Slave mode, since all agent processes depend on the main process, all processes actually run on the same machine.
We can start all agent processes from the main process, that is, the default parameters `launch_server=True` and `host="localhost"`, and we can omit the `port` parameter. AgentScope will automatically find an available local port for the agent process.
#### Child Process Mode

To use this mode, you only need to call each agent's `to_dist()` method without any input parameter. AgentScope will automatically start all agent server processes from the main process.

```python
# Child Process mode
a = AgentA(
    name="A"
    # ...
).to_dist()
b = AgentB(
    name="B"
    # ...
).to_dist()
```

#### Peer-to-Peer Mode
#### Independent Process Mode

In the Peer-to-Peer mode, we need to start the service of the corresponding agent on the target machine first. For example, deploy an instance of `AgentA` on the machine with IP `a.b.c.d`, and its corresponding port is 12001. Run the following code on this target machine:
In the Independent Process Mode, we need to start the agent server process on the target machine first.
For example, start two agent server processes on two different machines with IPs `ip_a` and `ip_b` (called `Machine1` and `Machine2`, respectively).
You can run the following code on `Machine1`:

```python
from agentscope.agents import RpcAgentServerLauncher
# import some packages

agentscope.init(
...
)
# Create an agent service process
server_a = RpcAgentServerLauncher(
agent_class=AgentA,
agent_kwargs={
"name": "A"
...
},
host="a.b.c.d",
port=12001,
server = RpcAgentServerLauncher(
host="ip_a",
port=12001, # choose an available port
)

# Start the service
server_a.launch()
server_a.wait_until_terminate()
server.launch()
server.wait_until_terminate()
```

Then, we can connect to the agent service in the main process with the following code. At this time, the object `a` created in the main process can be used as a local proxy for the agent, allowing developers to write the application flow in a centralized way in the main process.
And run the following code on `Machine2`:

```python
# import some packages

agentscope.init(
...
)
# Create an agent service process
server = RpcAgentServerLauncher(
host="ip_b",
port=12002, # choose an available port
)

# Start the service
server.launch()
server.wait_until_terminate()
```

Then, you can connect to the agent servers from the main process with the following code.

```python
a = AgentA(
name="A",
# ...
).to_dist(
host="a.b.c.d",
host="ip_a",
port=12001,
launch_server=False,
)
b = AgentB(
name="B",
# ...
).to_dist(
host="ip_b",
port=12002,
)
```

The above code will deploy `AgentA` on the agent server process of `Machine1` and `AgentB` on the agent server process of `Machine2`.
Developers then only need to write the application flow in a centralized way in the main process.

#### Advanced Usage of `to_dist`

All examples described above convert initialized agents into their distributed versions through the {func}`to_dist<agentscope.agents.AgentBase.to_dist>` method, which is equivalent to initializing the agent twice: once in the main process and once in the agent server process.
For agents whose initialization is time-consuming, calling the `to_dist` method is therefore inefficient. AgentScope also provides a way to convert an agent into its distributed version while initializing it, namely by passing the `to_dist` parameter to the agent's initialization function.

In Child Process Mode, just pass `to_dist=True` to the Agent's initialization function.

```python
# Child Process mode
a = AgentA(
    name="A",
    # ...
    to_dist=True
)
b = AgentB(
    name="B",
    # ...
    to_dist=True
)
```

In Independent Process Mode, you need to encapsulate the parameters of the `to_dist()` method in a {class}`DistConf<agentscope.agents.DistConf>` instance and pass it to the `to_dist` field, for example:

```python
a = AgentA(
    name="A",
    # ...
    to_dist=DistConf(
        host="ip_a",
        port=12001,
    ),
)
b = AgentB(
    name="B",
    # ...
    to_dist=DistConf(
        host="ip_b",
        port=12002,
    ),
)
```

Compared with the original `to_dist()` function call, this approach initializes the agent only once, in the agent server process.
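
One way to picture why passing `to_dist` at construction time avoids a second initialization is the minimal sketch below. It is not AgentScope's actual implementation: `DistMeta`, `RemoteProxy`, and the `DistConf` stand-in defined here are hypothetical names used only for illustration, and no process or RPC connection is created. The idea is that a metaclass intercepts the constructor call and, when `to_dist` is set, records the class and its arguments for the server side instead of running `__init__` locally.

```python
class DistConf(dict):
    """Hypothetical stand-in that bundles the arguments of to_dist()."""

    def __init__(self, host="localhost", port=None):
        super().__init__(host=host, port=port)


class RemoteProxy:
    """Hypothetical local handle; a real one would talk to an agent server."""

    def __init__(self, agent_cls, init_args, init_kwargs, host, port):
        self.agent_cls = agent_cls
        self.init_args = init_args
        self.init_kwargs = init_kwargs
        self.host = host
        self.port = port


class DistMeta(type):
    """Intercept construction so that `to_dist` never reaches `__init__`."""

    def __call__(cls, *args, to_dist=False, **kwargs):
        if to_dist is False or to_dist is None:
            # Ordinary local construction: __init__ runs here, once.
            return super().__call__(*args, **kwargs)
        # to_dist=True means "use defaults"; a DistConf carries host and port.
        conf = to_dist if isinstance(to_dist, dict) else DistConf()
        # __init__ is NOT run locally; the arguments are only recorded so
        # that a server could construct the agent later.
        return RemoteProxy(cls, args, kwargs, conf["host"], conf["port"])


class AgentA(metaclass=DistMeta):
    init_calls = 0  # counts how often __init__ runs in this process

    def __init__(self, name):
        AgentA.init_calls += 1
        self.name = name


local_agent = AgentA(name="A")
remote_agent = AgentA(name="A", to_dist=DistConf(host="ip_a", port=12001))
default_remote = AgentA(name="A", to_dist=True)
```

In this sketch only `local_agent` triggers `__init__` in the calling process; the two distributed variants merely carry the class, the constructor arguments, and the target address.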

### Step 2: Orchestrate Distributed Application Flow

In AgentScope, the orchestration of distributed application flow is exactly the same as non-distributed programs, and developers can write the entire application flow in a centralized way.
At the same time, AgentScope allows the use of a mixture of locally and distributed deployed agents, and developers do not need to distinguish which agents are local and which are distributed.

The following is the complete code for two agents to communicate with each other in different modes. It can be seen that AgentScope supports zero-cost migration of distributed application flow from centralized to distributed.

- All agents are centralized:
- All agents are centralized

```python
# Create agent objects
@@ -104,7 +192,9 @@ while x is None or x.content == "exit":
    x = b(x)
```

- Agents are deployed in a distributed manner (Master-Slave mode):
- Agents are deployed in a distributed manner
- `AgentA` in Child Process mode
- `AgentB` in Independent Process Mode

```python
# Create agent objects
@@ -116,7 +206,10 @@ a = AgentA(
b = AgentB(
name="B",
# ...
).to_dist()
).to_dist(
host="ip_b",
port=12002,
)

# Application flow orchestration
x = None
@@ -148,9 +241,20 @@ By implementing each Agent as an Actor, an Agent will automatically wait for its

#### PlaceHolder

Meanwhile, to support centralized application orchestration, AgentScope introduces the concept of Placeholder. A Placeholder is a special message that contains the address and port number of the agent that generated the Placeholder, which is used to indicate that the input message of the Agent is not ready yet.
When the input message of the Agent is ready, the Placeholder will be replaced by the real message, and then the actual `reply` method will be executed.
Meanwhile, to support centralized application orchestration, AgentScope introduces the concept of {class}`Placeholder<agentscope.message.PlaceholderMessage>`.
A Placeholder is a special message that contains the address and port number of the agent that generated the placeholder, which is used to indicate that the output message of the Agent is not ready yet.
When calling the `reply` method of a distributed agent, a placeholder is returned immediately without blocking the main process.
A placeholder has exactly the same interface as a message, so the orchestration flow can be written in a centralized way.
When values are read from a placeholder, the placeholder sends a request to the source agent to get the real values.
A placeholder is itself a message, so it can be sent to other agents, which then fetch the real values themselves; this avoids sending the real values multiple times.
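
The non-blocking behavior described above can be sketched with standard-library futures. This is only an analogy under stated assumptions: the `Placeholder` and `DistributedAgent` classes below are hypothetical, they use a thread pool instead of RPC, and the placeholder wraps a `Future` rather than carrying a host and port as the real one does.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


class Placeholder:
    """Hypothetical stand-in for a message whose value is not ready yet."""

    def __init__(self, future: Future):
        self._future = future

    @property
    def content(self):
        # Blocks only here, at the moment the real value is needed.
        return self._future.result()


class DistributedAgent:
    """Hypothetical agent whose reply runs in a worker thread."""

    def __init__(self, name, executor):
        self.name = name
        self._executor = executor

    def _reply(self, x):
        # The input placeholder is resolved inside the worker, so the
        # caller never waits for the upstream agent.
        text = x.content if x is not None else "start"
        time.sleep(0.1)  # simulate a slow model call
        return f"{text} -> {self.name}"

    def __call__(self, x=None):
        # Return immediately with a placeholder for the future reply.
        return Placeholder(self._executor.submit(self._reply, x))


with ThreadPoolExecutor(max_workers=4) as pool:
    a = DistributedAgent("A", pool)
    b = DistributedAgent("B", pool)
    x = a()      # returns at once
    y = b(x)     # also returns at once; b waits for a in its own worker
    result = y.content  # the main flow blocks only when reading the value
```

The orchestration lines (`x = a()`, `y = b(x)`) look exactly like a sequential program, which is the point of the placeholder design: waiting is deferred to the consumer of the value.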

For more details on the technical implementation, please refer to our [paper](https://arxiv.org/abs/2402.14034).

#### Agent Server

In AgentScope, the agent server provides a running platform for various types of agents.
Multiple agents can run in the same agent server and hold independent memory and other local states, but they share the same computation resources.
As long as the code is not modified, an agent server can provide services for multiple main processes.
This means that when running multiple applications, you only need to start the agent server once, and it can be reused afterwards.
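
As a rough mental model of one server hosting several agents with separate state, consider the in-process sketch below. `ToyAgentServer`, its methods, and `EchoAgent` are hypothetical names introduced for illustration; the real agent server is an RPC service, and its interface may differ.

```python
import uuid


class EchoAgent:
    """Hypothetical agent that keeps its own memory."""

    def __init__(self, name):
        self.name = name
        self.memory = []

    def reply(self, text):
        self.memory.append(text)
        return f"{self.name} heard {len(self.memory)} message(s)"


class ToyAgentServer:
    """Hypothetical in-process stand-in for an agent server."""

    def __init__(self):
        self._agents = {}  # agent_id -> agent instance

    def create_agent(self, agent_cls, **init_kwargs):
        # Any agent class can be hosted; each instance gets its own id.
        agent_id = uuid.uuid4().hex
        self._agents[agent_id] = agent_cls(**init_kwargs)
        return agent_id

    def call(self, agent_id, text):
        # Requests are routed to the right instance by agent_id.
        return self._agents[agent_id].reply(text)

    def delete_agent(self, agent_id):
        del self._agents[agent_id]


server = ToyAgentServer()
id_a = server.create_agent(EchoAgent, name="A")
id_b = server.create_agent(EchoAgent, name="B")

server.call(id_a, "hi")
reply_a = server.call(id_a, "hi again")
reply_b = server.call(id_b, "hello")
```

Both agents live in the same server object and so compete for the same resources, yet each keeps its own `memory`, which mirrors the isolation described above.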

[[Back to the top]](#208-distribute-en)