Support streaming mode in AgentScope (#347)
--------- Co-authored-by: zhijianma <[email protected]>
Showing 36 changed files with 1,652 additions and 392 deletions.
@@ -0,0 +1,123 @@
(203-stream-en)=

# Streaming

AgentScope supports streaming mode for the following LLM APIs in both **terminal** and **AgentScope Studio**.

| API | Model Wrapper | `model_type` field in model configuration |
|--------------------|------------------------------------------------------------------------------------------------------------------------|--------------------|
| OpenAI Chat API | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | `"openai_chat"` |
| DashScope Chat API | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | `"dashscope_chat"` |
| Gemini Chat API | [`GeminiChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) | `"gemini_chat"` |
| ZhipuAI Chat API | [`ZhipuAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/zhipu_model.py) | `"zhipuai_chat"` |
| ollama Chat API | [`OllamaChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) | `"ollama_chat"` |
| LiteLLM Chat API | [`LiteLLMChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/litellm_model.py) | `"litellm_chat"` |

## Setup Streaming Mode

AgentScope allows users to set up streaming mode in both model configuration and model calling.

### In Model Configuration

To use streaming mode, set the `stream` field to `True` in the model configuration.

```python
model_config = {
    "config_name": "xxx",
    "model_type": "xxx",
    "stream": True,
    # ...
}
```
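
Putting this together, here is a minimal sketch of registering such a streaming configuration. It assumes the DashScope Chat API from the table above; the `config_name`, `model_name`, and `api_key` values are placeholders to replace with your own.

```python
import agentscope

# Register a streaming model configuration (placeholder values)
agentscope.init(
    model_configs=[
        {
            "config_name": "my_streaming_config",
            "model_type": "dashscope_chat",
            "model_name": "qwen-max",
            "api_key": "your_api_key",
            "stream": True,
        },
    ],
)
```

An agent created with this `config_name` as its `model_config_name` will then stream its model responses by default.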

### In Model Calling

Within an agent, you can call the model with the `stream` parameter set to `True`.
Note that the `stream` parameter in the model call will override the `stream` field in the model configuration.

```python
class MyAgent(AgentBase):
    # ...
    def reply(self, x: Optional[Union[Msg, Sequence[Msg]]] = None) -> Msg:
        # ...
        response = self.model(
            prompt,
            stream=True,
        )
        # ...
```

## Printing in Streaming Mode

In streaming mode, the `stream` field of a model response will be a generator, and the `text` field will be `None`.
For compatibility with the non-streaming mode, once the `text` field is accessed, the generator in the `stream` field will be iterated to produce the full text, which is then stored in the `text` field.
So even in streaming mode, users can handle the response text in the `text` field as usual.
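
For example, a minimal sketch of this behavior inside an agent's `reply` function (assuming `prompt` has already been formatted and streaming is enabled):

```python
response = self.model(prompt, stream=True)

# Accessing `text` iterates the generator in `response.stream` under the hood
# and stores the full text, so it behaves just like the non-streaming mode.
full_text = response.text
```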

However, if you want to print in streaming mode, just pass the generator to `self.speak` to print the streaming text in the terminal and AgentScope Studio.

After printing the streaming response, the full text of the response will be available in the `response.text` field.

```python
def reply(self, x: Optional[Union[Msg, Sequence[Msg]]] = None) -> Msg:
    # ...
    # Use stream=True here if you want to enable streaming mode for this call
    response = self.model(prompt)

    # At this point, response.text is None

    # Print the response in streaming mode in the terminal and AgentScope Studio (if available)
    self.speak(response.stream)

    # After printing, response.text holds the full text of the response, and you can handle it as usual
    msg = Msg(self.name, content=response.text, role="assistant")

    self.memory.add(msg)

    return msg
```

## Advanced Usage

Users who want to handle the streaming response themselves can iterate the generator and process the response text in their own way.

An example of how to handle the streaming response can be found in the `speak` function of `AgentBase`, shown below.
The `log_stream_msg` function prints the streaming response in the terminal and AgentScope Studio (if registered).

```python
# ...
elif isinstance(content, GeneratorType):
    # The streaming message must share the same id for displaying in
    # the AgentScope Studio.
    msg = Msg(name=self.name, content="", role="assistant")
    for last, text_chunk in content:
        msg.content = text_chunk
        log_stream_msg(msg, last=last)
else:
    # ...
```

However, keep the following points in mind:

1. As the generator is iterated, the `response.text` field automatically accumulates the text that has been iterated so far.
2. The generator in the `stream` field yields tuples of a boolean and a string. The boolean indicates whether this chunk is the end of the response, and the string is the response text generated so far.
3. To print streaming text in AgentScope Studio, the message id passed to `log_stream_msg` must stay the same across one response.

```python
def reply(self, x: Optional[Union[Msg, Sequence[Msg]]] = None) -> Msg:
    # ...
    response = self.model(prompt)

    # At this point, response.text is None

    # Iterate the generator and handle the response text yourself
    for last_chunk, text in response.stream:
        # Handle the text in your own way
        # ...

    # After iterating, response.text contains the full response text
    return Msg(self.name, content=response.text, role="assistant")
```

[[Return to the top]](#203-stream-en)
@@ -0,0 +1,121 @@
(203-stream-zh)=

# Streaming

AgentScope supports streaming mode for the following LLM APIs in both the **terminal** and **AgentScope Studio**.

| API | Model Wrapper | Corresponding `model_type` field |
|--------------------|------------------------------------------------------------------------------------------------------------------------|--------------------|
| OpenAI Chat API | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | `"openai_chat"` |
| DashScope Chat API | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | `"dashscope_chat"` |
| Gemini Chat API | [`GeminiChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) | `"gemini_chat"` |
| ZhipuAI Chat API | [`ZhipuAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/zhipu_model.py) | `"zhipuai_chat"` |
| ollama Chat API | [`OllamaChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) | `"ollama_chat"` |
| LiteLLM Chat API | [`LiteLLMChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/litellm_model.py) | `"litellm_chat"` |

## Setup Streaming Mode

AgentScope allows users to set up streaming mode in both the model configuration and the model call.

### In Model Configuration

Set the `stream` field to `True` in the model configuration to use streaming mode.

```python
model_config = {
    "config_name": "xxx",
    "model_type": "xxx",
    "stream": True,
    # ...
}
```
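
Putting this together, a minimal sketch of registering such a streaming configuration is shown below; it assumes the DashScope Chat API, and the `config_name`, `model_name`, and `api_key` values are placeholders to replace with your own.

```python
import agentscope

# Register a streaming model configuration (placeholder values)
agentscope.init(
    model_configs=[
        {
            "config_name": "my_streaming_config",
            "model_type": "dashscope_chat",
            "model_name": "qwen-max",
            "api_key": "your_api_key",
            "stream": True,
        },
    ],
)
```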

### In Model Calling

Within an agent, you can set the `stream` parameter to `True` when calling the model. Note that the `stream` parameter in the model call will override the `stream` field in the model configuration.

```python
class MyAgent(AgentBase):
    # ...
    def reply(self, x: Optional[Union[Msg, Sequence[Msg]]] = None) -> Msg:
        # ...
        response = self.model(
            prompt,
            stream=True,
        )
        # ...
```

## Printing in Streaming Mode

In streaming mode, the `stream` field of a model response will be a generator, and the `text` field will be `None`.
For compatibility with the non-streaming mode, once the `text` field is accessed before the generator is iterated, the generator in the `stream` field will be iterated to produce the full text, which is then stored in the `text` field.
So even in streaming mode, users can handle the response text in the `text` field as usual, without any changes.
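
For example, a minimal sketch of this behavior inside an agent's `reply` function (assuming `prompt` has already been formatted and streaming is enabled):

```python
response = self.model(prompt, stream=True)

# Accessing `text` iterates the generator in `response.stream` under the hood
# and stores the full text, so it behaves just like the non-streaming mode.
full_text = response.text
```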

However, if you want streaming output, simply pass the generator to the `self.speak` function to print the text in a streaming fashion in the terminal and AgentScope Studio.

```python
def reply(self, x: Optional[Union[Msg, Sequence[Msg]]] = None) -> Msg:
    # ...
    # Use stream=True here if you want to enable streaming mode for this call
    response = self.model(prompt)

    # At this point, response.text is None

    # Print the text in a streaming fashion in the terminal and AgentScope Studio
    self.speak(response.stream)

    # As the generator is iterated, the generated text is automatically stored in
    # response.text, so you can use response.text to handle the response as usual
    msg = Msg(self.name, content=response.text, role="assistant")

    self.memory.add(msg)

    return msg
```

## Advanced Usage

Users who want to handle the streaming output themselves can iterate the generator to receive the streaming response text in real time.

For an example of how to handle the streaming response, refer to the `speak` function of `AgentBase`, shown below.
The `log_stream_msg` function prints the streaming text in the terminal and AgentScope Studio (if registered) in real time.

```python
# ...
elif isinstance(content, GeneratorType):
    # Streaming messages must share the same id to be displayed in AgentScope
    # Studio, which is done here by updating the content field of the same message
    msg = Msg(name=self.name, content="", role="assistant")
    for last, text_chunk in content:
        msg.content = text_chunk
        log_stream_msg(msg, last=last)
else:
    # ...
```

When handling the generator, keep the following points in mind:

1. As the generator is iterated, the `response.text` field automatically accumulates the text that has been iterated so far.
2. The generator in the `stream` field yields tuples of a boolean and a string. The boolean indicates whether this chunk is the end of the response, and the string is the response text generated so far.
3. AgentScope Studio uses the id of the `Msg` object passed to `log_stream_msg` to decide whether the text belongs to the same streaming response; chunks with different ids are treated as different responses.

```python
def reply(self, x: Optional[Union[Msg, Sequence[Msg]]] = None) -> Msg:
    # ...
    response = self.model(prompt)

    # At this point, response.text is None

    # Iterate the generator and handle the response text yourself
    for last_chunk, text in response.stream:
        # Handle the text as you need
        # ...

    # After iterating, response.text contains the full response text
    return Msg(self.name, content=response.text, role="assistant")
```

[[Return to the top]](#203-stream-zh)