diff --git a/README.md b/README.md
index 55360686d..993ffdcd6 100644
--- a/README.md
+++ b/README.md
@@ -31,6 +31,12 @@ Welcome to join our community on
## News
+- **[2024-05-15]** A new **Parser Module** for **formatted responses** has been added to AgentScope! Refer to our [tutorial](https://modelscope.github.io/agentscope/en/tutorial/203-parser.html) for more details. The [`DictDialogAgent`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/agents/dict_dialog_agent.py) class and the [werewolf game](https://github.com/modelscope/agentscope/tree/main/examples/game_werewolf) example have been updated accordingly.
+
+- **[2024-05-14]** Dear AgentScope users, we are conducting a survey on the **AgentScope Workstation & Copilot** user experience. Your valuable feedback will help us improve AgentScope's drag-and-drop multi-agent application development and Copilot experience. The survey takes about 3-5 minutes; please click this [URL](https://survey.aliyun.com/apps/zhiliao/vgpTppn22) to participate. Thank you very much for your support and contribution!
+
+- **[2024-05-14]** AgentScope now supports **gpt-4o** and other OpenAI vision models! Try gpt-4o with its [model configuration](./examples/model_configs_template/openai_chat_template.json) and the new example [Conversation with gpt-4o](./examples/conversation_with_gpt-4o)!
+
- **[2024-04-30]** We have released **AgentScope** v0.0.4!
- **[2024-04-27]** [AgentScope Workstation](https://agentscope.aliyun.com/) is now online! You are welcome to try building your multi-agent application simply with our *drag-and-drop platform* and ask our *copilot* questions about AgentScope!
@@ -75,24 +81,24 @@ applications in a centralized programming manner for streamlined development.
AgentScope provides a list of `ModelWrapper` to support both local model
services and third-party model APIs.
-| API | Task | Model Wrapper | Configuration | Some Supported Models |
-|------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|-----------------------------------------------|
-| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |[guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_chat_template.json) | gpt-4, gpt-3.5-turbo, ... |
-| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_embedding_template.json) | text-embedding-ada-002, ... |
-| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_dall_e_template.json) | dall-e-2, dall-e-3 |
-| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_chat_template.json) | qwen-plus, qwen-max, ... |
-| | Image Synthesis | [`DashScopeImageSynthesisWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_image_synthesis_template.json) | wanx-v1 |
-| | Text Embedding | [`DashScopeTextEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_text_embedding_template.json) | text-embedding-v1, text-embedding-v2, ... |
-| | Multimodal | [`DashScopeMultiModalWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_multimodal_template.json) | qwen-vl-max, qwen-vl-chat-v1, qwen-audio-chat |
-| Gemini API | Chat | [`GeminiChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#gemini-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/gemini_chat_template.json) | gemini-pro, ... |
-| | Embedding | [`GeminiEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#gemini-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/gemini_embedding_template.json) | models/embedding-001, ... |
-| ZhipuAI API | Chat | [`ZhipuAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/zhipu_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#zhipu-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/zhipu_chat_template.json) | glm-4, ... |
-| | Embedding | [`ZhipuAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/zhipu_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#zhipu-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/zhipu_embedding_template.json) | embedding-2, ... |
-| ollama | Chat | [`OllamaChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#ollama-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/ollama_chat_template.json) | llama3, llama2, Mistral, ... |
-| | Embedding | [`OllamaEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#ollama-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/ollama_embedding_template.json) | llama2, Mistral, ... |
-| | Generation | [`OllamaGenerationWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#ollama-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/ollama_generate_template.json) | llama2, Mistral, ... |
-| LiteLLM API | Chat | [`LiteLLMChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/litellm_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#litellm-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/litellm_chat_template.json) | [models supported by litellm](https://docs.litellm.ai/docs/)... |
-| Post Request based API | - | [`PostAPIModelWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/post_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#post-request-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/postapi_model_config_template.json) | - |
+| API | Task | Model Wrapper | Configuration | Some Supported Models |
+|------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|-----------------------------------------------------------------|
+| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |[guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_chat_template.json) | gpt-4o, gpt-4, gpt-3.5-turbo, ... |
+| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_embedding_template.json) | text-embedding-ada-002, ... |
+| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_dall_e_template.json) | dall-e-2, dall-e-3 |
+| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_chat_template.json) | qwen-plus, qwen-max, ... |
+| | Image Synthesis | [`DashScopeImageSynthesisWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_image_synthesis_template.json) | wanx-v1 |
+| | Text Embedding | [`DashScopeTextEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_text_embedding_template.json) | text-embedding-v1, text-embedding-v2, ... |
+| | Multimodal | [`DashScopeMultiModalWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_multimodal_template.json) | qwen-vl-max, qwen-vl-chat-v1, qwen-audio-chat |
+| Gemini API | Chat | [`GeminiChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#gemini-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/gemini_chat_template.json) | gemini-pro, ... |
+| | Embedding | [`GeminiEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/gemini_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#gemini-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/gemini_embedding_template.json) | models/embedding-001, ... |
+| ZhipuAI API | Chat | [`ZhipuAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/zhipu_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#zhipu-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/zhipu_chat_template.json) | glm-4, ... |
+| | Embedding | [`ZhipuAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/zhipu_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#zhipu-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/zhipu_embedding_template.json) | embedding-2, ... |
+| ollama | Chat | [`OllamaChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#ollama-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/ollama_chat_template.json) | llama3, llama2, Mistral, ... |
+| | Embedding | [`OllamaEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#ollama-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/ollama_embedding_template.json) | llama2, Mistral, ... |
+| | Generation | [`OllamaGenerationWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/ollama_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#ollama-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/ollama_generate_template.json) | llama2, Mistral, ... |
+| LiteLLM API | Chat | [`LiteLLMChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/litellm_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#litellm-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/litellm_chat_template.json) | [models supported by litellm](https://docs.litellm.ai/docs/)... |
+| Post Request based API | - | [`PostAPIModelWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/post_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#post-request-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/postapi_model_config_template.json) | - |
**Supported Local Model Deployment**
@@ -127,8 +133,11 @@ the following libraries.
- [Conversation with ReAct Agent](./examples/conversation_with_react_agent)
- [Conversation in Natural Language to Query SQL](./examples/conversation_nl2sql/)
- [Conversation with RAG Agent](./examples/conversation_with_RAG_agents)
+ - [Conversation with gpt-4o](./examples/conversation_with_gpt-4o)
+ - [Conversation with Software Engineering Agent](./examples/swe_agent/)
- [Conversation with Customized Services](./examples/conversation_with_customized_services/)
+
- Game
- [Gomoku](./examples/game_gomoku)
- [Werewolf](./examples/game_werewolf)
diff --git a/README_ZH.md b/README_ZH.md
index c1a3c97c3..2a3c9ced0 100644
--- a/README_ZH.md
+++ b/README_ZH.md
@@ -28,6 +28,12 @@
## News
+- **[2024-05-15]** A new **Parser Module** for parsing the model's formatted output is now available in AgentScope, making it easier to build multi-agent applications! Refer to our [tutorial](https://modelscope.github.io/agentscope/en/tutorial/203-parser.html) for usage. The [`DictDialogAgent`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/agents/dict_dialog_agent.py) class and the [werewolf game](https://github.com/modelscope/agentscope/tree/main/examples/game_werewolf) example have been updated accordingly!
+
+- **[2024-05-14]** We are conducting a survey on the **AgentScope Workstation & Copilot** user experience. Your valuable feedback will help us improve AgentScope's drag-and-drop multi-agent application development and Copilot experience. Please click this [link](https://survey.aliyun.com/apps/zhiliao/vgpTppn22) to participate. Thank you for your support!
+
+- **[2024-05-14]** AgentScope now supports **gpt-4o** and other OpenAI vision models! See the [model configuration](./examples/model_configs_template/openai_chat_template.json); the new example "[Conversation with gpt-4o](./examples/conversation_with_gpt-4o)" is also available!
+
+- **[2024-04-30]** We have released **AgentScope** v0.0.4!
- **[2024-04-27]** [AgentScope Workstation](https://agentscope.aliyun.com/) is now online! You are welcome to build your multi-agent applications with zero code on our *drag-and-drop platform*, and to ask our *copilot* questions about AgentScope!
@@ -66,7 +72,7 @@ AgentScope provides a list of `ModelWrapper` to support local model services and third-party
| API | Task | Model Wrapper | Configuration | Some Supported Models |
|------------------------|-----------------|---------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------|-----------------------------------------------|
-| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |[guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_chat_template.json) | gpt-4, gpt-3.5-turbo, ... |
+| OpenAI API | Chat | [`OpenAIChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) |[guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_chat_template.json) | gpt-4o, gpt-4, gpt-3.5-turbo, ... |
| | Embedding | [`OpenAIEmbeddingWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_embedding_template.json) | text-embedding-ada-002, ... |
| | DALL·E | [`OpenAIDALLEWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/openai_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#openai-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/openai_dall_e_template.json) | dall-e-2, dall-e-3 |
| DashScope API | Chat | [`DashScopeChatWrapper`](https://github.com/modelscope/agentscope/blob/main/src/agentscope/models/dashscope_model.py) | [guidance](https://modelscope.github.io/agentscope/en/tutorial/203-model.html#dashscope-api)<br>[template](https://github.com/modelscope/agentscope/blob/main/examples/model_configs_template/dashscope_chat_template.json) | qwen-plus, qwen-max, ... |
@@ -115,7 +121,10 @@ AgentScope supports fast deployment of local model services with the following libraries.
- [Conversation with ReAct Agent](./examples/conversation_with_react_agent)
- [Conversation in Natural Language to Query SQL](./examples/conversation_nl2sql/)
- [Conversation with RAG Agent](./examples/conversation_with_RAG_agents)
- - [Conversation with Customized Services](./examples/conversation_with_customized_services/)
+ - [Conversation with gpt-4o](./examples/conversation_with_gpt-4o)
+ - [Conversation with Software Engineering Agent](./examples/swe_agent/)
+ - [Conversation with Customized Services](./examples/conversation_with_customized_services/)
+
- Game
- [Gomoku](./examples/game_gomoku)
diff --git a/docs/sphinx_doc/en/source/tutorial/203-parser.md b/docs/sphinx_doc/en/source/tutorial/203-parser.md
new file mode 100644
index 000000000..a4e0538c3
--- /dev/null
+++ b/docs/sphinx_doc/en/source/tutorial/203-parser.md
@@ -0,0 +1,460 @@
+(203-parser-en)=
+
+# Model Response Parser
+
+## Table of Contents
+
+- [Background](#background)
+- [Parser Module](#parser-module)
+ - [Overview](#overview)
+ - [String Type](#string-type)
+ - [MarkdownCodeBlockParser](#markdowncodeblockparser)
+ - [Initialization](#initialization)
+ - [Format Instruction Template](#format-instruction-template)
+ - [Parse Function](#parse-function)
+ - [Dictionary Type](#dictionary-type)
+ - [MarkdownJsonDictParser](#markdownjsondictparser)
+ - [Initialization & Format Instruction Template](#initialization--format-instruction-template)
+ - [MultiTaggedContentParser](#multitaggedcontentparser)
+ - [Initialization & Format Instruction Template](#initialization--format-instruction-template-1)
+ - [Parse Function](#parse-function-1)
+ - [JSON / Python Object Type](#json--python-object-type)
+ - [MarkdownJsonObjectParser](#markdownjsonobjectparser)
+ - [Initialization & Format Instruction Template](#initialization--format-instruction-template-2)
+ - [Parse Function](#parse-function-2)
+- [Typical Use Cases](#typical-use-cases)
+  - [Werewolf Game](#werewolf-game)
+ - [ReAct Agent and Tool Usage](#react-agent-and-tool-usage)
+- [Customized Parser](#customized-parser)
+
+## Background
+
+In the process of building LLM-empowered applications, parsing the strings generated by LLMs into specific formats and extracting the required information is a crucial step.
+However, for the following reasons, it is also a complex one:
+
+1. **Diversity**: The target formats are diverse, and the information to be extracted may be a specific piece of text, a JSON object, or a complex data structure.
+2. **Complexity**: Result parsing is not only about converting the text generated by LLMs into the target format; it also involves a series of issues such as prompt engineering (reminding the LLM what format of output it should generate) and error handling.
+3. **Flexibility**: Even within the same application, different stages may require the agent to generate output in different formats.
+
+For the convenience of developers, AgentScope provides a parser module to help parse LLM responses into specific formats. With simple configuration, developers can parse responses into the target format and switch between target formats flexibly.
+
+In AgentScope, the parser module features
+1. **Flexibility**: Developers can flexibly set the required format and switch parsers without modifying the code of the agent class. That is, the specific "target format" is decoupled from the agent's `reply` function.
+2. **Freedom**: The format instruction, result parsing and prompt engineering are all explicitly performed in the `reply` function. Developers and users can freely choose to use the parser or to parse the LLM response with their own code.
+3. **Transparency**: When using the parser, the process and results of prompt construction are completely visible to developers in the `reply` function, so developers can precisely debug their applications.
+
+## Parser Module
+
+### Overview
+
+The main functions of the parser module include:
+
+1. Provide a "format instruction", that is, remind the LLM where and what output to generate, for example
+
+````
+You should generate python code in a fenced code block as follows
+```python
+{your_python_code}
+```
+````
+
+2. Provide a parse function, which directly parses the text generated by the LLM into the target data format
+
+3. Provide post-processing for the dictionary format. After the text is parsed into a dictionary, different fields may serve different purposes
+
+AgentScope provides multiple built-in parsers, and developers can choose according to their needs.
+
+| Target Format | Parser Class | Description |
+| --- | --- |------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
+| String | `MarkdownCodeBlockParser` | Requires LLM to generate specified text within a Markdown code block marked by ```. The result is a string. |
+| Dictionary | `MarkdownJsonDictParser` | Requires LLM to produce a specified dictionary within the code block marked by \```json and \```. The result is a Python dictionary. |
+| | `MultiTaggedContentParser` | Requires LLM to generate specified content within multiple tags. Contents from different tags will be parsed into a single Python dictionary with different key-value pairs. |
+| JSON / Python Object Type | `MarkdownJsonObjectParser` | Requires LLM to produce specified content within the code block marked by \```json and \```. The result will be converted into a Python object via json.loads. |
+
+> **NOTE**: Compared with `MarkdownJsonDictParser`, `MultiTaggedContentParser` is more suitable for weak LLMs and for complex target formats.
+> For example, when the LLM is asked to generate Python code, if the code is returned directly within a dictionary, the LLM has to handle escape characters (\t, \n, ...) correctly, and `json.loads` is sensitive to the difference between double and single quotes.
+>
+> In contrast, `MultiTaggedContentParser` guides the LLM to generate each key-value pair separately in its own tag pair and then combines them into a dictionary, thus reducing the difficulty.
+
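+To make the escaping problem concrete, the following standalone sketch (plain Python, independent of AgentScope) shows how raw multi-line content breaks `json.loads`, while extracting the same content from a tag pair needs no escaping at all:
+
+```python
+import json
+
+# A response whose value contains raw newlines, as a weak LLM might produce
+# it. json.loads rejects unescaped control characters inside strings.
+broken = '{"code": "for i in range(3):\n    print(i)"}'
+try:
+    json.loads(broken)
+except json.JSONDecodeError as e:
+    print(f"json.loads failed: {e}")
+
+# The same content is trivial to extract from a tag pair, no escaping needed.
+text = "[CODE]for i in range(3):\n    print(i)[/CODE]"
+code = text[len("[CODE]"):-len("[/CODE]")]
+print(code)
+```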
+
+In the following sections, we will introduce the usage of these parsers based on different target formats.
+
+### String Type
+
+#### MarkdownCodeBlockParser
+
+##### Initialization
+
+- `MarkdownCodeBlockParser` requires the LLM to generate the specified text within a specified Markdown code block. The language can be specified with the `language_name` parameter to utilize the model's coding ability. For example, to ask the model for Python code, initialize as follows:
+
+ ```python
+ from agentscope.parsers import MarkdownCodeBlockParser
+
+ parser = MarkdownCodeBlockParser(language_name="python", content_hint="your python code")
+ ```
+
+##### Format Instruction Template
+
+- `MarkdownCodeBlockParser` provides the following format instruction template. When the `format_instruction` attribute is accessed, `{language_name}` will be replaced with the string given at initialization:
+
+ ````
+ You should generate {language_name} code in a {language_name} fenced code block as follows:
+ ```{language_name}
+ {content_hint}
+ ```
+ ````
+
+- For the above initialization with `language_name` set to `"python"`, accessing the `format_instruction` attribute returns the following string:
+
+ ```python
+ print(parser.format_instruction)
+ ```
+
+ ````
+ You should generate python code in a python fenced code block as follows
+ ```python
+ your python code
+ ```
+ ````
+
+##### Parse Function
+
+- `MarkdownCodeBlockParser` provides a `parse` method to parse the text generated by the LLM. Its input and output are both `ModelResponse` objects, and the parsing result is mounted on the `parsed` attribute of the output object.
+
+ ````python
+ res = parser.parse(
+ ModelResponse(
+ text="""The following is generated python code
+ ```python
+ print("Hello world!")
+ ```
+ """
+ )
+ )
+
+ print(res.parsed)
+ ````
+
+ ```
+  print("Hello world!")
+ ```
+
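+Putting the pieces together: the format instruction is typically appended to the prompt to remind the LLM of the expected format, and the `parse` method is passed to the model call. The following is a minimal sketch, assuming `model`, `sys_prompt`, `memory` and `parser` have been initialized elsewhere (e.g. inside an agent's `reply` function):
+
+```python
+from agentscope.message import Msg
+
+# Construct the prompt with the format instruction attached at the end;
+# model, sys_prompt and memory are assumed to be pre-initialized objects
+prompt = model.format(
+    Msg("system", sys_prompt, role="system"),
+    memory.get_memory(),
+    Msg("system", parser.format_instruction, role="system"),
+)
+
+# Parse the response while calling the model
+res = model(prompt, parse_func=parser.parse)
+print(res.parsed)  # the extracted string
+```
+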
+### Dictionary Type
+
+Different from strings and general JSON/Python objects, dictionaries are a powerful format in LLM applications, and AgentScope provides additional post-processing functions for them.
+When initializing the parser, the `keys_to_content`, `keys_to_memory`, and `keys_to_metadata` parameters can be set to filter key-value pairs when the parser's `to_content`, `to_memory`, and `to_metadata` methods are called.
+
+- `keys_to_content` specifies the key-value pairs that will be placed in the `content` field of the returned `Msg` object. The `content` field is returned to other agents, participates in their prompt construction, and is also passed to the `self.speak` function for display.
+- `keys_to_memory` specifies the key-value pairs that will be stored in the agent's memory.
+- `keys_to_metadata` specifies the key-value pairs that will be placed in the `metadata` field of the returned `Msg` object, which can be used in the application's control flow or to carry information that does not need to be returned to other agents.
+
+These three parameters accept boolean values, strings, or lists of strings. Their meanings are as follows:
+- `False`: The corresponding filter function will return `None`.
+- `True`: The whole dictionary will be returned.
+- `str`: The corresponding value will be returned directly.
+- `List[str]`: A filtered dictionary will be returned according to the list of keys.
+
+By default, `keys_to_content` and `keys_to_memory` are `True`, that is, the whole dictionary will be returned. `keys_to_metadata` defaults to `False`, that is, the corresponding filter function will return `None`.
+
+Take, for example, the dictionary generated by a werewolf agent during the daytime discussion in a werewolf game. In this example,
+- `"thought"` should not be returned to other agents, but should be stored in the agent's memory to ensure the continuity of the werewolf strategy;
+- `"speak"` should be returned to other agents and stored in the agent's memory;
+- `"finish_discussion"` is used in the application's control flow to determine whether the discussion has ended. To save tokens, this field should not be returned to other agents or stored in the agent's memory.
+
+ ```python
+ {
+ "thought": "The others didn't realize I was a werewolf. I should end the discussion soon.",
+ "speak": "I agree with you.",
+ "finish_discussion": True
+ }
+ ```
+
+In AgentScope, we achieve post-processing by calling the `to_content`, `to_memory`, and `to_metadata` methods, as shown in the following code:
+
+- In the application's control flow, create the corresponding parser object and load it into the agent
+
+ ```python
+ from agentscope.parsers import MarkdownJsonDictParser
+
+ # ...
+
+ agent = DictDialogAgent(...)
+
+ # Take MarkdownJsonDictParser as example
+ parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "what you speak",
+ "finish_discussion": "whether the discussion is finished"
+ },
+ keys_to_content="speak",
+ keys_to_memory=["thought", "speak"],
+ keys_to_metadata=["finish_discussion"]
+ )
+
+ # Load parser, which is equivalent to specifying the required format
+ agent.set_parser(parser)
+
+ # The discussion process
+ while True:
+ # ...
+ x = agent(x)
+ # Break the loop according to the finish_discussion field in metadata
+ if x.metadata["finish_discussion"]:
+ break
+ ```
+
+- Filter the dictionary in the agent's `reply` function
+
+ ```python
+ # ...
+    def reply(self, x: dict = None) -> Msg:
+
+ # ...
+ res = self.model(prompt, parse_func=self.parser.parse)
+
+        # Store the thought and speak fields into memory
+ self.memory.add(
+ Msg(
+ self.name,
+ content=self.parser.to_memory(res.parsed),
+ role="assistant",
+ )
+ )
+
+        # Put the filtered results into the content and metadata fields of the returned Msg object
+ msg = Msg(
+ self.name,
+ content=self.parser.to_content(res.parsed),
+ role="assistant",
+ metadata=self.parser.to_metadata(res.parsed),
+ )
+ self.speak(msg)
+
+ return msg
+ ```
+
+> **Note**: The `keys_to_content`, `keys_to_memory`, and `keys_to_metadata` parameters can be a string, a list of strings, or a boolean value.
+> - For `True`, the `to_content`, `to_memory`, and `to_metadata` methods will directly return the whole dictionary.
+> - For `False`, the `to_content`, `to_memory`, and `to_metadata` methods will directly return `None`.
+> - For a string, the `to_content`, `to_memory`, and `to_metadata` methods will directly extract the corresponding value. For example, if `keys_to_content="speak"`, the `to_content` method will put `res.parsed["speak"]` into the `content` field of the `Msg` object, and the `content` field will be a string rather than a dictionary.
+> - For a list of strings, the `to_content`, `to_memory`, and `to_metadata` methods will filter the dictionary according to the list of keys.
+> ```python
+> parser = MarkdownJsonDictParser(
+> content_hint={
+> "thought": "what you thought",
+> "speak": "what you speak",
+> },
+> keys_to_content="speak",
+> keys_to_memory=["thought", "speak"],
+> )
+>
+> example_dict = {"thought": "abc", "speak": "def"}
+> print(parser.to_content(example_dict)) # def
+> print(parser.to_memory(example_dict)) # {"thought": "abc", "speak": "def"}
+> print(parser.to_metadata(example_dict)) # None
+> ```
+> ```
+> def
+> {"thought": "abc", "speak": "def"}
+> None
+> ```
+
+
+Next, we introduce the two parsers for the dictionary type.
+
+#### MarkdownJsonDictParser
+
+##### Initialization & Format Instruction Template
+
+- `MarkdownJsonDictParser` requires the LLM to generate a dictionary within a code block fenced by \```json and \```.
+
+- Besides `keys_to_content`, `keys_to_memory` and `keys_to_metadata`, a `content_hint` parameter can be provided to give an example and explanation of the expected response, that is, to remind the LLM where and what kind of dictionary should be generated.
+This parameter can be a string or a dictionary; a dictionary will be automatically converted to a string when the format instruction is constructed.
+
+ ```python
+ from agentscope.parsers import MarkdownJsonDictParser
+
+ # dictionary as content_hint
+ MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "what you speak",
+ }
+ )
+ # or string as content_hint
+ MarkdownJsonDictParser(
+ content_hint="""{
+ "thought": "what you thought",
+ "speak": "what you speak",
+ }"""
+ )
+ ```
+
+  - The corresponding `format_instruction` attribute
+
+ ````
+ You should respond a json object in a json fenced code block as follows:
+ ```json
+ {content_hint}
+ ```
+ ````
+
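+Its `parse` method behaves like those of the other parsers: it takes a `ModelResponse`, extracts the fenced JSON block, and mounts the resulting dictionary on the `parsed` field. A minimal sketch, assuming one of the instances above has been assigned to `parser`:
+
+````python
+res = parser.parse(
+    ModelResponse(
+        text="""```json
+{"thought": "what you thought", "speak": "what you speak"}
+```"""
+    )
+)
+
+print(res.parsed)  # {'thought': 'what you thought', 'speak': 'what you speak'}
+````
+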
+#### MultiTaggedContentParser
+
+`MultiTaggedContentParser` asks the LLM to generate specific content within multiple tag pairs, and the content from the different tag pairs is parsed into a single Python dictionary. Its usage is similar to `MarkdownJsonDictParser`, but the initialization differs, and it is more suitable for weak LLMs or complex return content.
+
+##### Initialization & Format Instruction Template
+
+Within `MultiTaggedContentParser`, each tag pair is specified by a `TaggedContent` object, which contains
+- the tag name (`name`), the key in the resulting dictionary
+- the start tag (`tag_begin`)
+- a hint for the content (`content_hint`)
+- the end tag (`tag_end`)
+- a content-parsing flag (`parse_json`), which defaults to `False`. When set to `True`, the parser automatically adds a hint that the content between the tags must be a JSON object, and the extracted content will be parsed into a Python object via `json.loads`
+
+```python
+from agentscope.parsers import MultiTaggedContentParser, TaggedContent
+parser = MultiTaggedContentParser(
+ TaggedContent(
+ name="thought",
+ tag_begin="[THOUGHT]",
+ content_hint="what you thought",
+ tag_end="[/THOUGHT]"
+ ),
+ TaggedContent(
+ name="speak",
+ tag_begin="[SPEAK]",
+ content_hint="what you speak",
+ tag_end="[/SPEAK]"
+ ),
+ TaggedContent(
+ name="finish_discussion",
+ tag_begin="[FINISH_DISCUSSION]",
+ content_hint="true/false, whether the discussion is finished",
+ tag_end="[/FINISH_DISCUSSION]",
+ parse_json=True, # we expect the content of this field to be parsed directly into a Python boolean value
+ )
+)
+
+print(parser.format_instruction)
+```
+
+```
+Respond with specific tags as outlined below, and the content between [FINISH_DISCUSSION] and [/FINISH_DISCUSSION] MUST be a JSON object:
+[THOUGHT]what you thought[/THOUGHT]
+[SPEAK]what you speak[/SPEAK]
+[FINISH_DISCUSSION]true/false, whether the discussion is finished[/FINISH_DISCUSSION]
+```
+
+##### Parse Function
+
+- The parsing result of `MultiTaggedContentParser` is a dictionary whose keys are the `name` values of the `TaggedContent` objects.
+The following is an example of parsing an LLM response in the werewolf game:
+
+```python
+res_dict = parser.parse(
+ ModelResponse(
+ text="""As a werewolf, I should keep pretending to be a villager
+[THOUGHT]The others didn't realize I was a werewolf. I should end the discussion soon.[/THOUGHT]
+[SPEAK]I agree with you.[/SPEAK]
+[FINISH_DISCUSSION]true[/FINISH_DISCUSSION]"""
+ )
+)
+
+print(res_dict)
+```
+
+```
+{
+ "thought": "The others didn't realize I was a werewolf. I should end the discussion soon.",
+ "speak": "I agree with you.",
+ "finish_discussion": true
+}
+```
+
+### JSON / Python Object Type
+
+#### MarkdownJsonObjectParser
+
+`MarkdownJsonObjectParser` also uses the \```json and \``` fences in Markdown, but does not restrict the content type: it can be a list, dictionary, number, string, etc., as long as it can be parsed into a Python object via `json.loads`.
+
+##### Initialization & Format Instruction Template
+
+```python
+from agentscope.parsers import MarkdownJsonObjectParser
+
+parser = MarkdownJsonObjectParser(
+    content_hint="{a list of numbers}"
+)
+
+print(parser.format_instruction)
+```
+
+````
+You should respond a json object in a json fenced code block as follows:
+```json
+{a list of numbers}
+```
+````
+
+##### Parse Function
+
+````python
+res = parser.parse(
+ ModelResponse(
+ text="""Yes, here is the generated list
+```json
+[1,2,3,4,5]
+```
+""")
+)
+
+print(type(res.parsed))
+print(res.parsed)
+````
+
+```
+<class 'list'>
+[1, 2, 3, 4, 5]
+```
+
+## Typical Use Cases
+
+### Werewolf Game
+
+The werewolf game is a classic use case for the dictionary parsers. At different stages of the game, the same agent needs to generate different control fields in addition to `"thought"` and `"speak"`, such as whether the discussion has ended, whether the seer uses its ability, whether the witch uses the antidote or poison, and voting.
+
+AgentScope's built-in [werewolf game](https://github.com/modelscope/agentscope/tree/main/examples/game_werewolf) example uses the `DictDialogAgent` class together with different parsers to switch the target format flexibly. By using the post-processing functions of the parsers, it separates "thought" from "speak" and drives the game forward.
+More details can be found in the werewolf game [source code](https://github.com/modelscope/agentscope/tree/main/examples/game_werewolf).
+
+### ReAct Agent and Tool Usage
+
+`ReActAgent` is an agent class built for tool usage in AgentScope. Based on the ReAct algorithm, it can be used with different tool functions. Its tool invocation and format parsing follow the same approach as the parser module. For the detailed implementation, please refer to the [source code](https://github.com/modelscope/agentscope/blob/main/src/agentscope/agents/react_agent.py).
+
+
+## Customized Parser
+
+AgentScope provides a base class `ParserBase` for parsers. Developers can inherit this base class and implement its `format_instruction` attribute and `parse` method to create their own parsers.
+
+For parsing into the dictionary type, you can additionally inherit the `agentscope.parsers.DictFilterMixin` class to implement post-processing for dictionaries.
+
+```python
+from abc import ABC, abstractmethod
+
+from agentscope.models import ModelResponse
+
+
+class ParserBase(ABC):
+ """The base class for model response parser."""
+
+ format_instruction: str
+ """The instruction for the response format."""
+
+ @abstractmethod
+ def parse(self, response: ModelResponse) -> ModelResponse:
+        """Parse the response text into a specific object, and store it in
+        the parsed field of the response object."""
+
+ # ...
+```
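+For example, a toy parser that extracts the text after a fixed prefix could look as follows. This is only a sketch; it assumes `ParserBase` can be imported from `agentscope.parsers` (adjust the import path to your version if needed).
+
+```python
+from agentscope.models import ModelResponse
+from agentscope.parsers import ParserBase  # assumed import path
+
+
+class PrefixParser(ParserBase):
+    """A toy parser that extracts the text after a fixed prefix."""
+
+    format_instruction: str = "Respond in one line starting with 'ANSWER:'."
+
+    def parse(self, response: ModelResponse) -> ModelResponse:
+        # Mount the extracted content on the parsed field
+        response.parsed = response.text.split("ANSWER:", 1)[-1].strip()
+        return response
+```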
diff --git a/docs/sphinx_doc/en/source/tutorial/206-prompt.md b/docs/sphinx_doc/en/source/tutorial/206-prompt.md
index f99c86024..e30e8abd8 100644
--- a/docs/sphinx_doc/en/source/tutorial/206-prompt.md
+++ b/docs/sphinx_doc/en/source/tutorial/206-prompt.md
@@ -64,6 +64,8 @@ dictionaries as input, where the dictionary must obey the following rules
#### Prompt Strategy
+##### Non-Vision Models
+
In OpenAI Chat API, the `name` field enables the model to distinguish
different speakers in the conversation. Therefore, the strategy of `format`
function in `OpenAIChatWrapper` is simple:
@@ -100,6 +102,75 @@ print(prompt)
]
```
+##### Vision Models
+
+For vision models (gpt-4-turbo, gpt-4o, ...), if the input messages contain image URLs, the generated `content` field will be a list of dicts containing text and image URLs.
+
+Specifically, web image URLs are passed to the OpenAI Chat API directly, while local image URLs are converted to base64 format. For more details, please refer to the [official guidance](https://platform.openai.com/docs/guides/vision).
+
+Note that invalid image URLs (e.g. `/Users/xxx/test.mp3`) will be ignored.
+
+```python
+from agentscope.models import OpenAIChatWrapper
+from agentscope.message import Msg
+
+model = OpenAIChatWrapper(
+ config_name="", # empty since we directly initialize the model wrapper
+ model_name="gpt-4o",
+)
+
+prompt = model.format(
+ Msg("system", "You're a helpful assistant", role="system"), # Msg object
+ [ # a list of Msg objects
+ Msg(name="user", content="Describe this image", role="user", url="https://xxx.png"),
+ Msg(name="user", content="And these images", role="user", url=["/Users/xxx/test.png", "/Users/xxx/test.mp3"]),
+ ],
+)
+print(prompt)
+```
+
+```python
+[
+ {
+ "role": "system",
+ "name": "system",
+        "content": "You're a helpful assistant"
+ },
+ {
+ "role": "user",
+ "name": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": "Describe this image"
+ },
+ {
+ "type": "image_url",
+ "image_url": {
+ "url": "https://xxx.png"
+ }
+ },
+ ]
+ },
+ {
+ "role": "user",
+ "name": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": "And these images"
+ },
+ {
+ "type": "image_url",
+ "image_url": {
+ "url": "data:image/png;base64,YWJjZGVm..." # for /Users/xxx/test.png
+ }
+ },
+ ]
+ },
+]
+```
+
### DashScopeChatWrapper
`DashScopeChatWrapper` encapsulates the DashScope chat API, which takes a list of messages as input. The message must obey the following rules (updated in 2024/03/22):
diff --git a/docs/sphinx_doc/en/source/tutorial/advance.rst b/docs/sphinx_doc/en/source/tutorial/advance.rst
index ff483b9b2..64bd86508 100644
--- a/docs/sphinx_doc/en/source/tutorial/advance.rst
+++ b/docs/sphinx_doc/en/source/tutorial/advance.rst
@@ -7,6 +7,7 @@ Advanced Exploration
201-agent.md
202-pipeline.md
203-model.md
+ 203-parser.md
204-service.md
205-memory.md
206-prompt.md
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/203-parser.md b/docs/sphinx_doc/zh_CN/source/tutorial/203-parser.md
new file mode 100644
index 000000000..527f2960e
--- /dev/null
+++ b/docs/sphinx_doc/zh_CN/source/tutorial/203-parser.md
@@ -0,0 +1,456 @@
+(203-parser-zh)=
+
+# Model Response Parsing
+
+## Table of Contents
+
+- [Background](#background)
+- [Parser Module](#parser-module)
+  - [Overview](#overview)
+  - [String (`str`) Type](#string-str-type)
+    - [MarkdownCodeBlockParser](#markdowncodeblockparser)
+      - [Initialization](#initialization)
+      - [Format Instruction Template](#format-instruction-template)
+      - [Parse Function](#parse-function)
+  - [Dictionary (`dict`) Type](#dictionary-dict-type)
+    - [MarkdownJsonDictParser](#markdownjsondictparser)
+      - [Initialization & Format Instruction Template](#initialization--format-instruction-template)
+    - [MultiTaggedContentParser](#multitaggedcontentparser)
+      - [Initialization & Format Instruction Template](#initialization--format-instruction-template-1)
+      - [Parse Function](#parse-function-1)
+  - [JSON / Python Object Type](#json--python-object-type)
+    - [MarkdownJsonObjectParser](#markdownjsonobjectparser)
+      - [Initialization & Format Instruction Template](#initialization--format-instruction-template-2)
+      - [Parse Function](#parse-function-2)
+- [Typical Use Cases](#typical-use-cases)
+  - [Werewolf Game](#werewolf-game)
+  - [ReAct Agent and Tool Usage](#react-agent-and-tool-usage)
+- [Customized Parser](#customized-parser)
+
+
+## Background
+
+When building LLM-empowered applications, parsing the strings generated by LLMs into specific formats and extracting the required information is a crucial step.
+However, for the following reasons, it is also a complex one:
+
+1. **Diversity**: The target formats are diverse, and the information to be extracted may be a specific piece of text, a JSON object, or a complex data structure.
+2. **Complexity**: Result parsing is not only about converting the text generated by LLMs into the target format; it also involves a series of issues such as prompt engineering (reminding the LLM what format of output it should generate) and error handling.
+3. **Flexibility**: Even within the same application, different stages may require the agent to generate output in different formats.
+
+To let developers parse results conveniently and flexibly, AgentScope designed and provides a parser module. With simple configuration, developers can parse responses into the target format and switch between target formats flexibly.
+
+In AgentScope, the design principles of the parser module are:
+1. **Flexibility**: Developers can flexibly set the required format and switch parsers without modifying the code of the agent class; that is, the specific "target format" is decoupled from the processing logic of the agent's `reply` function.
+2. **Freedom**: Users are free to choose whether to use the parser. The format instruction, result parsing and other functions it provides are all invoked explicitly in the `reply` function, so users can use the parser or implement the parsing with their own code.
+3. **Transparency**: When using the parser, the process and results of prompt construction are completely visible to developers in the `reply` function, so developers can precisely debug their applications.
+
+## Parser Module
+
+### Overview
+
+The main functions of the parser module include:
+
+1. Providing a "format instruction", that is, reminding the LLM where and what output to generate, for example
+
+````
+You should generate python code in a fenced code block as follows
+```python
+{your_python_code}
+```
+````
+
+
+2. Providing a parse function, which directly parses the text generated by the LLM into the target data format
+
+3. Providing post-processing for the dictionary format. After the text is parsed into a dictionary, different fields may serve different purposes
+
+AgentScope provides multiple built-in parsers, and developers can choose according to their needs.
+
+| Target Format | Parser Class | Description |
+|-------------------|----------------------------|-----------------------------------------------------------------------------|
+| String (`str`) | `MarkdownCodeBlockParser` | Requires the LLM to generate the specified text within a Markdown code block fenced by ```. The result is a string. |
+| Dictionary (`dict`) | `MarkdownJsonDictParser` | Requires the LLM to produce the specified dictionary within a code block fenced by \```json and \```. The result is a Python dictionary. |
+| | `MultiTaggedContentParser` | Requires the LLM to generate the specified content within multiple tag pairs. Contents from different tags are parsed into a single Python dictionary as different key-value pairs. |
+| JSON / Python object | `MarkdownJsonObjectParser` | Requires the LLM to produce the specified content within a code block fenced by \```json and \```. The result is converted into a Python object via `json.loads`. |
+
+> **NOTE**: Compared with `MarkdownJsonDictParser`, `MultiTaggedContentParser` is more suitable for weak LLMs and for complex target formats. For example, when the LLM is asked to return Python code, if the code is returned directly inside a dictionary, the LLM has to handle escape characters (\t, \n, ...) correctly, and `json.loads` is sensitive to the difference between double and single quotes. `MultiTaggedContentParser` instead lets the model return each key-value pair in its own tag pair and then combines them into a dictionary, thus lowering the difficulty of the response.
+
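+As a concrete illustration, the following standalone sketch (plain Python, independent of AgentScope) shows how raw multi-line content breaks `json.loads`, while tag pairs avoid the escaping issue entirely:
+
+```python
+import json
+
+# A response whose value contains raw newlines, as a weak LLM might produce
+# it. json.loads rejects unescaped control characters inside strings.
+broken = '{"code": "for i in range(3):\n    print(i)"}'
+try:
+    json.loads(broken)
+except json.JSONDecodeError as e:
+    print(f"json.loads failed: {e}")
+
+# The same content is trivial to extract from a tag pair, no escaping needed.
+text = "[CODE]for i in range(3):\n    print(i)[/CODE]"
+code = text[len("[CODE]"):-len("[/CODE]")]
+print(code)
+```
+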
+Below we introduce the usage of these parsers according to the different target formats.
+
+### String (`str`) Type
+
+#### MarkdownCodeBlockParser
+
+##### Initialization
+
+- `MarkdownCodeBlockParser` uses Markdown code blocks, requiring the LLM to generate the specified text within the specified code block. The language can be specified with the `language_name` parameter to utilize the model's coding ability. For example, to ask the model for Python code, initialize as follows:
+
+ ```python
+ from agentscope.parsers import MarkdownCodeBlockParser
+
+ parser = MarkdownCodeBlockParser(language_name="python", content_hint="your python code")
+ ```
+
+##### Format Instruction Template
+
+- The `MarkdownCodeBlockParser` class provides the following format instruction template. When the `format_instruction` attribute is accessed, `{language_name}` will be replaced with the string given at initialization:
+
+ ````
+ You should generate {language_name} code in a {language_name} fenced code block as follows:
+ ```{language_name}
+ {content_hint}
+ ```
+ ````
+
+- For the above initialization with `language_name` set to `"python"`, accessing the `format_instruction` attribute returns the following string:
+
+ ```python
+ print(parser.format_instruction)
+ ```
+
+ ````
+ You should generate python code in a python fenced code block as follows
+ ```python
+ your python code
+ ```
+ ````
+
+##### Parse Function
+
+- The `MarkdownCodeBlockParser` class provides a `parse` method to parse the text generated by the LLM; the result is a string mounted on the `parsed` attribute of the returned `ModelResponse`.
+
+ ````python
+ res = parser.parse(
+ ModelResponse(
+ text="""The following is generated python code
+ ```python
+ print("Hello world!")
+ ```
+ """
+ )
+ )
+
+ print(res.parsed)
+ ````
+
+ ```
+    print("Hello world!")
+ ```
+
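+Putting the pieces together: the format instruction is typically appended to the prompt to remind the LLM of the expected format, and the `parse` method is passed to the model call. Below is a minimal sketch, assuming `model`, `sys_prompt`, `memory` and `parser` have been initialized elsewhere (e.g. inside an agent's `reply` function):
+
+```python
+from agentscope.message import Msg
+
+# Construct the prompt with the format instruction attached at the end;
+# model, sys_prompt and memory are assumed to be pre-initialized objects
+prompt = model.format(
+    Msg("system", sys_prompt, role="system"),
+    memory.get_memory(),
+    Msg("system", parser.format_instruction, role="system"),
+)
+
+# Parse the response while calling the model
+res = model(prompt, parse_func=parser.parse)
+print(res.parsed)  # the extracted string
+```
+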
+### Dictionary (`dict`) Type
+
+Different from strings and general JSON/Python objects, dictionaries are a commonly used data format in LLM applications, so AgentScope provides additional post-processing functions for them. When initializing the parser, the `keys_to_content`, `keys_to_memory` and `keys_to_metadata` parameters can be set to filter key-value pairs when the parser's `to_content`, `to_memory` and `to_metadata` methods are called.
+Specifically,
+ - the key-value pairs specified by `keys_to_content` are placed in the `content` field of the returned `Msg` object; this field is returned to other agents, participates in their prompt construction, and is also passed to the `self.speak` function for display;
+ - the key-value pairs specified by `keys_to_memory` are stored in the agent's memory;
+ - the key-value pairs specified by `keys_to_metadata` are placed in the `metadata` field of the returned `Msg` object, which can be used in the application's control flow or to carry information that does not need to be returned to other agents.
+
+These three parameters accept boolean values, strings, or lists of strings. Their meanings are as follows:
+- `False`: The corresponding filter function returns `None`.
+- `True`: The whole dictionary is returned.
+- `str`: The corresponding value is returned directly; note that the result is the value rather than a dictionary.
+- `List[str]`: A filtered dictionary is returned according to the list of keys.
+
+In AgentScope, `keys_to_content` and `keys_to_memory` default to `True`, i.e. the whole dictionary is returned. `keys_to_metadata` defaults to `False`, i.e. the corresponding filter function returns `None`.
+
+Below is an example from the werewolf game: the dictionary produced by an LLM playing a werewolf during the daytime discussion. In this example,
+- the `"thought"` field should not be returned to other agents, but should be stored in the agent's memory to keep the werewolf's strategy coherent;
+- the `"speak"` field should be returned to other agents and stored in the agent's memory;
+- the `"finish_discussion"` field is used in the application's control flow to determine whether the discussion has ended. To save tokens, this field should not be returned to other agents nor stored in the agent's memory.
+
+ ```python
+ {
+ "thought": "The others didn't realize I was a werewolf. I should end the discussion soon.",
+ "speak": "I agree with you.",
+ "finish_discussion": True
+ }
+ ```
+
+In AgentScope, this post-processing is achieved by calling the `to_content`, `to_memory` and `to_metadata` methods, as sketched below:
+
+- In the application's control flow, create the corresponding parser object and load it into the agent
+
+ ```python
+ from agentscope.parsers import MarkdownJsonDictParser
+
+ # ...
+
+ agent = DictDialogAgent(...)
+
+    # Take MarkdownJsonDictParser as an example
+ parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "what you speak",
+ "finish_discussion": "whether the discussion is finished"
+ },
+ keys_to_content="speak",
+ keys_to_memory=["thought", "speak"],
+ keys_to_metadata=["finish_discussion"]
+ )
+
+    # Load the parser, which is equivalent to specifying the required format
+ agent.set_parser(parser)
+
+    # The discussion process
+ while True:
+ # ...
+ x = agent(x)
+        # Use the metadata field to check whether the discussion should end
+ if x.metadata["finish_discussion"]:
+ break
+ ```
+
+
+- Filter the dictionary inside the agent's `reply` function
+
+ ```python
+ # ...
+    def reply(self, x: dict = None) -> Msg:
+
+ # ...
+ res = self.model(prompt, parse_func=self.parser.parse)
+
+        # Store the filtered dictionary with the thought and speak fields in memory
+ self.memory.add(
+ Msg(
+ self.name,
+ content=self.parser.to_memory(res.parsed),
+ role="assistant",
+ )
+ )
+
+        # Put the filtered results into the content and metadata fields
+ msg = Msg(
+ self.name,
+ content=self.parser.to_content(res.parsed),
+ role="assistant",
+ metadata=self.parser.to_metadata(res.parsed),
+ )
+ self.speak(msg)
+
+ return msg
+ ```
+
+
+
+
+> **Note**: The `keys_to_content`, `keys_to_memory` and `keys_to_metadata` parameters can be a list, a string, or a boolean value.
+> - For `True`, the whole dictionary is returned directly, i.e. no filtering is performed.
+> - For `False`, `None` is returned directly.
+> - For a string, the `to_content`, `to_memory` and `to_metadata` methods put the value of that key directly into the corresponding position. For example, with `keys_to_content="speak"`, the `to_content` method puts `res.parsed["speak"]` into the `content` field of the `Msg` object, and the `content` field will be a string rather than a dictionary.
+> - For a list, the `to_content`, `to_memory` and `to_metadata` methods perform filtering, and the result is a filtered dictionary.
+> ```python
+> parser = MarkdownJsonDictParser(
+> content_hint={
+> "thought": "what you thought",
+> "speak": "what you speak",
+> },
+> keys_to_content="speak",
+> keys_to_memory=["thought", "speak"],
+> )
+>
+> example_dict = {"thought": "abc", "speak": "def"}
+> print(parser.to_content(example_dict)) # def
+> print(parser.to_memory(example_dict)) # {"thought": "abc", "speak": "def"}
+> print(parser.to_metadata(example_dict)) # None
+> ```
+> ```
+> def
+> {"thought": "abc", "speak": "def"}
+> None
+> ```
+
+Below we introduce the two dictionary-type parsers in detail.
+
+#### MarkdownJsonDictParser
+
+##### Initialization & Format Instruction Template
+
+- `MarkdownJsonDictParser` requires the LLM to produce the specified dictionary within a code block fenced by \```json and \```.
+- Besides `keys_to_content`, `keys_to_memory` and `keys_to_metadata`, a `content_hint` parameter can be provided to give an example and explanation of the expected response, i.e. to remind the LLM what kind of dictionary it should produce. This parameter can be a string or a dictionary; a dictionary will be automatically converted to a string when the format instruction is constructed.
+
+ ```python
+ from agentscope.parsers import MarkdownJsonDictParser
+
+    # dictionary as content_hint
+ MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "what you speak",
+ }
+ )
+    # or a string as content_hint
+ MarkdownJsonDictParser(
+ content_hint="""{
+ "thought": "what you thought",
+ "speak": "what you speak",
+ }"""
+ )
+ ```
+    - The corresponding `format_instruction` attribute
+
+ ````
+ You should respond a json object in a json fenced code block as follows:
+ ```json
+ {content_hint}
+ ```
+ ````
+
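+Its `parse` method behaves like those of the other parsers: it extracts the fenced JSON block and mounts the resulting dictionary on the `parsed` field of the `ModelResponse`. A minimal sketch, assuming one of the instances above has been assigned to `parser`:
+
+````python
+res = parser.parse(
+    ModelResponse(
+        text="""```json
+{"thought": "what you thought", "speak": "what you speak"}
+```"""
+    )
+)
+
+print(res.parsed)  # {'thought': 'what you thought', 'speak': 'what you speak'}
+````
+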
+#### MultiTaggedContentParser
+
+`MultiTaggedContentParser` requires the LLM to generate the specified content within multiple specified tag pairs, and the contents of these tags are parsed together into a single Python dictionary. Its usage is similar to `MarkdownJsonDictParser`, but the initialization differs, and it is more suitable for weak LLMs or complex return content.
+
+##### Initialization & Format Instruction Template
+
+In `MultiTaggedContentParser`, each tag pair is passed in as a `TaggedContent` object, which contains
+- the tag name (`name`), the key in the resulting dictionary
+- the start tag (`tag_begin`)
+- a hint for the content (`content_hint`)
+- the end tag (`tag_end`)
+- a content-parsing flag (`parse_json`), which defaults to `False`. When set to `True`, a hint is automatically added to the format instruction, and the extracted content will be parsed into a Python object via `json.loads`
+
+```python
+from agentscope.parsers import MultiTaggedContentParser, TaggedContent
+parser = MultiTaggedContentParser(
+ TaggedContent(
+ name="thought",
+ tag_begin="[THOUGHT]",
+ content_hint="what you thought",
+ tag_end="[/THOUGHT]"
+ ),
+ TaggedContent(
+ name="speak",
+ tag_begin="[SPEAK]",
+ content_hint="what you speak",
+ tag_end="[/SPEAK]"
+ ),
+ TaggedContent(
+ name="finish_discussion",
+ tag_begin="[FINISH_DISCUSSION]",
+ content_hint="true/false, whether the discussion is finished",
+ tag_end="[/FINISH_DISCUSSION]",
+        parse_json=True,  # we expect this field to be parsed directly into a Python boolean value
+ )
+)
+
+print(parser.format_instruction)
+```
+
+```
+Respond with specific tags as outlined below, and the content between [FINISH_DISCUSSION] and [/FINISH_DISCUSSION] MUST be a JSON object:
+[THOUGHT]what you thought[/THOUGHT]
+[SPEAK]what you speak[/SPEAK]
+[FINISH_DISCUSSION]true/false, whether the discussion is finished[/FINISH_DISCUSSION]
+```
+
+##### Parse Function
+
+- The parsing result of `MultiTaggedContentParser` is a dictionary whose keys are the `name` values of the `TaggedContent` objects. The following is an example of parsing an LLM response in the werewolf game:
+
+```python
+res_dict = parser.parse(
+ ModelResponse(text="""As a werewolf, I should keep pretending to be a villager
+[THOUGHT]The others didn't realize I was a werewolf. I should end the discussion soon.[/THOUGHT]
+[SPEAK]I agree with you.[/SPEAK]
+[FINISH_DISCUSSION]true[/FINISH_DISCUSSION]
+"""
+ )
+)
+
+print(res_dict)
+```
+
+```
+{
+ "thought": "The others didn't realize I was a werewolf. I should end the discussion soon.",
+ "speak": "I agree with you.",
+ "finish_discussion": true
+}
+```
+
+### JSON / Python Object Type
+
+#### MarkdownJsonObjectParser
+
+`MarkdownJsonObjectParser` also uses the \```json and \``` fences in Markdown, but does not restrict the content type: it can be a list, dictionary, number, string, etc., as long as it can be parsed into a Python object via `json.loads`.
+
+##### Initialization & Format Instruction Template
+
+```python
+from agentscope.parsers import MarkdownJsonObjectParser
+
+parser = MarkdownJsonObjectParser(
+    content_hint="{a list of numbers}"
+)
+
+print(parser.format_instruction)
+```
+
+````
+You should respond a json object in a json fenced code block as follows:
+```json
+{a list of numbers}
+```
+````
+
+##### Parse Function
+
+````python
+res = parser.parse(
+ ModelResponse(text="""Yes, here is the generated list
+```json
+[1,2,3,4,5]
+```
+"""
+ )
+)
+
+print(type(res.parsed))
+print(res.parsed)
+````
+
+```
+<class 'list'>
+[1, 2, 3, 4, 5]
+```
+
+## Typical Use Cases
+
+### Werewolf Game
+
+The werewolf game is a classic use case for the dictionary parsers. At different stages of the game, the same agent needs to generate different control fields in addition to `"thought"` and `"speak"`, such as whether the discussion has ended, whether the seer uses its ability, whether the witch uses the antidote or poison, and voting.
+
+AgentScope's built-in [werewolf game](https://github.com/modelscope/agentscope/tree/main/examples/game_werewolf) example uses the `DictDialogAgent` class together with different parsers to switch the target format flexibly. Meanwhile, the post-processing functions of the parsers are used to separate "thought" from "speak" and to drive the game forward.
+For details, please refer to the werewolf game [source code](https://github.com/modelscope/agentscope/tree/main/examples/game_werewolf).
+
+### ReAct Agent and Tool Usage
+
+`ReActAgent` is an agent class built for tool usage in AgentScope. Based on the ReAct algorithm, it can be used with different tool functions. Its tool invocation and format parsing follow the same approach as the parser module. For details, please refer to the [source code](https://github.com/modelscope/agentscope/blob/main/src/agentscope/agents/react_agent.py).
+
+
+## Customized Parser
+
+AgentScope provides a base class `ParserBase` for parsers. Developers can inherit this base class and implement its `format_instruction` attribute and `parse` method to create their own parsers.
+
+For parsing into the dictionary type, you can additionally inherit the `agentscope.parsers.DictFilterMixin` class to implement post-processing for dictionaries.
+
+```python
+from abc import ABC, abstractmethod
+
+from agentscope.models import ModelResponse
+
+
+class ParserBase(ABC):
+ """The base class for model response parser."""
+
+ format_instruction: str
+ """The instruction for the response format."""
+
+ @abstractmethod
+ def parse(self, response: ModelResponse) -> ModelResponse:
+        """Parse the response text into a specific object, and store it in
+        the parsed field of the response object."""
+
+ # ...
+```
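+For example, a toy parser that extracts the text after a fixed prefix could look as follows. This is only a sketch; it assumes `ParserBase` can be imported from `agentscope.parsers` (adjust the import path to your version if needed).
+
+```python
+from agentscope.models import ModelResponse
+from agentscope.parsers import ParserBase  # assumed import path
+
+
+class PrefixParser(ParserBase):
+    """A toy parser that extracts the text after a fixed prefix."""
+
+    format_instruction: str = "Respond in one line starting with 'ANSWER:'."
+
+    def parse(self, response: ModelResponse) -> ModelResponse:
+        # Mount the extracted content on the parsed field
+        response.parsed = response.text.split("ANSWER:", 1)[-1].strip()
+        return response
+```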
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/206-prompt.md b/docs/sphinx_doc/zh_CN/source/tutorial/206-prompt.md
index d77fb673a..7ed143cfe 100644
--- a/docs/sphinx_doc/zh_CN/source/tutorial/206-prompt.md
+++ b/docs/sphinx_doc/zh_CN/source/tutorial/206-prompt.md
@@ -42,6 +42,8 @@ AgentScope provides built-in prompt construction strategies for the following model APIs.
#### Prompt Construction Strategies
+##### Non-Vision Models
+
In the OpenAI Chat API, the `name` field enables the model to distinguish different speakers in the conversation. Therefore, the strategy of the `format` function in `OpenAIChatWrapper` is simple:
- `Msg`: Pass a dictionary with the `role`, `content` and `name` fields directly to the API.
@@ -76,6 +78,75 @@ print(prompt)
]
```
+##### Vision Models
+
+For vision models, if the input messages contain image URLs, the generated `content` field will be a list of dicts containing text and image URLs.
+
+Specifically, web image URLs are passed to the OpenAI Chat API directly, while local image URLs are converted to base64 format. For more details, please refer to the [official guidance](https://platform.openai.com/docs/guides/vision).
+
+Note that invalid image URLs (e.g. `/Users/xxx/test.mp3`) will be ignored.
+
+```python
+from agentscope.models import OpenAIChatWrapper
+from agentscope.message import Msg
+
+model = OpenAIChatWrapper(
+    config_name="",  # empty since we directly initialize the model wrapper
+ model_name="gpt-4o",
+)
+
+prompt = model.format(
+    Msg("system", "You're a helpful assistant", role="system"),  # Msg object
+    [  # a list of Msg objects
+ Msg(name="user", content="Describe this image", role="user", url="https://xxx.png"),
+ Msg(name="user", content="And these images", role="user", url=["/Users/xxx/test.png", "/Users/xxx/test.mp3"]),
+ ],
+)
+print(prompt)
+```
+
+```python
+[
+ {
+ "role": "system",
+ "name": "system",
+        "content": "You're a helpful assistant"
+ },
+ {
+ "role": "user",
+ "name": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": "Describe this image"
+ },
+ {
+ "type": "image_url",
+ "image_url": {
+ "url": "https://xxx.png"
+ }
+ },
+ ]
+ },
+ {
+ "role": "user",
+ "name": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": "And these images"
+ },
+ {
+ "type": "image_url",
+ "image_url": {
+                    "url": "data:image/png;base64,YWJjZGVm..."  # for /Users/xxx/test.png
+ }
+ },
+ ]
+ },
+]
+```
+
### `DashScopeChatWrapper`
`DashScopeChatWrapper` encapsulates the DashScope chat API, which takes a list of messages as input. The messages must obey the following rules:
diff --git a/docs/sphinx_doc/zh_CN/source/tutorial/advance.rst b/docs/sphinx_doc/zh_CN/source/tutorial/advance.rst
index 9de74f5cd..17ab3d8c8 100644
--- a/docs/sphinx_doc/zh_CN/source/tutorial/advance.rst
+++ b/docs/sphinx_doc/zh_CN/source/tutorial/advance.rst
@@ -7,6 +7,7 @@
201-agent.md
202-pipeline.md
203-model.md
+ 203-parser.md
204-service.md
205-memory.md
206-prompt.md
diff --git a/examples/conversation_with_gpt-4o/README.md b/examples/conversation_with_gpt-4o/README.md
new file mode 100644
index 000000000..715d57f58
--- /dev/null
+++ b/examples/conversation_with_gpt-4o/README.md
@@ -0,0 +1,54 @@
+# Conversation with gpt-4o (OpenAI Vision Model)
+
+This example will show
+- How to use gpt-4o and other OpenAI vision models in AgentScope
+
+In this example,
+- you can have a conversation with OpenAI vision models.
+- you can show gpt-4o your drawings or web UI designs and ask for its suggestions.
+- you can share your pictures with gpt-4o and ask for its comments.
+
+Just input your image URL (both local and web URLs are supported) and talk with gpt-4o.
+
+
+## Background
+
+On May 13, 2024, OpenAI released its new model, gpt-4o, a large multimodal model that can process multimodal inputs such as text and images.
+
+
+## Tested Models
+
+The following models are tested in this example. For other models, some modifications may be needed.
+- gpt-4o
+- gpt-4-turbo
+- gpt-4-vision
+
+
+## Prerequisites
+
+You need to satisfy the following requirements to run this example.
+- Install the latest version of AgentScope by
+ ```bash
+ git clone https://github.com/modelscope/agentscope.git
+ cd agentscope
+ pip install -e .
+ ```
+- Prepare an OpenAI API key
+
+## Running the Example
+
+First, fill in your OpenAI API key in `conversation_with_gpt-4o.py`, then execute the following command to start the conversation with gpt-4o.
+
+```bash
+python conversation_with_gpt-4o.py
+```
+
+## A Running Example
+
+- Conversation history with gpt-4o.
+
+
+
+- My picture
+
+
diff --git a/examples/conversation_with_gpt-4o/conversation_with_gpt-4o.py b/examples/conversation_with_gpt-4o/conversation_with_gpt-4o.py
new file mode 100644
index 000000000..470f1de32
--- /dev/null
+++ b/examples/conversation_with_gpt-4o/conversation_with_gpt-4o.py
@@ -0,0 +1,36 @@
+# -*- coding: utf-8 -*-
+"""An example for conversation with OpenAI vision models, especially for
+GPT-4o."""
+import agentscope
+from agentscope.agents import UserAgent, DialogAgent
+
+# Fill in your OpenAI API key
+YOUR_OPENAI_API_KEY = "xxx"
+
+model_config = {
+ "config_name": "gpt-4o_config",
+ "model_type": "openai_chat",
+ "model_name": "gpt-4o",
+ "api_key": YOUR_OPENAI_API_KEY,
+ "generate_args": {
+ "temperature": 0.7,
+ },
+}
+
+agentscope.init(model_configs=model_config)
+
+# Ask the user to input a URL; press Enter to skip the URL input
+user = UserAgent("user", require_url=True)
+
+agent = DialogAgent(
+ "Friday",
+ sys_prompt="You're a helpful assistant named Friday.",
+ model_config_name="gpt-4o_config",
+)
+
+x = None
+while True:
+ x = agent(x)
+ x = user(x)
+ if x.content == "exit": # type "exit" to break the loop
+ break
diff --git a/examples/distributed_simulation/run_simlation.sh b/examples/distributed_simulation/run_simulation.sh
similarity index 100%
rename from examples/distributed_simulation/run_simlation.sh
rename to examples/distributed_simulation/run_simulation.sh
diff --git a/examples/game_werewolf/prompt.py b/examples/game_werewolf/prompt.py
index c36291973..6f2c476e5 100644
--- a/examples/game_werewolf/prompt.py
+++ b/examples/game_werewolf/prompt.py
@@ -1,5 +1,6 @@
# -*- coding: utf-8 -*-
"""Used to record prompts, will be replaced by configuration"""
+from agentscope.parsers.json_object_parser import MarkdownJsonDictParser
class Prompts:
@@ -7,56 +8,83 @@ class Prompts:
to_wolves = (
"{}, if you are the only werewolf, eliminate a player. Otherwise, "
- "discuss with your teammates and reach an agreement. Respond in the "
- "following format which can be loaded by python json.loads()\n"
- "{{\n"
- ' "thought": "thought",\n'
- ' "speak": "thoughts summary to say to others",\n'
- ' "agreement": "whether the discussion reached an agreement or '
- 'not(true/false)"\n'
- "}}"
+ "discuss with your teammates and reach an agreement."
)
- to_wolves_vote = (
- "Which player do you vote to kill? Respond in the following format "
- "which can be loaded by python json.loads()\n"
- "{{\n"
- ' "thought": "thought" ,\n'
- ' "speak": "player_name"\n'
- "}}"
+ wolves_discuss_parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "what you speak",
+ "finish_discussion": "whether the discussion reached an "
+ "agreement or not (true/false)",
+ },
+ required_keys=["thought", "speak", "finish_discussion"],
+ keys_to_memory="speak",
+ keys_to_content="speak",
+ keys_to_metadata=["finish_discussion"],
+ )
+
+ to_wolves_vote = "Which player do you vote to kill?"
+
+ wolves_vote_parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "player_name",
+ },
+ required_keys=["thought", "speak"],
+ keys_to_memory="speak",
+ keys_to_content="speak",
)
to_wolves_res = "The player with the most votes is {}."
to_witch_resurrect = (
"{witch_name}, you're the witch. Tonight {dead_name} is eliminated. "
- "Would you like to resurrect {dead_name}? Respond in the following "
- "format which can be loaded by python json.loads()\n"
- "{{\n"
- ' "thought": "thought",\n'
- ' "speak": "thoughts summary to say",\n'
- ' "resurrect": true/false\n'
- "}}"
+ "Would you like to resurrect {dead_name}?"
)
- to_witch_poison = (
- "Would you like to eliminate one player? Respond in the following "
- "json format which can be loaded by python json.loads()\n"
- "{{\n"
- ' "thought": "thought", \n'
- ' "speak": "thoughts summary to say",\n'
- ' "eliminate": ture/false\n'
- "}}"
+ to_witch_resurrect_no = "The witch has chosen not to resurrect the player."
+ to_witch_resurrect_yes = "The witch has chosen to resurrect the player."
+
+ witch_resurrect_parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "whether to resurrect the player and the reason",
+ "resurrect": "whether to resurrect the player or not (true/false)",
+ },
+ required_keys=["thought", "speak", "resurrect"],
+ keys_to_memory="speak",
+ keys_to_content="speak",
+ keys_to_metadata=["resurrect"],
+ )
+
+ to_witch_poison = "Would you like to eliminate one player?"
+
+ witch_poison_parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "what you speak",
+ "eliminate": "whether to eliminate a player or not (true/false)",
+ },
+ required_keys=["thought", "speak", "eliminate"],
+ keys_to_memory="speak",
+ keys_to_content="speak",
+ keys_to_metadata=["eliminate"],
)
to_seer = (
"{}, you're the seer. Which player in {} would you like to check "
- "tonight? Respond in the following json format which can be loaded "
- "by python json.loads()\n"
- "{{\n"
- ' "thought": "thought" ,\n'
- ' "speak": "player_name"\n'
- "}}"
+ "tonight?"
+ )
+
+ seer_parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "player_name",
+ },
+ required_keys=["thought", "speak"],
+ keys_to_memory="speak",
+ keys_to_content="speak",
)
to_seer_result = "Okay, the role of {} is a {}."
@@ -76,26 +104,34 @@ class Prompts:
"based on the "
"situation and the information you gain, to vote a player eliminated "
"among alive players and to win the game, what do you want to say "
- "to others? You can decide whether to reveal your role. Respond in "
- "the following JSON format which can be loaded by python json.loads("
- ")\n"
- "{{\n"
- ' "thought": "thought" ,\n'
- ' "speak": "thought summary to say to others"\n'
- "}}"
+ "to others? You can decide whether to reveal your role. "
+ )
+
+ survivors_discuss_parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "what you speak",
+ },
+ required_keys=["thought", "speak"],
+ keys_to_memory="speak",
+ keys_to_content="speak",
+ )
+
+ survivors_vote_parser = MarkdownJsonDictParser(
+ content_hint={
+ "thought": "what you thought",
+ "speak": "player_name",
+ },
+ required_keys=["thought", "speak"],
+ keys_to_memory="speak",
+ keys_to_content="speak",
)
to_all_vote = (
- "Now the alive players are {}. Given the game rules and your role, "
- "based on the situation and the information you gain, to win the "
- "game, it's time to vote one player eliminated among the alive "
- "players, please cast your vote on who you believe is a werewolf. "
- "Respond in the following format which can be loaded by python "
- "json.loads()\n"
- "{{\n"
- ' "thought": "thought",\n'
- ' "speak": "player_name"\n'
- "}}"
+ "Given the game rules and your role, based on the situation and the"
+ " information you gain, to win the game, it's time to vote one player"
+ " eliminated among the alive players. Which player do you vote to "
+ "kill?"
)
to_all_res = "{} has been voted out."
diff --git a/examples/game_werewolf/werewolf.py b/examples/game_werewolf/werewolf.py
index aaf360fd8..e88217906 100644
--- a/examples/game_werewolf/werewolf.py
+++ b/examples/game_werewolf/werewolf.py
@@ -9,6 +9,7 @@
majority_vote,
extract_name_and_id,
n2s,
+ set_parsers,
)
from agentscope.message import Msg
from agentscope.msghub import msghub
@@ -29,6 +30,7 @@ def main() -> None:
model_configs="./configs/model_configs.json",
agent_configs="./configs/agent_configs.json",
)
+
roles = ["werewolf", "werewolf", "villager", "villager", "seer", "witch"]
wolves, witch, seer = survivors[:2], survivors[-1], survivors[-2]
@@ -37,11 +39,13 @@ def main() -> None:
# night phase, werewolves discuss
hint = HostMsg(content=Prompts.to_wolves.format(n2s(wolves)))
with msghub(wolves, announcement=hint) as hub:
+ set_parsers(wolves, Prompts.wolves_discuss_parser)
for _ in range(MAX_WEREWOLF_DISCUSSION_ROUND):
x = sequentialpipeline(wolves)
- if x.get("agreement", False):
+ if x.metadata.get("finish_discussion", False):
break
+ set_parsers(wolves, Prompts.wolves_vote_parser)
# werewolves vote
hint = HostMsg(content=Prompts.to_wolves_vote)
votes = [
@@ -65,14 +69,19 @@ def main() -> None:
},
),
)
- if witch(hint).get("resurrect", False):
+ set_parsers(witch, Prompts.witch_resurrect_parser)
+        if witch(hint).metadata.get("resurrect", False):
healing_used_tonight = True
dead_player.pop()
healing = False
+ HostMsg(content=Prompts.to_witch_resurrect_yes)
+ else:
+ HostMsg(content=Prompts.to_witch_resurrect_no)
if poison and not healing_used_tonight:
+ set_parsers(witch, Prompts.witch_poison_parser)
x = witch(HostMsg(content=Prompts.to_witch_poison))
- if x.get("eliminate", False):
+ if x.metadata.get("eliminate", False):
dead_player.append(extract_name_and_id(x.content)[0])
poison = False
@@ -81,6 +90,7 @@ def main() -> None:
hint = HostMsg(
content=Prompts.to_seer.format(seer.name, n2s(survivors)),
)
+ set_parsers(seer, Prompts.seer_parser)
x = seer(hint)
player, idx = extract_name_and_id(x.content)
@@ -108,8 +118,10 @@ def main() -> None:
]
with msghub(survivors, announcement=hints) as hub:
# discuss
+ set_parsers(survivors, Prompts.survivors_discuss_parser)
x = sequentialpipeline(survivors)
+ set_parsers(survivors, Prompts.survivors_vote_parser)
# vote
hint = HostMsg(content=Prompts.to_all_vote.format(n2s(survivors)))
votes = [
diff --git a/examples/game_werewolf/werewolf_utils.py b/examples/game_werewolf/werewolf_utils.py
index f4301bf44..c0e199ca6 100644
--- a/examples/game_werewolf/werewolf_utils.py
+++ b/examples/game_werewolf/werewolf_utils.py
@@ -65,3 +65,14 @@ def _get_name(agent_: Union[AgentBase, str]) -> str:
+ " and "
+ _get_name(agents[-1])
)
+
+
+def set_parsers(
+    agents: Union[AgentBase, list[AgentBase]],
+    parser: "ParserBase",  # a parser instance, e.g. MarkdownJsonDictParser
+) -> None:
+    """Set the response parser for the given agent(s)."""
+    if not isinstance(agents, list):
+        agents = [agents]
+    for agent in agents:
+        agent.set_parser(parser)
diff --git a/examples/model_configs_template/openai_chat_template.json b/examples/model_configs_template/openai_chat_template.json
index 8d3f78087..f5abccf00 100644
--- a/examples/model_configs_template/openai_chat_template.json
+++ b/examples/model_configs_template/openai_chat_template.json
@@ -1,25 +1,38 @@
-[{
- "config_name": "openai_chat_gpt-4",
- "model_type": "openai_chat",
- "model_name": "gpt-4",
- "api_key": "{your_api_key}",
- "client_args": {
- "max_retries": 3
+[
+ {
+ "config_name": "openai_chat_gpt-4",
+ "model_type": "openai_chat",
+ "model_name": "gpt-4",
+ "api_key": "{your_api_key}",
+ "client_args": {
+ "max_retries": 3
+ },
+ "generate_args": {
+ "temperature": 0.7
+ }
},
- "generate_args": {
- "temperature": 0.7
- }
-},
-{
- "config_name": "openai_chat_gpt-3.5-turbo",
- "model_type": "openai_chat",
- "model_name": "gpt-3.5-turbo",
- "api_key": "{your_api_key}",
- "client_args": {
- "max_retries": 3
+ {
+ "config_name": "openai_chat_gpt-3.5-turbo",
+ "model_type": "openai_chat",
+ "model_name": "gpt-3.5-turbo",
+ "api_key": "{your_api_key}",
+ "client_args": {
+ "max_retries": 3
+ },
+ "generate_args": {
+ "temperature": 0.7
+ }
},
- "generate_args": {
- "temperature": 0.7
+ {
+ "config_name": "openai_chat_gpt-4o",
+ "model_type": "openai_chat",
+ "model_name": "gpt-4o",
+ "api_key": "{your_api_key}",
+ "client_args": {
+ "max_retries": 3
+ },
+ "generate_args": {
+ "temperature": 0.7
+ }
}
-}
]
\ No newline at end of file
diff --git a/examples/swe_agent/main.ipynb b/examples/swe_agent/main.ipynb
new file mode 100644
index 000000000..59d15bcf3
--- /dev/null
+++ b/examples/swe_agent/main.ipynb
@@ -0,0 +1,285 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Conversation with Software Engineering Agent\n",
+ "\n",
+ "SWE-agent(SoftWare Engineering Agent) is an agent designed for solving real world software engineering problems, such as fixing github issues.\n",
+ "More details can be found in the project's [homepage](https://swe-agent.com/) and related [github repo](https://swe-agent.com/).\n",
+ "\n",
+ "In the example here, we partially implement the SWE-agent, and provide a simple example of how to use the implemented SWE-agent to fix a bug in a python file.\n",
+ "You should note that currently how to enable agents with stronger programming capabilities remains an open challenge, and the performance of the paritially implemented SWE-agent is not guaranteed.\n",
+ "\n",
+ "## Prerequisites\n",
+ "\n",
+ "- Follow [READMD.md](https://github.com/modelscope/agentscope) to install AgentScope. We require the lastest version, so you should build from source by running `pip install -e .` instead of intalling from pypi. \n",
+ "- Prepare a model configuration. AgentScope supports both local deployed model services (CPU or GPU) and third-party services. More details and example model configurations please refer to our [tutorial](https://modelscope.github.io/agentscope/en/tutorial/203-model.html).\n",
+ "- Understand the ServiceToolkit module and how to use it to pre-process the tool functions for LLMs. You can refer to the [ReAct agent example](../conversation_with_react_agent/main.ipynb) and you should also refer to the [tutorial](https://modelscope.github.io/agentscope/en/tutorial/204-service.html) for service functions.\n",
+ "\n",
+ "\n",
+ "## Note\n",
+ "\n",
+ "- The example is tested with the following models. For other models, you may need to adjust the prompt.\n",
+ " - gpt-4\n",
+ "- How to enable agents with stronger programming capabilities remains an open challenge, and the current implementations are not perfect. Please feel free to explore it yourself."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "YOUR_MODEL_CONFIGURATION_NAME = \"{YOUR_MODEL_CONFIGURATION_NAME}\"\n",
+ "\n",
+ "YOUR_MODEL_CONFIGURATION = {\n",
+ " \"model_type\": \"xxx\", \n",
+ " \"config_name\": YOUR_MODEL_CONFIGURATION_NAME\n",
+ " \n",
+ " # ...\n",
+ "}"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 1: Initialize the AgentScope environment and SWE-agent\n",
+ "\n",
+ "Here we init the agentscope environment and load the SWE-agent.\n",
+ "\n",
+ "The code of SWE-agent is in `swe_agent.py`, and the related prompts are in `swe_agent_prompts.py`.\n",
+ "\n",
+ "If you are interested in the details, please refer to the code and the origianl SWE-agent repo [here](https://github.com/princeton-nlp/SWE-agent)."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from swe_agent import SWEAgent\n",
+ "\n",
+ "import agentscope\n",
+ "\n",
+ "agentscope.init(model_configs=YOUR_MODEL_CONFIGURATION)\n",
+ "\n",
+ "agent = SWEAgent(\n",
+ " name=\"assistant\",\n",
+ " model_config_name=YOUR_MODEL_CONFIGURATION_NAME,\n",
+ ")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 2: Create the code to be processed by the SWE-agent\n",
+ "\n",
+ "Here we use the `write_file` function to write the following code into `gcd.py`.\n",
+ "The code here is a wrong implementation of the [Greatest Common Divisor (GCD) algorithm](https://en.wikipedia.org/wiki/Euclidean_algorithm).\n",
+ "We will ask the SWE-agent to correct it in our next step."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+      "{'status': <ServiceExecStatus.SUCCESS: 1>,\n",
+ " 'content': 'WRITE OPERATION:\\nYou have written to \"gcd.py\" on these lines: 0:-1.\\ndef gcd(a, b):\\n if a == 0:\\n return b\\n while a != 0:\\n a, b = b, a\\n return b\\n\\ndef lcm(a, b):\\n return (a * b) // gcd(a, b)\\n\\n# testing on GCD and LCM functions\\nprint(\"GCD of 12 and 18 is:\", gcd(12, 18))\\nprint(\"LCM of 12 and 18 is:\", lcm(12, 18))\\n\\n'}"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from swe_agent_service_func import write_file\n",
+ "\n",
+ "# here the gcd code is written in a wrong way\n",
+ "wrong_gcd_text = \"\"\"\n",
+ "def gcd(a, b):\n",
+ " if a == 0:\n",
+ " return b\n",
+ " while a != 0:\n",
+ " a, b = b, a\n",
+ " return b\n",
+ "\n",
+ "def lcm(a, b):\n",
+ " return (a * b) // gcd(a, b)\n",
+ "\n",
+ "# testing on GCD and LCM functions\n",
+ "print(\"GCD of 12 and 18 is:\", gcd(12, 18))\n",
+ "print(\"LCM of 12 and 18 is:\", lcm(12, 18))\n",
+ "\"\"\"\n",
+ "\n",
+ "write_file(\"gcd.py\", wrong_gcd_text)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Step 3: Ask the SWE-agent to correct the code\n",
+ "\n",
+ "Here we simple ask the SWE-agent to correct the code in `gcd.py`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "user: The file in gcd.py is not returning the correct answer, can you fix the file for me?\n",
+ "assistant: {\n",
+ " \"thought\": \"First thing I'll do is read the content of gcd.py file to understand the existing logic.\",\n",
+ " \"action\": {\n",
+ " \"name\": \"read_file\",\n",
+ " \"arguments\": {\n",
+ " \"file_path\": \"gcd.py\"\n",
+ " }\n",
+ " }\n",
+ "}\n",
+ "assistant: \n",
+ "====Observation====\n",
+ "\n",
+ "def gcd(a, b):\n",
+ " if a == 0:\n",
+ " return b\n",
+ " while a != 0:\n",
+ " a, b = b, a\n",
+ " return b\n",
+ "\n",
+ "def lcm(a, b):\n",
+ " return (a * b) // gcd(a, b)\n",
+ "\n",
+ "# testing on GCD and LCM functions\n",
+ "print(\"GCD of 12 and 18 is:\", gcd(12, 18))\n",
+ "print(\"LCM of 12 and 18 is:\", lcm(12, 18))\n",
+ "\n",
+ "\n",
+ "assistant: {\n",
+ " \"thought\": \"I can see the error in the gcd calculation. The gcd calculation in the file is incorrect. The correct way to find the gcd of two numbers using the Euclidean algorithm is to continually reduce a and b by finding the remainder of a % b until b is 0, at which point a will be the gcd of the original a and b values. In the given code, instead of finding the remainder, it is incorrectly swapping the values of a and b. I'll correct this error by modifying the gcd function using the write_file command, reflecting the correct Euclidean algorithm.\",\n",
+ " \"action\": {\n",
+ " \"name\": \"write_file\",\n",
+ " \"arguments\": {\n",
+ " \"start_line\": 0,\n",
+ " \"end_line\": 15,\n",
+ " \"content\": \"def gcd(a, b):\\n while b != 0:\\n a, b = b, a % b\\n return a\\n\\n\\ndef lcm(a, b):\\n return (a * b) // gcd(a, b)\\n\\n\\n# testing on GCD and LCM functions\\nprint(\\\"GCD of 12 and 18 is:\\\", gcd(12, 18))\\nprint(\\\"LCM of 12 and 18 is:\\\", lcm(12, 18))\\n\",\n",
+ " \"file_path\": \"gcd.py\"\n",
+ " }\n",
+ " }\n",
+ "}\n",
+ "assistant: \n",
+ "====Observation====\n",
+ "WRITE OPERATION:\n",
+ "You have written to \"gcd.py\" on these lines: 0:15.def gcd(a, b):\n",
+ " while b != 0:\n",
+ " a, b = b, a % b\n",
+ " return a\n",
+ "\n",
+ "\n",
+ "def lcm(a, b):\n",
+ " return (a * b) // gcd(a, b)\n",
+ "\n",
+ "\n",
+ "# testing on GCD and LCM functions\n",
+ "print(\"GCD of 12 and 18 is:\", gcd(12, 18))\n",
+ "print(\"LCM of 12 and 18 is:\", lcm(12, 18))\n",
+ "\n",
+ "\n",
+ "assistant: {\n",
+ " \"thought\": \"I should run the python file to see if the changes I made have fixed the gcd function.\",\n",
+ " \"action\": {\n",
+ " \"name\": \"execute_shell_command\",\n",
+ " \"arguments\": {\n",
+ " \"command\": \"python3 gcd.py\"\n",
+ " }\n",
+ " }\n",
+ "}\n",
+ "assistant: \n",
+ "====Observation====\n",
+ "GCD of 12 and 18 is: 6\n",
+ "LCM of 12 and 18 is: 36\n",
+ "assistant: {\n",
+ " \"thought\": \"The gcd.py file has been fixed and tested successfully. Therefore, no further actions are needed.\",\n",
+ " \"action\": {\n",
+ " \"name\": \"exit\",\n",
+ " \"arguments\": {}\n",
+ " }\n",
+ "}\n",
+ "assistant: \n",
+ "====Observation====\n",
+ "Current task finished, exitting.\n"
+ ]
+ }
+ ],
+ "source": [
+ "from loguru import logger\n",
+ "from agentscope.message import Msg\n",
+ "\n",
+ "mss = Msg(\n",
+ " name=\"user\", \n",
+ " content=\"The file in gcd.py is not returning the correct answer, can you fix the file for me?\", \n",
+ " role=\"user\"\n",
+ ")\n",
+ "logger.chat(mss)\n",
+ "answer_mss = agent(mss)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Conlusion\n",
+ "\n",
+ "After a few iterations, the SWE-agent assistant finish the job successfully, and the code is now working fine."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Above we shown a example of how to use the SWE-agent to fix code errors.\n",
+ "Although the design of the SWE-agent is primarily aimed at addressing GitHub issues, with modifications, it can also be utilized for more general programming tasks.\n",
+ "\n",
+ "Currently, how to enable agent with general programming ablities remains a challenging open question, with the efficacy of agent programming potentially influenced by factors such as prompt construction, model capabilities, and the complexity of the task at hand. Here we just provide an interesting toy example. \n",
+ "\n",
+ "We encourage users to experiment by altering the prompts within this example or by assigning different tasks to the agent, among other methods of exploration. Please feel free to experiment and explore on your own. The AgentScope team will continue to provide updates, enhancing the capabilities of the Programming Agents in the future!"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "datajuicer",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.18"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/examples/swe_agent/swe_agent.py b/examples/swe_agent/swe_agent.py
new file mode 100644
index 000000000..3b55431d5
--- /dev/null
+++ b/examples/swe_agent/swe_agent.py
@@ -0,0 +1,274 @@
+# -*- coding: utf-8 -*-
+"""An agent class that partially implements the SWE-agent.
+SWE-agent is an agent designed for solving GitHub issues.
+More details can be found in https://swe-agent.com/.
+
+Here we partially implement and modify the SWE-agent,
+trying to make it work with a wider range of tasks than just fixing GitHub issues.
+"""
+
+from agentscope.agents import AgentBase
+from agentscope.message import Msg
+from agentscope.exception import ResponseParsingError
+from agentscope.parsers import MarkdownJsonDictParser
+from typing import List, Callable
+import json
+from agentscope.service import (
+    ServiceFactory,
+    ServiceExecStatus,
+    execute_shell_command,
+)
+
+from swe_agent_service_func import (
+ exec_py_linting,
+ write_file,
+ read_file,
+)
+
+from swe_agent_prompts import (
+ get_system_prompt,
+ get_context_prompt,
+ get_step_prompt,
+)
+
+
+def prepare_func_prompt(function: Callable) -> str:
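+    """Build a textual description of a service function, covering its
+    name, purpose, and arguments."""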
+ func, desc = ServiceFactory.get(function)
+ func_name = desc["function"]["name"]
+ func_desc = desc["function"]["description"]
+ args_desc = desc["function"]["parameters"]["properties"]
+
+ args_list = [f"{func_name}: {func_desc}"]
+ for args_name, args_info in args_desc.items():
+ if "type" in args_info:
+ args_line = (
+ f'\t{args_name} ({args_info["type"]}): '
+ f'{args_info.get("description", "")}'
+ )
+ else:
+ args_line = f'\t{args_name}: {args_info.get("description", "")}'
+ args_list.append(args_line)
+
+ func_prompt = "\n".join(args_list)
+ return func_prompt
+
+
+COMMANDS_DESCRIPTION_DICT = {
+    "exit": "exit: Executed when the current task is complete, takes no arguments", # noqa
+    "scroll_up": "scroll_up: Scrolls up the current open file, will scroll up and show you the 100 lines above your current lines, takes no arguments", # noqa
+    "scroll_down": "scroll_down: Scrolls down the current open file, will scroll down and show you the 100 lines below your current lines, takes no arguments", # noqa
+    "goto": "goto: This will take you directly to the line and show you the 100 lines below it. \n line_num (int): The line number to go to.", # noqa
+}
+
+COMMANDS_DISCRIPTION_DICT["write_file"] = prepare_func_prompt(write_file)
+COMMANDS_DISCRIPTION_DICT["read_file"] = prepare_func_prompt(read_file)
+COMMANDS_DISCRIPTION_DICT["execute_shell_command"] = prepare_func_prompt(
+ execute_shell_command,
+)
+COMMANDS_DISCRIPTION_DICT["exec_py_linting"] = prepare_func_prompt(
+ exec_py_linting,
+)
+
+
+ERROR_INFO_PROMPT = """Your response is not a JSON object, and cannot be parsed by `json.loads` in the parse function:
+## Your Response:
+[YOUR RESPONSE BEGIN]
+{response}
+[YOUR RESPONSE END]
+
+## Error Information:
+{error_info}
+
+Analyze the reason, and correct your response in the required format.""" # pylint: disable=all # noqa
+
+
+def count_file_lines(file_path: str) -> int:
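+    """Return the number of lines in the given file."""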
+ with open(file_path, "r") as file:
+ lines = file.readlines()
+ return len(lines)
+
+
+class SWEAgent(AgentBase):
+ """
+ The SWE-agent
+ """
+
+ def __init__(
+ self,
+ name: str,
+ model_config_name: str,
+ ) -> None:
+ """ """
+ super().__init__(
+ name=name,
+ model_config_name=model_config_name,
+ )
+
+ self.memory_window = 6
+ self.max_retries = 2
+ self.running_memory: List[str] = []
+ self.cur_file: str = ""
+ self.cur_line: int = 0
+ self.cur_file_content: str = ""
+
+ self.main_goal = ""
+ self.commands_prompt = ""
+ self.parser = MarkdownJsonDictParser()
+ self.get_commands_prompt()
+
+ def get_current_file_content(self) -> None:
+ """
+ Get the current file content.
+ """
+ if self.cur_file == "":
+ return
+ start_line = self.cur_line - 50
+ if start_line < 0:
+ start_line = 0
+ end_line = self.cur_line + 50
+ if end_line > count_file_lines(self.cur_file):
+ end_line = -1
+ read_res = read_file(self.cur_file, start_line, end_line)
+ self.cur_file_content = read_res.content
+
+ def step(self) -> Msg:
+ """
+ Step the SWE-agent.
+ """
+ message_list = []
+
+ # construct system prompt
+ system_prompt = get_system_prompt(self.commands_prompt)
+ message_list.append(Msg("user", system_prompt, role="system"))
+
+ # construct context prompt, i.e. previous actions
+ context_prompt = get_context_prompt(
+ self.running_memory,
+ self.memory_window,
+ )
+ message_list.append(Msg("user", context_prompt, role="user"))
+
+ # construct step prompt for this instance
+ self.get_current_file_content()
+ step_prompt = get_step_prompt(
+ self.main_goal,
+ self.cur_file,
+ self.cur_line,
+ self.cur_file_content,
+ )
+ message_list.append(Msg("user", step_prompt, role="user"))
+
+ # get response from agent
+ try:
+ in_prompt = self.model.format(message_list)
+ res = self.model(
+ in_prompt,
+ parse_func=self.parser.parse,
+ max_retries=1,
+ )
+
+ except ResponseParsingError as e:
+ response_msg = Msg(self.name, e.raw_response, "assistant")
+ self.speak(response_msg)
+
+ # Re-correct by model itself
+ error_msg = Msg(
+ name="system",
+ content={
+ "action": {"name": "error"},
+ "error_msg": ERROR_INFO_PROMPT.format(
+ parse_func=self.parser.parse,
+ error_info=e.message,
+ response=e.raw_response,
+ ),
+ },
+ role="system",
+ )
+ self.speak(error_msg)
+ # continue
+ self.running_memory.append(error_msg)
+ return error_msg
+
+ msg_res = Msg(self.name, res.parsed, role="assistant")
+
+ self.speak(
+ Msg(self.name, json.dumps(res.parsed, indent=4), role="assistant"),
+ )
+
+ # parse and execute action
+ action = res.parsed.get("action")
+
+        obs = self.parse_command(action)
+ self.speak(
+ Msg(self.name, "\n====Observation====\n" + obs, role="assistant"),
+ )
+
+ # add msg to context windows
+ self.running_memory.append(str(action) + str(obs))
+ return msg_res
+
+ def reply(self, x: dict = None) -> dict:
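+        """Run the step loop until the agent decides to exit."""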
+ action_name = None
+ self.main_goal = x.content
+        while action_name != "exit":
+ msg = self.step()
+ action_name = msg.content["action"]["name"]
+ return msg
+
+    def parse_command(self, command_call: dict) -> str:
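+        """Parse the given command call, execute it, and return the
+        observation string."""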
+ command_name = command_call["name"]
+ command_args = command_call["arguments"]
+ if command_name == "exit":
+ return "Current task finished, exitting."
+ if command_name in ["goto", "scroll_up", "scroll_down"]:
+ if command_name == "goto":
+ line = command_call["arguments"]["line_num"]
+ command_str = f"Going to {self.cur_file} line \
+ {command_args['line_mum']}."
+ command_failed_str = f"Failed to go to {self.cur_file} \
+ line {command_args['line_num']}"
+ if command_name == "scroll_up":
+ line = self.cur_line - 100
+ if line < 0:
+ line = 0
+ command_str = (
+ f"Scrolling up from file {self.cur_file} to line {line}."
+ )
+ command_failed_str = (
+ f"Failed to scroll up {self.cur_file} to line {line}"
+ )
+ if command_name == "scroll_down":
+ line = self.cur_line + 100
+ if line > count_file_lines(self.cur_file):
+ line = count_file_lines(self.cur_file)
+ command_str = (
+ f"Scrolling down from file {self.cur_file} to line {line}."
+ )
+ command_failed_str = (
+ f"Failed to scrool down {self.cur_file} to line {line}"
+ )
+ read_status = read_file(self.cur_file, line, line + 100)
+            if read_status.status == ServiceExecStatus.SUCCESS:
+ self.cur_line = line
+ obs = read_status.content
+ return f"{command_str}. Observe file content: {obs}"
+ else:
+ return command_failed_str
+ if command_name == "execute_shell_command":
+ return execute_shell_command(**command_args).content
+ if command_name == "write_file":
+ self.cur_file = command_args["file_path"]
+ self.cur_line = command_args.get("start_line", 0)
+ write_status = write_file(**command_args)
+ return write_status.content
+ if command_name == "read_file":
+ self.cur_file = command_args["file_path"]
+ self.cur_line = command_args.get("start_line", 0)
+ read_status = read_file(**command_args)
+ return read_status.content
+ if command_name == "exec_py_linting":
+ return exec_py_linting(**command_args).content
+ return "No such command"
+
+    def get_commands_prompt(self) -> None:
+        """Assemble the description prompt of all available commands."""
+        for desc in COMMANDS_DESCRIPTION_DICT.values():
+            self.commands_prompt += f"{desc}\n"
diff --git a/examples/swe_agent/swe_agent_prompts.py b/examples/swe_agent/swe_agent_prompts.py
new file mode 100644
index 000000000..4c30e48af
--- /dev/null
+++ b/examples/swe_agent/swe_agent_prompts.py
@@ -0,0 +1,123 @@
+# -*- coding: utf-8 -*-
+# pylint: disable=C0301
+"""The SWE-agent relay heavily on it's prompts.
+This file contains the neccessary prompts for the SWE-agent.
+Some prompts are taken and modified from the original SWE-agent repo
+or the SWE-agent implementation from Open-Devin.
+"""
+
+WINDOW = 100
+
+
+def get_system_prompt(command_prompt: str) -> str:
+ """
+ Get the system prompt for SWE-agent.
+ """
+ return f"""
+ SETTING:
+    You are an autonomous coding agent, here to perform coding tasks given the instruction.
+    You have been designed to handle a wide range of programming tasks, from code editing and debugging to testing and deployment.
+ You have access to a variety of tools and commands that you can use to help you solve problems efficiently.
+
+ You're working directly in the command line with a special interface.
+
+ The special interface consists of a file editor that shows you {WINDOW} lines of a file at a time.
+ In addition to typical bash commands, you can also use the following commands to help you navigate and edit files.
+
+ COMMANDS:
+ {command_prompt}
+
+ Please note that THE WRITE COMMAND REQUIRES PROPER INDENTATION.
+ If you'd like to add the line ' print(x)' you must fully write that out, with all those spaces before the code!
+ Indentation is important and code that is not indented correctly will fail and require fixing before it can be run.
+
+ If you'd like to issue two commands at once, PLEASE DO NOT DO THAT! Please instead first submit just the first command, and then after receiving a response you'll be able to issue the second command.
+ You're free to use any other bash commands you want (e.g. find, grep, cat, ls) in addition to the special commands listed above.
+
+ However, the environment does NOT support interactive session commands (e.g. vim, python), so please do not invoke them.
+
+ {RESPONSE_FORMAT_PROMPT}
+
+ """ # noqa
+
+
+RESPONSE_FORMAT_PROMPT = """
+## Response Format:
+You should respond with a JSON object in the following format.
+```json
+{
+ "thought": "what you thought",
+ "action": {"name": "{command name}", "arguments": {"{argument1 name}": xxx, "{argument2 name}": xxx}}
+}
+```
+
+For Example:
+```json
+{
+ "thought": "First I'll start by using ls to see what files are in the current directory. Then maybe we can look at some relevant files to see what they look like.",
+ "action": {"name": "execute_shell_command", "arguments": {"command": "ls -a"}}
+}
+```
+OUTPUT the JSON format and ONLY OUTPUT the JSON format.
+Your Response should always be a valid JSON string that can be parsed.
+""" # noqa
+
+
+def get_step_prompt(
+ task: str,
+ file: str,
+ line: int,
+ current_file_content: str,
+) -> str:
+ """
+ Get the step prompt for SWE-agent.
+ """
+ return f"""
+    We're currently performing the following coding task. Here's the original task description from the user:
+ {task}
+
+ CURRENT
+ Open File: {file} on line {line}
+
+ Current File Content:
+ {current_file_content}
+
+ You can use these commands with the current file:
+    Navigation: `scroll_up`, `scroll_down`, and `goto <line_num>`
+
+
+ INSTRUCTIONS:
+
+ 1. If you run a command and it doesn't work, try running a different command. A command that did not work once will not work the second time unless you modify it!
+
+ 2. If you open a file and need to get to an area around a specific line that is not in the first 100 lines, say line 583, don't just use the scroll_down command multiple times. Instead, use the goto 583 command. It's much quicker.
+
+ 3. Always make sure to look at the currently open file and the current working directory (which appears right after the currently open file). The currently open file might be in a different directory! Note that some commands, such as 'write_file' and 'read_file', open files, so they might change the current open file.
+
+ 4. When editing files, it is easy to accidentally specify a wrong line number or to write code with incorrect indentation. Always check the code after you issue an edit to make sure that it reflects what you wanted to accomplish. If it didn't, issue another command to fix it.
+
+ 5. After modifying python files, you can run `exec_py_linting` to check for errors. If there are errors, fix them and repeat the previous step.
+
+    NOTE THAT THIS ENVIRONMENT DOES NOT SUPPORT INTERACTIVE SESSION COMMANDS, such as "vim", "python", or "python3". So DO NOT execute them by running `execute_shell_command` with the `python` or `python3` command if the code needs additional inputs.
+ If you want to check whether a python file is valid, you can use `exec_py_linting` to check for errors.
+
+ You should always notice your response format and respond with a JSON object in the following format.
+ {RESPONSE_FORMAT_PROMPT}
+""" # noqa
+
+
+def get_context_prompt(memory: list, window: int) -> str:
+ """
+ Get the context prompt for the given memory and window.
+ """
+ res = f"These are your past {window} actions:\n"
+ window_size = window if len(memory) > window else len(memory)
+ cur_mems = memory[-window_size:]
+ res += "===== Previous Actions =====\n"
+ for idx, mem in enumerate(cur_mems):
+ res += f"\nMemory {idx}:\n{mem}\n"
+ res += "======= End Actions =======\n"
+ res += "Use these memories to provide additional context to \
+ the problem you are solving.\nRemember that you have already \
+ completed these steps so you do not need to perform them again."
+ return res
diff --git a/examples/swe_agent/swe_agent_service_func.py b/examples/swe_agent/swe_agent_service_func.py
new file mode 100644
index 000000000..76bdaecce
--- /dev/null
+++ b/examples/swe_agent/swe_agent_service_func.py
@@ -0,0 +1,147 @@
+# -*- coding: utf-8 -*-
+# pylint: disable=C0301
+"""
+Tools for swe-agent, such as checking files with linting and formatting,
+writing and reading files by lines, etc.
+"""
+import subprocess
+import os
+
+from agentscope.service.service_response import ServiceResponse
+from agentscope.service.service_status import ServiceExecStatus
+
+
+def exec_py_linting(file_path: str) -> ServiceResponse:
+ """
+ Executes flake8 linting on the given .py file with specified checks and
+ returns the linting result.
+
+ Args:
+ file_path (`str`): The path to the Python file to lint.
+
+ Returns:
+ ServiceResponse: Contains either the output from the flake8 command as
+ a string if successful, or an error message including the error type.
+ """
+ command = f"flake8 --isolated --select=F821,F822,F831,\
+ E111,E112,E113,E999,E902 {file_path}"
+
+ try:
+ result = subprocess.run(
+ command,
+ shell=True,
+ check=True,
+ stdout=subprocess.PIPE,
+ stderr=subprocess.PIPE,
+ text=True,
+ )
+ return ServiceResponse(
+ status=ServiceExecStatus.SUCCESS,
+ content=result.stdout.strip()
+ if result.stdout
+ else "No lint errors found.",
+ )
+ except subprocess.CalledProcessError as e:
+ error_message = (
+ e.stderr.strip()
+ if e.stderr
+ else "An error occurred while linting the file."
+ )
+ return ServiceResponse(
+ status=ServiceExecStatus.ERROR,
+ content=error_message,
+ )
+ except Exception as e:
+ return ServiceResponse(
+ status=ServiceExecStatus.ERROR,
+ content=str(e),
+ )
+
+
+def write_file(
+ file_path: str,
+ content: str,
+ start_line: int = 0,
+ end_line: int = -1,
+) -> ServiceResponse:
+ """
+    Write content to a file by replacing the current lines between <start_line> and <end_line> with <content>. Default start_line = 0 and end_line = -1. Calling this with no optional args will replace the whole file, so be sure to use this with caution when writing to a file that already exists.
+
+ Args:
+ file_path (`str`): The path to the file to write to.
+ content (`str`): The content to write to the file.
+        start_line (`Optional[int]`, defaults to `0`): The start line of the span to be replaced with <content>.
+        end_line (`Optional[int]`, defaults to `-1`): The end line of the span to be replaced with <content>. end_line = -1 means the end of the file; otherwise it should be a positive integer indicating the line number.
+ """ # noqa
+ try:
+ mode = "w" if not os.path.exists(file_path) else "r+"
+ insert = content.split("\n")
+ with open(file_path, mode, encoding="utf-8") as file:
+ if mode != "w":
+ all_lines = file.readlines()
+ new_file = [""] if start_line == 0 else all_lines[:start_line]
+ new_file += [i + "\n" for i in insert]
+ last_line = end_line + 1
+ new_file += [""] if end_line == -1 else all_lines[last_line:]
+ else:
+ new_file = insert
+
+ file.seek(0)
+ file.writelines(new_file)
+ file.truncate()
+        obs = (
+            f'WRITE OPERATION:\nYou have written to "{file_path}" '
+            f"on these lines: {start_line}:{end_line}."
+        )
+ return ServiceResponse(
+ status=ServiceExecStatus.SUCCESS,
+ content=obs + "".join(new_file),
+ )
+ except Exception as e:
+ error_message = f"{e.__class__.__name__}: {e}"
+ return ServiceResponse(
+ status=ServiceExecStatus.ERROR,
+ content=error_message,
+ )
+
+
+def read_file(
+ file_path: str,
+ start_line: int = 0,
+ end_line: int = -1,
+) -> ServiceResponse:
+ """
+    Show a given file's contents starting from <start_line> up to <end_line>. Default: start_line = 0 and end_line = -1, i.e. by default the whole file will be read.
+
+ Args:
+ file_path (`str`): The path to the file to read.
+ start_line (`Optional[int]`, defaults to `0`): The start line of the file to be read.
+ end_line (`Optional[int]`, defaults to `-1`): The end line of the file to be read.
+ """ # noqa
+ start_line = max(start_line, 0)
+ try:
+ with open(file_path, "r", encoding="utf-8") as file:
+ if end_line == -1:
+ if start_line == 0:
+ code_view = file.read()
+ else:
+ all_lines = file.readlines()
+ code_slice = all_lines[start_line:]
+ code_view = "".join(code_slice)
+ else:
+ all_lines = file.readlines()
+ num_lines = len(all_lines)
+ begin = max(0, min(start_line, num_lines - 2))
+ end_line = (
+ -1 if end_line > num_lines else max(begin + 1, end_line)
+ )
+ code_slice = all_lines[begin:end_line]
+ code_view = "".join(code_slice)
+ return ServiceResponse(
+ status=ServiceExecStatus.SUCCESS,
+ content=f"{code_view}",
+ )
+ except Exception as e:
+ error_message = f"{e.__class__.__name__}: {e}"
+ return ServiceResponse(
+ status=ServiceExecStatus.ERROR,
+ content=error_message,
+ )
diff --git a/setup.py b/setup.py
index 2259f592f..2f2a75c34 100644
--- a/setup.py
+++ b/setup.py
@@ -64,6 +64,7 @@
"Flask==3.0.0",
"Flask-Cors==4.0.0",
"Flask-SocketIO==5.3.6",
+ "flake8",
# TODO: move into other requires
"dashscope==1.14.1",
"openai>=1.3.0",
diff --git a/src/agentscope/agents/dict_dialog_agent.py b/src/agentscope/agents/dict_dialog_agent.py
index 0ee8061b3..eb16690e0 100644
--- a/src/agentscope/agents/dict_dialog_agent.py
+++ b/src/agentscope/agents/dict_dialog_agent.py
@@ -1,59 +1,18 @@
# -*- coding: utf-8 -*-
-"""A dict dialog agent that using `parse_func` and `fault_handler` to
-parse the model response."""
-import json
-from typing import Any, Optional, Callable
-from loguru import logger
+"""An agent that replies in a dictionary format."""
+from typing import Optional
from ..message import Msg
from .agent import AgentBase
-from ..models import ModelResponse
-from ..prompt import PromptType
-from ..utils.tools import _convert_to_str
-
-
-def parse_dict(response: ModelResponse) -> ModelResponse:
- """Parse function for DictDialogAgent"""
- try:
- if response.text is not None:
- response_dict = json.loads(response.text)
- else:
- raise ValueError(
- f"The text field of the response s None: {response}",
- )
- except json.decoder.JSONDecodeError:
- # Sometimes LLM may return a response with single quotes, which is not
- # a valid JSON format. We replace single quotes with double quotes and
- # try to load it again.
- # TODO: maybe using a more robust json library to handle this case
- response_dict = json.loads(response.text.replace("'", '"'))
-
- return ModelResponse(raw=response_dict)
-
-
-def default_response(response: ModelResponse) -> ModelResponse:
- """The default response of fault_handler"""
- return ModelResponse(raw={"speak": response.text})
+from ..parsers import ParserBase
class DictDialogAgent(AgentBase):
"""An agent that generates response in a dict format, where user can
- specify the required fields in the response via prompt, e.g.
-
- .. code-block:: python
+    specify the required fields in the response by specifying a parser.
- prompt = "... Response in the following format that can be loaded by
- python json.loads()
- {
- "thought": "thought",
- "speak": "thoughts summary to say to others",
- # ...
- }"
-
- This agent class is an example for using `parse_func` and `fault_handler`
- to parse the output from the model, and handling the fault when parsing
- fails. We take "speak" as a required field in the response, and print
- the speak field as the output response.
+    For more details about parsers, please refer to our
+ [tutorial](https://modelscope.github.io/agentscope/en/tutorial/203-parser.html)
For usage example, please refer to the example of werewolf in
`examples/game_werewolf`"""
@@ -65,10 +24,7 @@ def __init__(
model_config_name: str,
use_memory: bool = True,
memory_config: Optional[dict] = None,
- parse_func: Optional[Callable[..., Any]] = parse_dict,
- fault_handler: Optional[Callable[..., Any]] = default_response,
max_retries: Optional[int] = 3,
- prompt_type: Optional[PromptType] = None,
) -> None:
"""Initialize the dict dialog agent.
@@ -85,19 +41,9 @@ def __init__(
Whether the agent has memory.
memory_config (`Optional[dict]`, defaults to `None`):
The config of memory.
- parse_func (`Optional[Callable[..., Any]]`, defaults to `parse_dict`):
- The function used to parse the model output,
- e.g. `json.loads`, which is used to extract json from the
- output.
- fault_handler (`Optional[Callable[..., Any]]`, defaults to `default_response`):
- The function used to handle the fault when parse_func fails
- to parse the model output.
max_retries (`Optional[int]`, defaults to `None`):
The maximum number of retries when failed to parse the model
output.
- prompt_type (`Optional[PromptType]`, defaults to `PromptType.LIST`):
- The type of the prompt organization, chosen from
- `PromptType.LIST` or `PromptType.STRING`.
""" # noqa
super().__init__(
name=name,
@@ -107,18 +53,17 @@ def __init__(
memory_config=memory_config,
)
- # record the func and handler for parsing and handling faults
- self.parse_func = parse_func
- self.fault_handler = fault_handler
+ self.parser = None
self.max_retries = max_retries
- if prompt_type is not None:
- logger.warning(
- "The argument `prompt_type` is deprecated and "
- "will be removed in the future.",
- )
+ def set_parser(self, parser: ParserBase) -> None:
+ """Set response parser, which will provide 1) format instruction; 2)
+ response parsing; 3) filtering fields when returning message, storing
+ message in memory. So developers only need to change the
+ parser, and the agent will work as expected.
+ """
+ self.parser = parser
- # TODO change typing from dict to MSG
def reply(self, x: dict = None) -> dict:
"""Reply function of the agent.
Processes the input data, generates a prompt using the current
@@ -151,42 +96,29 @@ def reply(self, x: dict = None) -> dict:
self.memory
and self.memory.get_memory()
or x, # type: ignore[arg-type]
+ Msg("system", self.parser.format_instruction, "system"),
)
# call llm
- response = self.model(
+ res = self.model(
prompt,
- parse_func=self.parse_func,
- fault_handler=self.fault_handler,
+ parse_func=self.parser.parse,
max_retries=self.max_retries,
- ).raw
-
- # logging raw messages in debug mode
- logger.debug(json.dumps(response, indent=4, ensure_ascii=False))
-
- # In this agent, if the response is a dict, we treat "speak" as a
- # special key, which represents the text to be spoken
- if isinstance(response, dict) and "speak" in response:
- msg = Msg(
- self.name,
- response["speak"],
- role="assistant",
- **response,
- )
- else:
- msg = Msg(self.name, response, role="assistant")
-
- # Print/speak the message in this agent's voice
- self.speak(msg)
+ )
- # record to memory
- if self.memory:
- # Convert the response dict into a string to store in memory
- msg_memory = Msg(
- name=self.name,
- content=_convert_to_str(response),
- role="assistant",
- )
- self.memory.add(msg_memory)
+ # Filter the parsed response by keys for storing in memory, returning
+ # in the reply function, and feeding into the metadata field in the
+ # returned message object.
+ self.memory.add(
+ Msg(self.name, self.parser.to_memory(res.parsed), "assistant"),
+ )
+
+ msg = Msg(
+ self.name,
+ content=self.parser.to_content(res.parsed),
+ role="assistant",
+ metadata=self.parser.to_metadata(res.parsed),
+ )
+ self.speak(msg)
return msg
diff --git a/src/agentscope/agents/react_agent.py b/src/agentscope/agents/react_agent.py
index 39b6a5d00..cdc81788b 100644
--- a/src/agentscope/agents/react_agent.py
+++ b/src/agentscope/agents/react_agent.py
@@ -136,6 +136,8 @@ def __init__(
"function": service_toolkit.tools_calling_format,
},
required_keys=["thought", "speak", "function"],
+ # Only print the speak field when verbose is False
+ keys_to_content=True if self.verbose else "speak",
)
def reply(self, x: dict = None) -> dict:
@@ -155,9 +157,8 @@ def reply(self, x: dict = None) -> dict:
"system",
self.parser.format_instruction,
role="system",
+ echo=self.verbose,
)
- if self.verbose:
- self.speak(hint_msg)
# Prepare prompt for the model
prompt = self.model.format(self.memory.get_memory(), hint_msg)
@@ -171,16 +172,21 @@ def reply(self, x: dict = None) -> dict:
)
# Record the response in memory
- msg_response = Msg(self.name, res.text, "assistant")
- self.memory.add(msg_response)
+ self.memory.add(
+ Msg(
+ self.name,
+ self.parser.to_memory(res.parsed),
+ "assistant",
+ ),
+ )
# Print out the response
- if self.verbose:
- self.speak(msg_response)
- else:
- self.speak(
- Msg(self.name, res.parsed["speak"], "assistant"),
- )
+ msg_returned = Msg(
+ self.name,
+ self.parser.to_content(res.parsed),
+ "assistant",
+ )
+ self.speak(msg_returned)
# Skip the next steps if no need to call tools
# The parsed field is a dictionary
@@ -192,7 +198,7 @@ def reply(self, x: dict = None) -> dict:
and len(arg_function) == 0
):
# Only the speak field is exposed to users or other agents
- return Msg(self.name, res.parsed["speak"], "assistant")
+ return msg_returned
# Only catch the response parsing error and expose runtime
# errors to developers for debugging
@@ -244,9 +250,8 @@ def reply(self, x: dict = None) -> dict:
"iterations. Now generate a reply by summarizing the current "
"situation.",
role="system",
+ echo=self.verbose,
)
- if self.verbose:
- self.speak(hint_msg)
# Generate a reply by summarizing the current situation
prompt = self.model.format(self.memory.get_memory(), hint_msg)
diff --git a/src/agentscope/agents/rpc_agent.py b/src/agentscope/agents/rpc_agent.py
index b7c3441bc..47d32ce3a 100644
--- a/src/agentscope/agents/rpc_agent.py
+++ b/src/agentscope/agents/rpc_agent.py
@@ -9,7 +9,7 @@
import base64
import traceback
import asyncio
-from typing import Any, Type, Optional, Union, Sequence
+from typing import Type, Optional, Union, Sequence
from concurrent import futures
from loguru import logger
@@ -18,11 +18,13 @@
import grpc
from grpc import ServicerContext
from expiringdict import ExpiringDict
-except ImportError:
- dill = None
- grpc = None
- ServicerContext = Any
- ExpiringDict = None
+except ImportError as import_error:
+ from agentscope.utils.tools import ImportErrorReporter
+
+ dill = ImportErrorReporter(import_error, "distribute")
+ grpc = ImportErrorReporter(import_error, "distribute")
+ ServicerContext = ImportErrorReporter(import_error, "distribute")
+ ExpiringDict = ImportErrorReporter(import_error, "distribute")
from agentscope._init import init_process, _INIT_SETTINGS
from agentscope.agents.agent import AgentBase
diff --git a/src/agentscope/agents/user_agent.py b/src/agentscope/agents/user_agent.py
index 38bd46de1..ee97a935f 100644
--- a/src/agentscope/agents/user_agent.py
+++ b/src/agentscope/agents/user_agent.py
@@ -81,7 +81,9 @@ def reply(
# Input url of file, image, video, audio or website
url = None
if self.require_url:
- url = input("URL: ")
+ url = input("URL (or Enter to skip): ")
+ if url == "":
+ url = None
# Add additional keys
msg = Msg(
diff --git a/src/agentscope/message.py b/src/agentscope/message.py
index 36cd2fabd..cd8a0a9a5 100644
--- a/src/agentscope/message.py
+++ b/src/agentscope/message.py
@@ -91,6 +91,28 @@ def serialize(self) -> str:
class Msg(MessageBase):
"""The Message class."""
+ id: str
+ """The id of the message."""
+
+ name: str
+ """The name of who send the message."""
+
+ content: Any
+ """The content of the message."""
+
+ role: Literal["system", "user", "assistant"]
+ """The role of the message sender."""
+
+ metadata: Optional[dict]
+ """Save the information for application's control flow, or other
+ purposes."""
+
+ url: Optional[Union[Sequence[str], str]]
+ """A url to file, image, video, audio or website."""
+
+ timestamp: str
+ """The timestamp of the message."""
+
def __init__(
self,
name: str,
@@ -99,6 +121,7 @@ def __init__(
url: Optional[Union[Sequence[str], str]] = None,
timestamp: Optional[str] = None,
echo: bool = False,
+ metadata: Optional[Union[dict, str]] = None,
**kwargs: Any,
) -> None:
"""Initialize the message object
@@ -117,6 +140,11 @@ def __init__(
timestamp (`Optional[str]`, defaults to `None`):
The timestamp of the message, if None, it will be set to
current time.
+ echo (`bool`, defaults to `False`):
+ Whether to print the message to the console.
+ metadata (`Optional[Union[dict, str]]`, defaults to `None`):
+                Additional information used for the application's control flow, or other
+ purposes.
**kwargs (`Any`):
Other attributes of the message.
"""
@@ -134,6 +162,7 @@ def __init__(
role=role or "assistant",
url=url,
timestamp=timestamp,
+ metadata=metadata,
**kwargs,
)
if echo:
diff --git a/src/agentscope/models/dashscope_model.py b/src/agentscope/models/dashscope_model.py
index 4fd380de3..c4183aa85 100644
--- a/src/agentscope/models/dashscope_model.py
+++ b/src/agentscope/models/dashscope_model.py
@@ -11,7 +11,7 @@
try:
import dashscope
-except ModuleNotFoundError:
+except ImportError:
dashscope = None
from .model import ModelWrapperBase, ModelResponse
diff --git a/src/agentscope/models/openai_model.py b/src/agentscope/models/openai_model.py
index 2f74e101d..99542582b 100644
--- a/src/agentscope/models/openai_model.py
+++ b/src/agentscope/models/openai_model.py
@@ -1,14 +1,14 @@
# -*- coding: utf-8 -*-
"""Model wrapper for OpenAI models"""
from abc import ABC
-from typing import Union, Any, List, Sequence
+from typing import Union, Any, List, Sequence, Dict
from loguru import logger
from .model import ModelWrapperBase, ModelResponse
from ..file_manager import file_manager
from ..message import MessageBase
-from ..utils.tools import _convert_to_str
+from ..utils.tools import _convert_to_str, _to_openai_image_url
try:
import openai
@@ -107,6 +107,9 @@ class OpenAIChatWrapper(OpenAIWrapperBase):
deprecated_model_type: str = "openai"
+ substrings_in_vision_models_names = ["gpt-4-turbo", "vision", "gpt-4o"]
+ """The substrings in the model names of vision models."""
+
def _register_default_metrics(self) -> None:
# Set monitor accordingly
# TODO: set quota to the following metrics
@@ -212,6 +215,77 @@ def __call__(
raw=response.model_dump(),
)
+ def _format_msg_with_url(
+ self,
+ msg: MessageBase,
+ ) -> Dict:
+ """Format a message with image urls into openai chat format.
+ This format method is used for gpt-4o, gpt-4-turbo, gpt-4-vision and
+ other vision models.
+ """
+ # Check if the model is a vision model
+ if not any(
+ _ in self.model_name
+ for _ in self.substrings_in_vision_models_names
+ ):
+ logger.warning(
+ f"The model {self.model_name} is not a vision model. "
+ f"Skip the url in the message.",
+ )
+ return {
+ "role": msg.role,
+ "name": msg.name,
+ "content": _convert_to_str(msg.content),
+ }
+
+ # Put all urls into a list
+ urls = [msg.url] if isinstance(msg.url, str) else msg.url
+
+ # Check if the url refers to an image
+ checked_urls = []
+ for url in urls:
+ try:
+ checked_urls.append(_to_openai_image_url(url))
+ except TypeError:
+ logger.warning(
+ f"The url {url} is not a valid image url for "
+ f"OpenAI Chat API, skipped.",
+ )
+
+ if len(checked_urls) == 0:
+ # If no valid image url is provided, return the normal message dict
+ return {
+ "role": msg.role,
+ "name": msg.name,
+ "content": _convert_to_str(msg.content),
+ }
+ else:
+ # otherwise, use the vision format message
+ returned_msg = {
+ "role": msg.role,
+ "name": msg.name,
+ "content": [
+ {
+ "type": "text",
+ "text": _convert_to_str(msg.content),
+ },
+ ],
+ }
+
+ image_dicts = [
+ {
+ "type": "image_url",
+ "image_url": {
+ "url": _,
+ },
+ }
+ for _ in checked_urls
+ ]
+
+ returned_msg["content"].extend(image_dicts)
+
+ return returned_msg
+
def format(
self,
*args: Union[MessageBase, Sequence[MessageBase]],
@@ -230,19 +304,22 @@ def format(
The formatted messages in the format that OpenAI Chat API
required.
"""
-
messages = []
for arg in args:
if arg is None:
continue
if isinstance(arg, MessageBase):
- messages.append(
- {
- "role": arg.role,
- "name": arg.name,
- "content": _convert_to_str(arg.content),
- },
- )
+ if arg.url is not None:
+ messages.append(self._format_msg_with_url(arg))
+ else:
+ messages.append(
+ {
+ "role": arg.role,
+ "name": arg.name,
+ "content": _convert_to_str(arg.content),
+ },
+ )
+
elif isinstance(arg, list):
messages.extend(self.format(*arg))
else:
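
A usage sketch of the updated `format` method under this patch; the config name and image url are placeholders, and the wrapper is assumed to find an OpenAI API key in the environment when it constructs its client:

```python
from agentscope.message import Msg
from agentscope.models import OpenAIChatWrapper

# Any model name containing "gpt-4-turbo", "vision" or "gpt-4o" takes the
# vision branch; other models fall back to plain text content.
model = OpenAIChatWrapper(config_name="vision_config", model_name="gpt-4o")

prompt = model.format(
    Msg("user", "Describe this image", role="user",
        url="https://example.com/photo.png"),
)
# Expected shape:
# [{"role": "user", "name": "user",
#   "content": [{"type": "text", "text": "Describe this image"},
#               {"type": "image_url",
#                "image_url": {"url": "https://example.com/photo.png"}}]}]
```
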
diff --git a/src/agentscope/parsers/code_block_parser.py b/src/agentscope/parsers/code_block_parser.py
index df6341123..627d89a1f 100644
--- a/src/agentscope/parsers/code_block_parser.py
+++ b/src/agentscope/parsers/code_block_parser.py
@@ -1,5 +1,7 @@
# -*- coding: utf-8 -*-
"""Model response parser class for Markdown code block."""
+from typing import Optional
+
from agentscope.models import ModelResponse
from agentscope.parsers import ParserBase
@@ -22,15 +24,38 @@ class MarkdownCodeBlockParser(ParserBase):
format_instruction: str = (
"You should generate {language_name} code in a {language_name} fenced "
"code block as follows: \n```{language_name}\n"
- "${{your_{language_name}_code}}\n```"
+ "{content_hint}\n```"
)
"""The instruction for the format of the code block."""
- def __init__(self, language_name: str) -> None:
+ def __init__(
+ self,
+ language_name: str,
+ content_hint: Optional[str] = None,
+ ) -> None:
+ """Initialize the parser with the language name and the optional
+ content hint.
+
+ Args:
+ language_name (`str`):
+ The name of the language, which will be used
+ in ```{language_name}
+ content_hint (`Optional[str]`, defaults to `None`):
+                The hint used to remind the LLM what should be filled in
+                between the tags. If not provided, the default content hint
+ "${{your_{language_name}_code}}" will be used.
+ """
self.name = self.name.format(language_name=language_name)
self.tag_begin = self.tag_begin.format(language_name=language_name)
+
+ if content_hint is None:
+ self.content_hint = f"${{your_{language_name}_code}}"
+ else:
+ self.content_hint = content_hint
+
self.format_instruction = self.format_instruction.format(
language_name=language_name,
+ content_hint=self.content_hint,
).strip()
def parse(self, response: ModelResponse) -> ModelResponse:
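
A sketch of the new `content_hint` argument; the language name and hint text below are illustrative:

```python
from agentscope.parsers import MarkdownCodeBlockParser

# Without a hint, the placeholder "${your_sql_code}" is synthesized from
# the language name; a custom hint replaces it in the format instruction.
parser = MarkdownCodeBlockParser(
    language_name="sql",
    content_hint="SELECT ... FROM ...",
)
print(parser.format_instruction)
# You should generate sql code in a sql fenced code block as follows:
# ```sql
# SELECT ... FROM ...
# ```
```
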
diff --git a/src/agentscope/parsers/json_object_parser.py b/src/agentscope/parsers/json_object_parser.py
index 14bd4d5fb..74b82b51d 100644
--- a/src/agentscope/parsers/json_object_parser.py
+++ b/src/agentscope/parsers/json_object_parser.py
@@ -2,7 +2,7 @@
"""The parser for JSON object in the model response."""
import json
from copy import deepcopy
-from typing import Optional, Any, List
+from typing import Optional, Any, List, Sequence, Union
from loguru import logger
@@ -14,6 +14,7 @@
)
from agentscope.models import ModelResponse
from agentscope.parsers import ParserBase
+from agentscope.parsers.parser_base import DictFilterMixin
from agentscope.utils.tools import _join_str_with_comma_and
@@ -121,7 +122,7 @@ def format_instruction(self) -> str:
)
-class MarkdownJsonDictParser(MarkdownJsonObjectParser):
+class MarkdownJsonDictParser(MarkdownJsonObjectParser, DictFilterMixin):
"""A class used to parse a JSON dictionary object in a markdown fenced
code"""
@@ -152,6 +153,9 @@ def __init__(
self,
content_hint: Optional[Any] = None,
required_keys: List[str] = None,
+ keys_to_memory: Optional[Union[str, bool, Sequence[str]]] = True,
+ keys_to_content: Optional[Union[str, bool, Sequence[str]]] = True,
+ keys_to_metadata: Optional[Union[str, bool, Sequence[str]]] = False,
) -> None:
"""Initialize the parser with the content hint.
@@ -165,8 +169,42 @@ def __init__(
A list of required keys in the JSON dictionary object. If the
response misses any of the required keys, it will raise a
RequiredFieldNotFoundError.
+ keys_to_memory (`Optional[Union[str, bool, Sequence[str]]]`,
+ defaults to `True`):
+                The key or keys to be filtered in the `to_memory` method.
+                If it's
+ - `False`, `None` will be returned in the `to_memory` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+            keys_to_content (`Optional[Union[str, bool, Sequence[str]]]`,
+                defaults to `True`):
+                The key or keys to be filtered in the `to_content` method.
+                If it's
+ - `False`, `None` will be returned in the `to_content` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+            keys_to_metadata (`Optional[Union[str, bool, Sequence[str]]]`,
+                defaults to `False`):
+                The key or keys to be filtered in the `to_metadata` method.
+                If it's
+ - `False`, `None` will be returned in the `to_metadata` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+
"""
- super().__init__(content_hint)
+ # Initialize the markdown json object parser
+ MarkdownJsonObjectParser.__init__(self, content_hint)
+
+ # Initialize the mixin class to allow filtering the parsed response
+ DictFilterMixin.__init__(
+ self,
+ keys_to_memory=keys_to_memory,
+ keys_to_content=keys_to_content,
+ keys_to_metadata=keys_to_metadata,
+ )
self.required_keys = required_keys or []
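
A sketch of the three new filter arguments, using illustrative keys; the values mirror the semantics documented above:

```python
from agentscope.parsers import MarkdownJsonDictParser

parser = MarkdownJsonDictParser(
    content_hint='{"speak": "...", "thought": "...", "finish": true/false}',
    required_keys=["speak"],
    keys_to_memory=["speak", "thought"],  # what the agent remembers
    keys_to_content="speak",              # what other agents see
    keys_to_metadata=["finish"],          # what drives control flow
)

parsed = {"speak": "Hi", "thought": "greeting", "finish": False}
parser.to_memory(parsed)    # {"speak": "Hi", "thought": "greeting"}
parser.to_content(parsed)   # "Hi"
parser.to_metadata(parsed)  # {"finish": False}
```
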
diff --git a/src/agentscope/parsers/parser_base.py b/src/agentscope/parsers/parser_base.py
index 3f4d4d7f4..dd56df762 100644
--- a/src/agentscope/parsers/parser_base.py
+++ b/src/agentscope/parsers/parser_base.py
@@ -1,10 +1,17 @@
# -*- coding: utf-8 -*-
"""The base class for model response parser."""
from abc import ABC, abstractmethod
+from typing import Union, Sequence
+
+from loguru import logger
from agentscope.exception import TagNotFoundError
from agentscope.models import ModelResponse
+# TODO: Support one-time warning in logger rather than setting global variable
+_FIRST_TIME_TO_REPORT_CONTENT = True
+_FIRST_TIME_TO_REPORT_MEMORY = True
+
class ParserBase(ABC):
"""The base class for model response parser."""
@@ -54,7 +61,7 @@ def _extract_first_content_by_tag(
raise TagNotFoundError(
f"Missing "
f"tag{'' if len(missing_tags)==1 else 's'} "
- f"{' and '.join(missing_tags)} in response.",
+ f"{' and '.join(missing_tags)} in response: {text}",
raw_response=text,
missing_begin_tag=index_start == -1,
missing_end_tag=index_end == -1,
@@ -65,3 +72,137 @@ def _extract_first_content_by_tag(
]
return extract_text
+
+
+class DictFilterMixin:
+ """A mixin class to filter the parsed response by keys. It allows users
+ to set keys to be filtered during speaking, storing in memory, and
+ returning in the agent reply function.
+ """
+
+ def __init__(
+ self,
+ keys_to_memory: Union[str, bool, Sequence[str]],
+ keys_to_content: Union[str, bool, Sequence[str]],
+ keys_to_metadata: Union[str, bool, Sequence[str]],
+ ) -> None:
+ """Initialize the mixin class with the keys to be filtered during
+ speaking, storing in memory, and returning in the agent reply function.
+
+ Args:
+ keys_to_memory (`Optional[Union[str, bool, Sequence[str]]]`):
+                The key or keys to be filtered in the `to_memory` method.
+                If it's
+ - `False`, `None` will be returned in the `to_memory` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+            keys_to_content (`Optional[Union[str, bool, Sequence[str]]]`):
+                The key or keys to be filtered in the `to_content` method.
+                If it's
+ - `False`, `None` will be returned in the `to_content` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+ keys_to_metadata (`Optional[Union[str, bool, Sequence[str]]]`):
+                The key or keys to be filtered in the `to_metadata` method.
+                If it's
+ - `False`, `None` will be returned in the `to_metadata` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+ """
+ self.keys_to_memory = keys_to_memory
+ self.keys_to_content = keys_to_content
+ self.keys_to_metadata = keys_to_metadata
+
+ def to_memory(
+ self,
+ parsed_response: dict,
+ allow_missing: bool = False,
+ ) -> Union[str, dict, None]:
+ """Filter the fields that will be stored in memory."""
+ return self._filter_content_by_names(
+ parsed_response,
+ self.keys_to_memory,
+ allow_missing=allow_missing,
+ )
+
+ def to_content(
+ self,
+ parsed_response: dict,
+ allow_missing: bool = False,
+ ) -> Union[str, dict, None]:
+ """Filter the fields that will be fed into the content field in the
+ returned message, which will be exposed to other agents.
+ """
+ return self._filter_content_by_names(
+ parsed_response,
+ self.keys_to_content,
+ allow_missing=allow_missing,
+ )
+
+ def to_metadata(
+ self,
+ parsed_response: dict,
+ allow_missing: bool = False,
+ ) -> Union[str, dict, None]:
+ """Filter the fields that will be fed into the returned message
+ directly to control the application workflow."""
+ return self._filter_content_by_names(
+ parsed_response,
+ self.keys_to_metadata,
+ allow_missing=allow_missing,
+ )
+
+ def _filter_content_by_names(
+ self,
+ parsed_response: dict,
+ keys: Union[str, bool, Sequence[str]],
+ allow_missing: bool = False,
+ ) -> Union[str, dict, None]:
+ """Filter the parsed response by keys. If only one key is provided, the
+ returned content will be a single corresponding value. Otherwise,
+ the returned content will be a dictionary with the filtered keys and
+ their corresponding values.
+
+ Args:
+ keys (`Union[str, bool, Sequence[str]]`):
+ The key or keys to be filtered. If it's
+                - `False`, `None` will be returned
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+ allow_missing (`bool`, defaults to `False`):
+ Whether to allow missing keys in the response. If set to
+ `True`, the method will skip the missing keys in the response.
+ Otherwise, it will raise a `ValueError` when a key is missing.
+
+ Returns:
+            `Union[str, dict, None]`: The filtered content.
+ """
+
+ if isinstance(keys, bool):
+ if keys:
+ return parsed_response
+ else:
+ return None
+
+ if isinstance(keys, str):
+ return parsed_response[keys]
+
+ # check if the required names are in the response
+ for name in keys:
+ if name not in parsed_response:
+ if allow_missing:
+ logger.warning(
+                        f"Content name {name} not found in the response. "
+                        f"Skipping it.",
+ )
+ else:
+ raise ValueError(f"Name {name} not found in the response.")
+ return {
+ name: parsed_response[name]
+ for name in keys
+ if name in parsed_response
+ }
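
The four key modes in one place, as a minimal sketch against the mixin directly:

```python
from agentscope.parsers.parser_base import DictFilterMixin

mixin = DictFilterMixin(
    keys_to_memory=True,      # True: pass the whole dict through
    keys_to_content="speak",  # str: return only that key's value
    keys_to_metadata=False,   # False: return None
)

parsed = {"speak": "Hi", "thought": "greeting"}
assert mixin.to_memory(parsed) == parsed
assert mixin.to_content(parsed) == "Hi"
assert mixin.to_metadata(parsed) is None
```
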
diff --git a/src/agentscope/parsers/tagged_content_parser.py b/src/agentscope/parsers/tagged_content_parser.py
index 464617a25..9f7a17d36 100644
--- a/src/agentscope/parsers/tagged_content_parser.py
+++ b/src/agentscope/parsers/tagged_content_parser.py
@@ -1,10 +1,12 @@
# -*- coding: utf-8 -*-
"""The parser for tagged content in the model response."""
import json
+from typing import Union, Sequence, Optional, List
-from agentscope.exception import JsonParsingError
+from agentscope.exception import JsonParsingError, TagNotFoundError
from agentscope.models import ModelResponse
from agentscope.parsers import ParserBase
+from agentscope.parsers.parser_base import DictFilterMixin
class TaggedContent:
@@ -12,7 +14,8 @@ class TaggedContent:
and tag end."""
name: str
- """The name of the tagged content."""
+    """The name of the tagged content, which will be used as the key in
+    the extracted dictionary."""
tag_begin: str
"""The beginning tag."""
@@ -60,7 +63,7 @@ def __str__(self) -> str:
return f"{self.tag_begin}{self.content_hint}{self.tag_end}"
-class MultiTaggedContentParser(ParserBase):
+class MultiTaggedContentParser(ParserBase, DictFilterMixin):
"""Parse response text by multiple tags, and return a dict of their
    content. Asking the LLM to generate a JSON dictionary object directly may
    not be a good idea due to escape characters and other issues, so we can
@@ -79,14 +82,60 @@ class MultiTaggedContentParser(ParserBase):
equals to `True`, this instruction will be used to remind the model to
generate JSON object."""
- def __init__(self, *tagged_contents: TaggedContent) -> None:
+ def __init__(
+ self,
+ *tagged_contents: TaggedContent,
+ keys_to_memory: Optional[Union[str, bool, Sequence[str]]] = True,
+ keys_to_content: Optional[Union[str, bool, Sequence[str]]] = True,
+ keys_to_metadata: Optional[Union[str, bool, Sequence[str]]] = False,
+ keys_allow_missing: Optional[List[str]] = None,
+ ) -> None:
"""Initialize the parser with tags.
Args:
- tags (`dict[str, Tuple[str, str]]`):
- A dictionary of tags, the key is the tag name, and the value is
- a tuple of starting tag and end tag.
+            *tagged_contents (`TaggedContent`):
+                Multiple TaggedContent objects, each of which contains the
+                tag name, tag begin, content hint and tag end. The name will
+                be used as the key in the extracted dictionary.
+ keys_to_memory (`Optional[Union[str, bool, Sequence[str]]]`,
+ defaults to `True`):
+                The key or keys to be filtered in the `to_memory` method.
+                If it's
+ - `False`, `None` will be returned in the `to_memory` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+            keys_to_content (`Optional[Union[str, bool, Sequence[str]]]`,
+                defaults to `True`):
+                The key or keys to be filtered in the `to_content` method.
+                If it's
+ - `False`, `None` will be returned in the `to_content` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+ keys_to_metadata (`Optional[Union[str, bool, Sequence[str]]]`,
+ defaults to `False`):
+                The key or keys to be filtered in the `to_metadata` method.
+                If it's
+ - `False`, `None` will be returned in the `to_metadata` method
+ - `str`, the corresponding value will be returned
+ - `List[str]`, a filtered dictionary will be returned
+ - `True`, the whole dictionary will be returned
+ keys_allow_missing (`Optional[List[str]]`, defaults to `None`):
+ A list of keys that are allowed to be missing in the response.
"""
+ # Initialize the mixin class
+ DictFilterMixin.__init__(
+ self,
+ keys_to_memory=keys_to_memory,
+ keys_to_content=keys_to_content,
+ keys_to_metadata=keys_to_metadata,
+ )
+
+ self.keys_allow_missing = keys_allow_missing
+
self.tagged_contents = list(tagged_contents)
# Prepare the format instruction according to the tagged contents
@@ -123,26 +172,38 @@ def parse(self, response: ModelResponse) -> ModelResponse:
tag_begin = tagged_content.tag_begin
tag_end = tagged_content.tag_end
- extract_content = self._extract_first_content_by_tag(
- response,
- tag_begin,
- tag_end,
- )
-
- if tagged_content.parse_json:
- try:
- extract_content = json.loads(extract_content)
- except json.decoder.JSONDecodeError as e:
- raw_response = f"{tag_begin}{extract_content}{tag_end}"
- raise JsonParsingError(
- f"The content between {tagged_content.tag_begin} and "
- f"{tagged_content.tag_end} should be a JSON object."
- f'When parsing "{raw_response}", an error occurred: '
- f"{e}",
- raw_response=raw_response,
- ) from None
-
- tag_to_content[tagged_content.name] = extract_content
+ try:
+ extract_content = self._extract_first_content_by_tag(
+ response,
+ tag_begin,
+ tag_end,
+ )
+
+ if tagged_content.parse_json:
+ try:
+ extract_content = json.loads(extract_content)
+ except json.decoder.JSONDecodeError as e:
+ raw_response = f"{tag_begin}{extract_content}{tag_end}"
+ raise JsonParsingError(
+ f"The content between "
+ f"{tagged_content.tag_begin} and "
+ f"{tagged_content.tag_end} should be a JSON "
+ f'object. An error "{e}" occurred when parsing: '
+ f"{raw_response}",
+ raw_response=raw_response,
+ ) from None
+
+ tag_to_content[tagged_content.name] = extract_content
+
+ except TagNotFoundError as e:
+ # if the key is allowed to be missing, skip the error
+ if (
+ self.keys_allow_missing is not None
+ and tagged_content.name in self.keys_allow_missing
+ ):
+ continue
+
+ raise e from None
response.parsed = tag_to_content
return response
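
A sketch of `keys_allow_missing`; the tag names are illustrative:

```python
from agentscope.models import ModelResponse
from agentscope.parsers import MultiTaggedContentParser, TaggedContent

parser = MultiTaggedContentParser(
    TaggedContent("speak", tag_begin="[SPEAK]",
                  content_hint="what you speak", tag_end="[/SPEAK]"),
    TaggedContent("finish", tag_begin="[FINISH]",
                  content_hint="true/false", tag_end="[/FINISH]",
                  parse_json=True),
    keys_allow_missing=["finish"],  # tolerate a missing [FINISH] block
)

# "finish" is absent but allowed to be missing, so no TagNotFoundError.
res = parser.parse(ModelResponse(text="[SPEAK]Hello[/SPEAK]"))
print(res.parsed)  # {"speak": "Hello"}
```
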
diff --git a/src/agentscope/rpc/__init__.py b/src/agentscope/rpc/__init__.py
index 03cf58169..2c7703a90 100644
--- a/src/agentscope/rpc/__init__.py
+++ b/src/agentscope/rpc/__init__.py
@@ -5,16 +5,19 @@
try:
from .rpc_agent_pb2 import RpcMsg # pylint: disable=E0611
-except ModuleNotFoundError:
- RpcMsg = Any # type: ignore[misc]
-try:
from .rpc_agent_pb2_grpc import RpcAgentServicer
from .rpc_agent_pb2_grpc import RpcAgentStub
from .rpc_agent_pb2_grpc import add_RpcAgentServicer_to_server
-except ImportError:
- RpcAgentServicer = object
- RpcAgentStub = Any
- add_RpcAgentServicer_to_server = Any
+except ImportError as import_error:
+ from agentscope.utils.tools import ImportErrorReporter
+
+ RpcMsg = ImportErrorReporter(import_error, "distribute") # type: ignore[misc]
+ RpcAgentServicer = ImportErrorReporter(import_error, "distribute")
+ RpcAgentStub = ImportErrorReporter(import_error, "distribute")
+ add_RpcAgentServicer_to_server = ImportErrorReporter(
+ import_error,
+ "distribute",
+ )
__all__ = [
diff --git a/src/agentscope/rpc/rpc_agent_client.py b/src/agentscope/rpc/rpc_agent_client.py
index ab9f1a565..189e0895f 100644
--- a/src/agentscope/rpc/rpc_agent_client.py
+++ b/src/agentscope/rpc/rpc_agent_client.py
@@ -3,24 +3,23 @@
import threading
import base64
-from typing import Any, Optional
+from typing import Optional
from loguru import logger
try:
import dill
import grpc
from grpc import RpcError
-except ImportError:
- dill = None
- grpc = None
- RpcError = None
-
-try:
from agentscope.rpc.rpc_agent_pb2 import RpcMsg # pylint: disable=E0611
from agentscope.rpc.rpc_agent_pb2_grpc import RpcAgentStub
-except ModuleNotFoundError:
- RpcMsg = Any # type: ignore[misc]
- RpcAgentStub = Any
+except ImportError as import_error:
+ from agentscope.utils.tools import ImportErrorReporter
+
+ dill = ImportErrorReporter(import_error, "distribute")
+ grpc = ImportErrorReporter(import_error, "distribute")
+ RpcMsg = ImportErrorReporter(import_error, "distribute")
+ RpcAgentStub = ImportErrorReporter(import_error, "distribute")
+ RpcError = ImportError
class RpcAgentClient:
diff --git a/src/agentscope/rpc/rpc_agent_pb2_grpc.py b/src/agentscope/rpc/rpc_agent_pb2_grpc.py
index 93ee27369..4099c7027 100644
--- a/src/agentscope/rpc/rpc_agent_pb2_grpc.py
+++ b/src/agentscope/rpc/rpc_agent_pb2_grpc.py
@@ -3,8 +3,10 @@
"""Client and server classes corresponding to protobuf-defined services."""
try:
import grpc
-except ImportError:
- grpc = None
+except ImportError as import_error:
+ from agentscope.utils.tools import ImportErrorReporter
+
+ grpc = ImportErrorReporter(import_error, "distribute")
import agentscope.rpc.rpc_agent_pb2 as rpc__agent__pb2
diff --git a/src/agentscope/service/file/common.py b/src/agentscope/service/file/common.py
index adeb5a0ad..ef8e8855b 100644
--- a/src/agentscope/service/file/common.py
+++ b/src/agentscope/service/file/common.py
@@ -1,10 +1,10 @@
# -*- coding: utf-8 -*-
+# pylint: disable=C0301
""" Common operators for file and directory. """
import os
import shutil
from typing import List
-from agentscope.utils.common import write_file
from agentscope.service.service_response import ServiceResponse
from agentscope.service.service_status import ServiceExecStatus
@@ -29,7 +29,19 @@ def create_file(file_path: str, content: str = "") -> ServiceResponse:
status=ServiceExecStatus.ERROR,
content="FileExistsError: The file already exists.",
)
- return write_file(content, file_path)
+ try:
+ with open(file_path, "w", encoding="utf-8") as file:
+ file.write(content)
+ return ServiceResponse(
+ status=ServiceExecStatus.SUCCESS,
+ content="Success",
+ )
+ except Exception as e:
+ error_message = f"{e.__class__.__name__}: {e}"
+ return ServiceResponse(
+ status=ServiceExecStatus.ERROR,
+ content=error_message,
+ )
def delete_file(file_path: str) -> ServiceResponse:
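
A usage sketch of the inlined write path; the file path below is a placeholder:

```python
from agentscope.service.file.common import create_file
from agentscope.service.service_status import ServiceExecStatus

response = create_file("/tmp/demo.txt", content="hello")
if response.status == ServiceExecStatus.SUCCESS:
    print("File written.")
else:
    # e.g. "FileExistsError: The file already exists." or an OS error
    print(f"Failed: {response.content}")
```
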
diff --git a/src/agentscope/utils/tools.py b/src/agentscope/utils/tools.py
index 75cc0c7cb..8ebd23777 100644
--- a/src/agentscope/utils/tools.py
+++ b/src/agentscope/utils/tools.py
@@ -3,6 +3,7 @@
import base64
import datetime
import json
+import os.path
import secrets
import string
from typing import Any, Literal, List
@@ -129,7 +130,7 @@ def _to_openai_image_url(url: str) -> str:
"""
# See https://platform.openai.com/docs/guides/vision for details of
# support image extensions.
- image_extensions = (
+ support_image_extensions = (
".png",
".jpg",
".jpeg",
@@ -139,16 +140,17 @@ def _to_openai_image_url(url: str) -> str:
parsed_url = urlparse(url)
- # Checking for HTTP(S) image links
- if parsed_url.scheme in ["http", "https"]:
- lower_path = parsed_url.path.lower()
- if lower_path.endswith(image_extensions):
+ lower_url = url.lower()
+
+ # Web url
+ if parsed_url.scheme != "":
+ if any(lower_url.endswith(_) for _ in support_image_extensions):
return url
# Check if it is a local file
- elif parsed_url.scheme == "file" or not parsed_url.scheme:
- if parsed_url.path.lower().endswith(image_extensions):
- with open(parsed_url.path, "rb") as image_file:
+ elif os.path.exists(url) and os.path.isfile(url):
+ if any(lower_url.endswith(_) for _ in support_image_extensions):
+ with open(url, "rb") as image_file:
base64_image = base64.b64encode(image_file.read()).decode(
"utf-8",
)
@@ -156,7 +158,7 @@ def _to_openai_image_url(url: str) -> str:
mime_type = f"image/{extension}"
return f"data:{mime_type};base64,{base64_image}"
- raise TypeError(f"{url} should be end with {image_extensions}.")
+    raise TypeError(f"{url} should end with {support_image_extensions}.")
def _download_file(url: str, path_file: str, max_retries: int = 3) -> bool:
@@ -294,3 +296,39 @@ def _join_str_with_comma_and(elements: List[str]) -> str:
return " and ".join(elements)
else:
return ", ".join(elements[:-1]) + f", and {elements[-1]}"
+
+
+class ImportErrorReporter:
+ """Used as a placeholder for missing packages.
+ When called, an ImportError will be raised, prompting the user to install
+ the specified extras requirement.
+ """
+
+ def __init__(self, error: ImportError, extras_require: str = None) -> None:
+ """Init the ImportErrorReporter.
+
+ Args:
+ error (`ImportError`): the original ImportError.
+ extras_require (`str`): the extras requirement.
+ """
+ self.error = error
+ self.extras_require = extras_require
+
+ def __call__(self, *args: Any, **kwds: Any) -> Any:
+ return self._raise_import_error()
+
+ def __getattr__(self, name: str) -> Any:
+ return self._raise_import_error()
+
+ def __getitem__(self, __key: Any) -> Any:
+ return self._raise_import_error()
+
+ def _raise_import_error(self) -> Any:
+ """Raise the ImportError"""
+        err_msg = f"ImportError occurred: [{self.error.msg}]."
+ if self.extras_require is not None:
+ err_msg += (
+ f" Please install [{self.extras_require}] version"
+ " of agentscope."
+ )
+ raise ImportError(err_msg)
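
The intended usage pattern, sketched with a hypothetical optional dependency:

```python
from agentscope.utils.tools import ImportErrorReporter

try:
    import some_optional_dep  # hypothetical optional dependency
except ImportError as err:
    some_optional_dep = ImportErrorReporter(err, "distribute")

# Any later use -- a call, attribute access, or indexing -- raises an
# ImportError that names the extras target, e.g.:
#   ImportError: ImportError occurred: [...]. Please install [distribute]
#   version of agentscope.
some_optional_dep.anything
```
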
diff --git a/src/agentscope/web/workstation/workflow_node.py b/src/agentscope/web/workstation/workflow_node.py
index f4b14d914..0005e04b8 100644
--- a/src/agentscope/web/workstation/workflow_node.py
+++ b/src/agentscope/web/workstation/workflow_node.py
@@ -826,6 +826,7 @@ def compile(self) -> dict:
"dashscope_chat": ModelNode,
"openai_chat": ModelNode,
"post_api_chat": ModelNode,
+ "post_api_dall_e": ModelNode,
"Message": MsgNode,
"DialogAgent": DialogAgentNode,
"UserAgent": UserAgentNode,
diff --git a/tests/format_test.py b/tests/format_test.py
index 661950743..07efa86ae 100644
--- a/tests/format_test.py
+++ b/tests/format_test.py
@@ -1,6 +1,7 @@
# -*- coding: utf-8 -*-
"""Unit test for prompt engineering strategies in format function."""
import unittest
+from unittest import mock
from unittest.mock import MagicMock, patch
from agentscope.message import Msg
@@ -31,6 +32,27 @@ def setUp(self) -> None:
],
]
+ self.inputs_vision = [
+ Msg("system", "You are a helpful assistant", role="system"),
+ [
+ Msg(
+ "user",
+ "Describe the images",
+ role="user",
+ url="https://fakeweb/test.jpg",
+ ),
+ Msg(
+ "user",
+                "And these images",
+                role="user",
+ url=[
+ "/Users/xxx/abc.png",
+ "/Users/xxx/def.mp3",
+ ],
+ ),
+ ],
+ ]
+
self.wrong_inputs = [
Msg("system", "You are a helpful assistant", role="system"),
[
@@ -39,6 +61,118 @@ def setUp(self) -> None:
],
]
+ @patch("builtins.open", mock.mock_open(read_data=b"abcdef"))
+ @patch("os.path.isfile")
+ @patch("os.path.exists")
+ @patch("openai.OpenAI")
+ def test_openai_chat_vision_with_wrong_model(
+ self,
+ mock_client: MagicMock,
+ mock_exists: MagicMock,
+ mock_isfile: MagicMock,
+ ) -> None:
+        """Unit test for the format function in the openai chat api wrapper
+        when messages with urls are fed to a non-vision model"""
+ mock_exists.side_effect = lambda url: url == "/Users/xxx/abc.png"
+ mock_isfile.side_effect = lambda url: url == "/Users/xxx/abc.png"
+
+ # Prepare the mock client
+ mock_client.return_value = "client_dummy"
+
+ model = OpenAIChatWrapper(
+ config_name="",
+ model_name="gpt-4",
+ )
+
+ # correct format
+ ground_truth = [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant",
+ "name": "system",
+ },
+ {
+ "role": "user",
+ "name": "user",
+ "content": "Describe the images",
+ },
+ {
+ "role": "user",
+ "name": "user",
+                "content": "And these images",
+ },
+ ]
+
+ prompt = model.format(*self.inputs_vision)
+ self.assertListEqual(prompt, ground_truth)
+
+ @patch("builtins.open", mock.mock_open(read_data=b"abcdef"))
+ @patch("os.path.isfile")
+ @patch("os.path.exists")
+ @patch("openai.OpenAI")
+ def test_openai_chat_vision(
+ self,
+ mock_client: MagicMock,
+ mock_exists: MagicMock,
+ mock_isfile: MagicMock,
+ ) -> None:
+ """Unit test for the format function in openai chat api wrapper with
+ vision models"""
+ mock_exists.side_effect = lambda url: url == "/Users/xxx/abc.png"
+ mock_isfile.side_effect = lambda url: url == "/Users/xxx/abc.png"
+
+ # Prepare the mock client
+ mock_client.return_value = "client_dummy"
+
+ model = OpenAIChatWrapper(
+ config_name="",
+ model_name="gpt-4o",
+ )
+
+ # correct format
+ ground_truth = [
+ {
+ "role": "system",
+ "content": "You are a helpful assistant",
+ "name": "system",
+ },
+ {
+ "role": "user",
+ "name": "user",
+ "content": [
+ {
+ "type": "text",
+ "text": "Describe the images",
+ },
+ {
+ "type": "image_url",
+ "image_url": {
+ "url": "https://fakeweb/test.jpg",
+ },
+ },
+ ],
+ },
+ {
+ "role": "user",
+ "name": "user",
+ "content": [
+ {
+ "type": "text",
+                        "text": "And these images",
+ },
+ {
+ "type": "image_url",
+ "image_url": {
+ "url": "data:image/png;base64,YWJjZGVm",
+ },
+ },
+ ],
+ },
+ ]
+
+ prompt = model.format(*self.inputs_vision)
+ self.assertListEqual(prompt, ground_truth)
+
@patch("openai.OpenAI")
def test_openai_chat(self, mock_client: MagicMock) -> None:
"""Unit test for the format function in openai chat api wrapper."""
diff --git a/tests/parser_test.py b/tests/parser_test.py
new file mode 100644
index 000000000..384ef64cf
--- /dev/null
+++ b/tests/parser_test.py
@@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+"""Unit test for model response parser."""
+import unittest
+
+from agentscope.models import ModelResponse
+from agentscope.parsers import (
+ MarkdownJsonDictParser,
+ MarkdownJsonObjectParser,
+ MarkdownCodeBlockParser,
+ MultiTaggedContentParser,
+ TaggedContent,
+)
+from agentscope.parsers.parser_base import DictFilterMixin
+
+
+class ModelResponseParserTest(unittest.TestCase):
+ """Unit test for model response parser."""
+
+ def setUp(self) -> None:
+        """Init for ModelResponseParserTest."""
+ self.res_dict_1 = ModelResponse(
+ text=(
+ "```json\n"
+ '{"speak": "Hello, world!", '
+ '"thought": "xxx", '
+ '"end_discussion": true}\n```'
+ ),
+ )
+ self.instruction_dict_1 = (
+ "You should respond a json object in a json fenced code block "
+ "as follows:\n"
+ "```json\n"
+ '{"speak": "what you speak", '
+ '"thought": "what you thought", '
+ '"end_discussion": true/false}\n'
+ "```"
+ )
+ self.res_dict_2 = ModelResponse(
+ text="[SPEAK]Hello, world![/SPEAK]\n"
+ "[THOUGHT]xxx[/THOUGHT]\n"
+ "[END_DISCUSSION]true[/END_DISCUSSION]",
+ )
+ self.instruction_dict_2 = (
+ "Respond with specific tags as outlined below, and the content "
+ "between [END_DISCUSSION] and [/END_DISCUSSION] MUST be a JSON "
+ "object:\n"
+ "[SPEAK]what you speak[/SPEAK]\n"
+ "[THOUGHT]what you thought[/THOUGHT]\n"
+ "[END_DISCUSSION]true/false[/END_DISCUSSION]"
+ )
+ self.gt_dict = {
+ "speak": "Hello, world!",
+ "thought": "xxx",
+ "end_discussion": True,
+ }
+ self.hint_dict = (
+ '{"speak": "what you speak", '
+ '"thought": "what you thought", '
+ '"end_discussion": true/false}'
+ )
+
+ self.gt_to_memory = {"speak": "Hello, world!", "thought": "xxx"}
+ self.gt_to_content = "Hello, world!"
+ self.gt_to_metadata = {"end_discussion": True}
+
+ self.res_list = ModelResponse(text="""```json\n[1,2,3]\n```""")
+ self.instruction_list = (
+ "You should respond a json object in a json fenced code block as "
+ "follows:\n"
+ "```json\n"
+ "{Your generated list of numbers}\n"
+ "```"
+ )
+ self.gt_list = [1, 2, 3]
+ self.hint_list = "{Your generated list of numbers}"
+
+ self.res_float = ModelResponse(text="""```json\n3.14\n```""")
+ self.instruction_float = (
+ "You should respond a json object in a json fenced code block as "
+ "follows:\n"
+ "```json\n"
+ "{Your generated float number}\n"
+ "```"
+ )
+ self.gt_float = 3.14
+ self.hint_float = "{Your generated float number}"
+
+ self.res_code = ModelResponse(
+ text="""```python\nprint("Hello, world!")\n```""",
+ )
+ self.instruction_code = (
+ "You should generate python code in a python fenced code block as "
+ "follows: \n"
+ "```python\n"
+ "${your_python_code}\n"
+ "```"
+ )
+ self.instruction_code_with_hint = (
+ "You should generate python code in a python fenced code block as "
+ "follows: \n"
+ "```python\n"
+ "abc\n"
+ "```"
+ )
+ self.gt_code = """\nprint("Hello, world!")\n"""
+
+ def test_markdownjsondictparser(self) -> None:
+ """Test for MarkdownJsonDictParser"""
+ parser = MarkdownJsonDictParser(
+ content_hint=self.hint_dict,
+ keys_to_memory=["speak", "thought"],
+ keys_to_content="speak",
+ keys_to_metadata=["end_discussion"],
+ )
+
+ self.assertEqual(parser.format_instruction, self.instruction_dict_1)
+
+ res = parser.parse(self.res_dict_1)
+
+ self.assertDictEqual(res.parsed, self.gt_dict)
+
+ # test filter functions
+ self.assertDictEqual(parser.to_memory(res.parsed), self.gt_to_memory)
+ self.assertEqual(parser.to_content(res.parsed), self.gt_to_content)
+ self.assertDictEqual(
+ parser.to_metadata(res.parsed),
+ self.gt_to_metadata,
+ )
+
+ def test_markdownjsonobjectparser(self) -> None:
+ """Test for MarkdownJsonObjectParser"""
+ # list
+ parser_list = MarkdownJsonObjectParser(content_hint=self.hint_list)
+
+ self.assertEqual(parser_list.format_instruction, self.instruction_list)
+
+ res_list = parser_list.parse(self.res_list)
+ self.assertListEqual(res_list.parsed, self.gt_list)
+
+ # float
+ parser_float = MarkdownJsonObjectParser(content_hint=self.hint_float)
+
+ self.assertEqual(
+ parser_float.format_instruction,
+ self.instruction_float,
+ )
+
+ res_float = parser_float.parse(self.res_float)
+ self.assertEqual(res_float.parsed, self.gt_float)
+
+ def test_markdowncodeblockparser(self) -> None:
+ """Test for MarkdownCodeBlockParser"""
+ parser = MarkdownCodeBlockParser(language_name="python")
+
+ self.assertEqual(parser.format_instruction, self.instruction_code)
+
+ res = parser.parse(self.res_code)
+
+ self.assertEqual(res.parsed, self.gt_code)
+
+ def test_markdowncodeblockparser_with_hint(self) -> None:
+        """Test for MarkdownCodeBlockParser with a custom content hint"""
+ parser = MarkdownCodeBlockParser(
+ language_name="python",
+ content_hint="abc",
+ )
+
+ self.assertEqual(
+ parser.format_instruction,
+ self.instruction_code_with_hint,
+ )
+
+ res = parser.parse(self.res_code)
+
+ self.assertEqual(res.parsed, self.gt_code)
+
+ def test_multitaggedcontentparser(self) -> None:
+ """Test for MultiTaggedContentParser"""
+ parser = MultiTaggedContentParser(
+ TaggedContent(
+ "speak",
+ tag_begin="[SPEAK]",
+ content_hint="what you speak",
+ tag_end="[/SPEAK]",
+ ),
+ TaggedContent(
+ "thought",
+ tag_begin="[THOUGHT]",
+ content_hint="what you thought",
+ tag_end="[/THOUGHT]",
+ ),
+ TaggedContent(
+ "end_discussion",
+ tag_begin="[END_DISCUSSION]",
+ content_hint="true/false",
+ tag_end="[/END_DISCUSSION]",
+ parse_json=True,
+ ),
+ keys_to_memory=["speak", "thought"],
+ keys_to_content="speak",
+ keys_to_metadata=["end_discussion"],
+ )
+
+ self.assertEqual(parser.format_instruction, self.instruction_dict_2)
+
+ res = parser.parse(self.res_dict_2)
+
+ self.assertDictEqual(res.parsed, self.gt_dict)
+
+ # test filter functions
+ self.assertDictEqual(parser.to_memory(res.parsed), self.gt_to_memory)
+ self.assertEqual(parser.to_content(res.parsed), self.gt_to_content)
+ self.assertDictEqual(
+ parser.to_metadata(res.parsed),
+ self.gt_to_metadata,
+ )
+
+ def test_DictFilterMixin_default_value(self) -> None:
+ """Test the default value of the DictFilterMixin class"""
+ mixin = DictFilterMixin(
+ keys_to_memory=True,
+ keys_to_content=True,
+ keys_to_metadata=False,
+ )
+
+ self.assertDictEqual(mixin.to_memory(self.gt_dict), self.gt_dict)
+ self.assertDictEqual(mixin.to_content(self.gt_dict), self.gt_dict)
+ self.assertEqual(mixin.to_metadata(self.gt_dict), None)
+
+
+if __name__ == "__main__":
+ unittest.main()