Large Language Model-Based Agents for Software Engineering: A Survey

Recent advances in Large Language Models (LLMs) have shaped a new paradigm of AI agents: LLM-based agents. Compared to standalone LLMs, LLM-based agents substantially extend the versatility and expertise of LLMs by equipping them with the ability to perceive and use external resources and tools. To date, LLM-based agents have been applied to Software Engineering (SE) with remarkable effectiveness. The synergy between multiple agents and human interaction brings further promise in tackling complex real-world SE problems. In this work, we present a comprehensive and systematic survey on LLM-based agents for SE. We collect 106 papers and categorize them from two perspectives: the SE perspective and the agent perspective. In addition, we discuss open challenges and future directions in this critical domain.

📍 We systematically summarized the progress of Agent4SE from the perspectives of both Software Engineering tasks and Agent Architecture.

📄 Paper Link: Large Language Model-Based Agents for Software Engineering: A Survey

⭐ Star this repository

This research field is evolving rapidly; star this repository to keep up with the updates!

📰 News

[2024/09/04] 🎉 We released the first version of our survey on arXiv.

🏎️ Coming Soon

Append the repository link to each paper.
Add a table to collect Agents from the industry (e.g. Devin, Cursor).
Provide an interactive table.

🖥️ SE Perspectives

Requirement Engineering

[2024/05] MARE: Multi-Agents Collaboration Framework for Requirements Engineering. Jin et al. arXiv. [paper]
[2024/04] Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation. Ataei et al. arXiv. [paper]
[2024/01] SpecGen: Automated Generation of Formal Program Specifications via Large Language Models. Ma et al. arXiv. [paper] [repo]
[2023/10] Advancing Requirements Engineering through Generative AI: Assessing the Role of LLMs. Arora et al. arXiv. [paper]

Code Generation

[2024/05] Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository. Deshpande et al. arXiv. [paper]
[2024/05] MapCoder: Multi-Agent Code Generation for Competitive Problem Solving. Islam et al. ACL. [paper] [repo]
[2024/05] AutoCoder: Enhancing Code Large Language Model with AIEV-INSTRUCT. Lei et al. arXiv. [paper] [repo]
[2024/04] 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers. Fakhoury et al. arXiv. [paper]
[2024/04] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization. Ishibashi et al. arXiv. [paper] [repo]
[2024/03] AutoDev: Automated AI-Driven Development. Tufano et al. arXiv. [paper]
[2024/03] CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing. He et al. arXiv. [paper]
[2024/03] RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation. Wang et al. arXiv. [paper] [repo]
[2024/02] Executable Code Actions Elicit Better LLM Agents. Wang et al. ICML. [paper] [repo]
[2024/02] More Agents Is All You Need. Li et al. arXiv. [paper]
[2024/02] Test-Driven Development for Code Generation. Mathews et al. arXiv. [paper] [repo]
[2024/02] LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step. Zhong et al. arXiv. [paper] [repo]
[2024/01] CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges. Zhang et al. ACL. [paper]
[2024/01] Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation. Wang et al. arXiv. [paper]
[2024/01] Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering. Ridnik et al. arXiv. [paper] [repo]
[2023/12] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. Huang et al. arXiv. [paper]
[2023/12] LLM4TDD: Best Practices for Test Driven Development Using Large Language Models. Piya et al. arXiv. [paper] [repo]
[2023/11] INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair. Wang et al. ACL. [paper] [repo]
[2023/10] Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization. Liu et al. arXiv. [paper] [repo]
[2023/10] Lemur: Harmonizing Natural Language and Code for Language Agents. Xu et al. ICLR. [paper] [repo]
[2023/10] ClarifyGPT: Empowering LLM-based Code Generation with Intention Clarification. Mu et al. arXiv. [paper] [repo]
[2023/10] CODECHAIN: TOWARDS MODULAR CODE GENERATION THROUGH CHAIN OF SELF-REVISIONS WITH REPRESENTATIVE SUB-MODULES. Le et al. ICLR. [paper] [repo]
[2023/10] Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models. Zhou et al. ICML. [paper] [repo]
[2023/09] MINT: EVALUATING LLMS IN MULTI-TURN INTERACTION WITH TOOLS AND LANGUAGE FEEDBACK. Wang et al. ICLR. [paper] [repo]
[2023/09] Test-Case-Driven Programming Understanding in Large Language Models for Better Code Generation. Tian et al. arXiv. [paper]
[2023/09] CodePlan: Repository-level Coding using LLMs and Planning. Bairi et al. FSE. [paper] [repo]
[2023/09] From Misuse to Mastery: Enhancing Code Generation with Knowledge-Driven AI Chaining. Ren et al. ASE. [paper]
[2023/09] Parsel🐍: Algorithmic Reasoning with Language Models by Composing Decompositions. Zelikman et al. NeurIPS. [paper] [repo]
[2023/08] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Wu et al. arXiv. [paper] [repo]
[2023/08] Gentopia: A Collaborative Platform for Tool-Augmented LLMs. Xu et al. EMNLP. [paper] [repo]
[2023/08] Flows: Building Blocks of Reasoning and Collaborating AI. Josifoski et al. arXiv. [paper] [repo]
[2023/08] CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation. Huang et al. arXiv. [paper]
[2023/06] SELFEVOLVE: A Code Evolution Framework via Large Language Models. Jiang et al. arXiv. [paper]
[2023/06] InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback. Yang et al. NeurIPS. [paper] [repo]
[2023/06] IS SELF-REPAIR A SILVER BULLET FOR CODE GENERATION?. Olausson et al. ICLR. [paper] [repo]
[2023/05] ToolCoder: Teach Code Generation Models to use API search tools. Zhang et al. arXiv. [paper]
[2023/05] Self-Edit: Fault-Aware Code Editor for Code Generation. Zhang et al. ACL. [paper]
[2023/04] Teaching Large Language Models to Self-Debug. Chen et al. ICLR. [paper]
[2023/04] Fully Autonomous Programming with Large Language Models. Liventsev et al. GECCO. [paper]
[2023/03] CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. Li et al. NeurIPS. [paper] [repo]
[2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning. Shinn et al. NeurIPS. [paper] [repo]
[2023/03] SELF-REFINE: Iterative Refinement with Self-Feedback. Madaan et al. NeurIPS. [paper] [repo]

Static Code Checking

Static Bug Detection

[2024/05] LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. Li et al. arXiv. [paper]
[2024/05] PropertyGPT: LLM-driven Formal Verification of Smart Contracts through Retrieval-Augmented Property Generation. Liu et al. arXiv. [paper] [repo]
[2024/03] Multi-role Consensus through LLMs Discussions for Vulnerability Detection. Mao et al. QRS. [paper]
[2024/03] Combining Fine-tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications. Ma et al. arXiv. [paper] [repo]
[2024/02] When Dataflow Analysis Meets Large Language Models. Wang et al. arXiv. [paper]
[2024/01] LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning. Sun et al. arXiv. [paper] [repo]
[2023/12] E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. Hao et al. arXiv. [paper]
[2023/10] Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. Hu et al. TPS-ISA. [paper] [repo]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/08] Enhancing Static Analysis for Practical Bug Detection: An LLM-Integrated Approach. Li et al. arXiv. [paper] [repo]
[2023/03] ART: Automatic multi-step reasoning and tool-use for large language models. Paranjape et al. arXiv. [paper] [repo]

Code Review

[2024/04] AI-powered Code Review with LLMs: Early Results. Rasheed et al. arXiv. [paper]
[2024/02] CodeAgent: Collaborative Agents for Software Engineering. Tang et al. arXiv. [paper] [repo]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/09] CORE: Resolving Code Quality Issues using LLMs. Wadhwa et al. FSE. [paper] [repo]

Testing

Unit Testing

[2024/06] Mokav: Execution-driven Differential Testing with LLMs. Etemadi et al. arXiv. [paper] [repo]
[2024/04] Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis. Yang et al. arXiv. [paper]
[2024/03] AutoDev: Automated AI-Driven Development. Tufano et al. arXiv. [paper]
[2024/03] COVERUP: Coverage-Guided LLM-Based Test Generation. Pizzorno et al. arXiv. [paper] [repo]
[2023/08] Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing. Dakhel et al. Inf. Softw. Technol. [paper] [repo]
[2023/05] No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation. Yuan et al. arXiv. [paper] [repo]
[2023/05] ChatUniTest: A Framework for LLM-Based Test Generation. Chen et al. FSE. [paper] [repo]
[2023/02] An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. Schäfer et al. IEEE Trans. Software Eng. [paper] [repo]

System Testing

[2024/07] Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model. Liu et al. arXiv. [paper] [repo]
[2024/04] LLM Agents can Autonomously Exploit One-day Vulnerabilities. Fang et al. arXiv. [paper]
[2024/02] You Can REST Now: Automated Specification Inference and Black-Box Testing of RESTful APIs with Large Language Models. Decrop et al. arXiv. [paper] [repo]
[2024/01] XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model. Wang et al. arXiv. [paper]
[2024/01] KernelGPT: Enhanced Kernel Fuzzing via Large Language Models. Yang et al. arXiv. [paper]
[2023/11] Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents. Yoon et al. ICST. [paper] [repo]
[2023/10] Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions. Liu et al. ICSE. [paper]
[2023/10] AXNav: Replaying Accessibility Tests from Natural Language. Taeb et al. CHI. [paper]
[2023/10] White-box Compiler Fuzzing Empowered by Large Language Models. Yang et al. arXiv. [paper] [repo]
[2023/10] Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model. Liu et al. ICSE. [paper] [repo]
[2023/08] PENTESTGPT: An LLM-empowered Automatic Penetration Testing Tool. Deng et al. arXiv. [paper] [repo]
[2023/08] Fuzz4All: Universal Fuzzing with Large Language Models. Xia et al. ICSE. [paper] [repo]
[2023/07] Isolating Compiler Bugs by Generating Effective Witness Programs with Large Language Models. Tu et al. IEEE Trans. Software Eng. [paper] [repo]
[2023/06] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models. Feng et al. ICSE. [paper] [repo]

Debugging

Fault Localization

[2024/03] AGENTFL: Scaling LLM-based Fault Localization to Project-Level Context. Qin et al. arXiv. [paper]
[2023/10] RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models. Wang et al. arXiv. [paper]
[2023/08] A Quantitative and Qualitative Evaluation of LLM-Based Explainable Fault Localization. Kang et al. FSE. [paper]

Program Repair

[2024/09] Neurosymbolic Repair of Test Flakiness. Chen et al. ISSTA. [paper]
[2024/04] How Far Can We Go with Practical Function-Level Program Repair?. Xiang et al. arXiv. [paper] [repo]
[2024/03] RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. Bouzenia et al. arXiv. [paper]
[2024/03] ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts. Zhang et al. arXiv. [paper]
[2024/02] CigaR: Cost-efficient Program Repair with LLMs. Hidvégi et al. arXiv. [paper] [repo]
[2023/04] Explainable Automated Debugging via Large Language Model-driven Scientific Debugging. Kang et al. arXiv. [paper]
[2023/04] Keep the Conversation Going: Fixing 162 out of 337 bugs for $0.42 each using ChatGPT. Xia et al. arXiv. [paper]
[2023/01] Conversational Automated Program Repair. Xia et al. arXiv. [paper]

Unified Debugging

[2024/04] A Unified Debugging Approach via LLM-Based Multi-Agent Synergy. Lee et al. arXiv. [paper] [repo]
[2024/02] LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step. Zhong et al. arXiv. [paper] [repo]

End-to-end Software Development

[2024/06] Experimenting with Multi-Agent Software Development: Towards a Unified Platform. Sami et al. arXiv. [paper]
[2024/06] Scaling Large-Language-Model-based Multi-Agent Collaboration. Qian et al. arXiv. [paper] [repo]
[2024/06] Multi-Agent Software Development through Cross-Team Collaboration. Du et al. arXiv. [paper] [repo]
[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/05] Iterative Experience Refinement of Software-Developing Agents. Qian et al. arXiv. [paper]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/03] CodeS: Natural Language to Code Repository via Multi-Layer Sketch. Zan et al. arXiv. [paper] [repo]
[2024/02] CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents. Rasheed et al. arXiv. [paper]
[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2024/01] LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems. Fakih et al. ICSE. [paper] [repo]
[2023/12] Experiential Co-Learning of Software-Developing Agents. Qian et al. ACL. [paper] [repo]
[2023/11] Autonomous Agents in Software Development: A Vision Paper. Rasheed et al. arXiv. [paper]
[2023/09] AutoAgents: A Framework for Automatic Agent Generation. Chen et al. arXiv. [paper] [repo]
[2023/08] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors. Chen et al. ICLR. [paper] [repo]
[2023/08] METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK. Hong et al. ICLR. [paper] [repo]
[2023/07] Communicative Agents for Software Development. Qian et al. ACL. [paper] [repo]
[2023/06] MULTI-AGENT COLLABORATION: HARNESSING THE POWER OF INTELLIGENT LLM AGENTS. Talebirad et al. arXiv. [paper]
[2023/06] Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services. Xing et al. arXiv. [paper]
[2023/04] Self-collaboration Code Generation via ChatGPT. Dong et al. arXiv. [paper] [repo]
[2023/04] Low-code LLM: Visual Programming over LLMs. Cai et al. arXiv. [paper] [repo]

End-to-end Software Maintenance

[2024/08] DIVERSITY EMPOWERS INTELLIGENCE: INTEGRATING EXPERTISE OF SOFTWARE ENGINEERING AGENTS. Zhang et al. arXiv. [paper]
[2024/08] SpecRover: Code Intent Extraction via LLMs. Ruan et al. arXiv. [paper] [repo]
[2024/07] Agentless: Demystifying LLM-based Software Engineering Agents. Xia et al. arXiv. [paper] [repo]
[2024/06] How to Understand Whole Software Repository?. Ma et al. arXiv. [paper] [repo]
[2024/06] CODER: ISSUE RESOLVING WITH MULTI-AGENT AND TASK GRAPHS. Chen et al. arXiv. [paper] [repo]
[2024/06] MASAI: Modular Architecture for Software-engineering AI Agents. Arora et al. arXiv. [paper]
[2024/05] SWE-AGENT: AGENT-COMPUTER INTERFACES ENABLE AUTOMATED SOFTWARE ENGINEERING. Yang et al. arXiv. [paper] [repo]
[2024/04] AutoCodeRover: Autonomous Program Improvement. Zhang et al. ISSTA. [paper] [repo]
[2024/03] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution. Tao et al. arXiv. [paper]

🤖 Agent Perspectives

Agent Framework

Planning

Single-turn Planning

[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/06] Multi-Agent Software Development through Cross-Team Collaboration. Du et al. arXiv. [paper] [repo]
[2024/05] MapCoder: Multi-Agent Code Generation for Competitive Problem Solving. Islam et al. ACL. [paper] [repo]
[2024/03] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution. Tao et al. arXiv. [paper]
[2024/03] CodeS: Natural Language to Code Repository via Multi-Layer Sketch. Zan et al. arXiv. [paper] [repo]
[2024/03] CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing. He et al. arXiv. [paper]
[2024/02] CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents. Rasheed et al. arXiv. [paper]
[2024/01] CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges. Zhang et al. ACL. [paper]
[2024/01] LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems. Fakih et al. ICSE. [paper] [repo]
[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2023/11] Autonomous Agents in Software Development: A Vision Paper. Rasheed et al. arXiv. [paper]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/09] Parsel🐍: Algorithmic Reasoning with Language Models by Composing Decompositions. Zelikman et al. NeurIPS. [paper] [repo]
[2023/08] PENTESTGPT: An LLM-empowered Automatic Penetration Testing Tool. Deng et al. arXiv. [paper] [repo]
[2023/08] Flows: Building Blocks of Reasoning and Collaborating AI. Josifoski et al. arXiv. [paper] [repo]
[2023/08] METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK. Hong et al. ICLR. [paper] [repo]
[2023/07] Communicative Agents for Software Development. Qian et al. ACL. [paper] [repo]
[2023/04] Self-collaboration Code Generation via ChatGPT. Dong et al. arXiv. [paper] [repo]
[2023/04] Low-code LLM: Visual Programming over LLMs. Cai et al. arXiv. [paper] [repo]

Multi-turn Planning

[2024/03] RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation. Wang et al. arXiv. [paper] [repo]

ReAct-like

[2024/06] MASAI: Modular Architecture for Software-engineering AI Agents. Arora et al. arXiv. [paper]
[2024/02] Executable Code Actions Elicit Better LLM Agents. Wang et al. ICML. [paper] [repo]
[2024/01] CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges. Zhang et al. ACL. [paper]
[2024/01] XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model. Wang et al. arXiv. [paper]
[2023/11] Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents. Yoon et al. ICST. [paper] [repo]
[2023/10] RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models. Wang et al. arXiv. [paper]
[2023/10] Language Agent Tree Search Unifies Reasoning, Acting, and Planning in Language Models. Zhou et al. ICML. [paper] [repo]
[2023/10] AXNav: Replaying Accessibility Tests from Natural Language. Taeb et al. CHI. [paper]
[2023/09] CodePlan: Repository-level Coding using LLMs and Planning. Bairi et al. FSE. [paper] [repo]

Layered

[2024/04] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization. Ishibashi et al. arXiv. [paper] [repo]

Memory

Long-term Memory

[2024/06] Scaling Large-Language-Model-based Multi-Agent Collaboration. Qian et al. arXiv. [paper] [repo]
[2024/06] Multi-Agent Software Development through Cross-Team Collaboration. Du et al. arXiv. [paper] [repo]
[2024/05] Iterative Experience Refinement of Software-Developing Agents. Qian et al. arXiv. [paper]
[2023/12] Experiential Co-Learning of Software-Developing Agents. Qian et al. ACL. [paper] [repo]
[2023/11] Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents. Yoon et al. ICST. [paper] [repo]
[2023/09] AutoAgents: A Framework for Automatic Agent Generation. Chen et al. arXiv. [paper] [repo]
[2023/08] METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK. Hong et al. ICLR. [paper] [repo]
[2023/07] Communicative Agents for Software Development. Qian et al. ACL. [paper] [repo]
[2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning. Shinn et al. NeurIPS. [paper] [repo]

Short-term Memory

[2024/07] Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model. Liu et al. arXiv. [paper] [repo]
[2024/06] Scaling Large-Language-Model-based Multi-Agent Collaboration. Qian et al. arXiv. [paper] [repo]
[2024/06] Multi-Agent Software Development through Cross-Team Collaboration. Du et al. arXiv. [paper] [repo]
[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/04] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization. Ishibashi et al. arXiv. [paper] [repo]
[2024/03] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution. Tao et al. arXiv. [paper]
[2024/01] XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model. Wang et al. arXiv. [paper]
[2023/12] E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. Hao et al. arXiv. [paper]
[2023/11] Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents. Yoon et al. ICST. [paper] [repo]
[2023/10] RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models. Wang et al. arXiv. [paper]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/10] Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions. Liu et al. ICSE. [paper]
[2023/09] CodePlan: Repository-level Coding using LLMs and Planning. Bairi et al. FSE. [paper] [repo]
[2023/09] AutoAgents: A Framework for Automatic Agent Generation. Chen et al. arXiv. [paper] [repo]
[2023/08] METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK. Hong et al. ICLR. [paper] [repo]
[2023/07] Communicative Agents for Software Development. Qian et al. ACL. [paper] [repo]
[2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning. Shinn et al. NeurIPS. [paper] [repo]

Shared Memory: A special kind of Short-term Memory

[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/05] MARE: Multi-Agents Collaboration Framework for Requirements Engineering. Jin et al. arXiv. [paper]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/03] AGENTFL: Scaling LLM-based Fault Localization to Project-Level Context. Qin et al. arXiv. [paper]
[2023/08] METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK. Hong et al. ICLR. [paper] [repo]
[2023/04] Self-collaboration Code Generation via ChatGPT. Dong et al. arXiv. [paper] [repo]

Perception

Visual Input

[2024/07] Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model. Liu et al. arXiv. [paper] [repo]
[2024/06] Experimenting with Multi-Agent Software Development: Towards a Unified Platform. Sami et al. arXiv. [paper]
[2024/01] XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model. Wang et al. arXiv. [paper]
[2023/10] AXNav: Replaying Accessibility Tests from Natural Language. Taeb et al. CHI. [paper]
[2023/10] Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model. Liu et al. ICSE. [paper] [repo]
[2023/08] METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK. Hong et al. ICLR. [paper] [repo]

Action

Searching Tools

[2024/05] Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository. Deshpande et al. arXiv. [paper]
[2024/04] LLM Agents can Autonomously Exploit One-day Vulnerabilities. Fang et al. arXiv. [paper]
[2024/03] AutoDev: Automated AI-Driven Development. Tufano et al. arXiv. [paper]
[2024/03] RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. Bouzenia et al. arXiv. [paper]
[2024/03] CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing. He et al. arXiv. [paper]
[2024/03] RAT: Retrieval Augmented Thoughts Elicit Context-Aware Reasoning in Long-Horizon Generation. Wang et al. arXiv. [paper] [repo]
[2024/02] CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents. Rasheed et al. arXiv. [paper]
[2024/01] LLM4Vuln: A Unified Evaluation Framework for Decoupling and Enhancing LLMs' Vulnerability Reasoning. Sun et al. arXiv. [paper] [repo]
[2024/01] CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges. Zhang et al. ACL. [paper]
[2023/12] E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. Hao et al. arXiv. [paper]
[2023/12] Experiential Co-Learning of Software-Developing Agents. Qian et al. ACL. [paper] [repo]
[2023/11] Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents. Yoon et al. ICST. [paper] [repo]
[2023/10] Lemur: Harmonizing Natural Language and Code for Language Agents. Xu et al. ICLR. [paper] [repo]
[2023/10] RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models. Wang et al. arXiv. [paper]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/08] METAGPT: META PROGRAMMING FOR A MULTI-AGENT COLLABORATIVE FRAMEWORK. Hong et al. ICLR. [paper] [repo]
[2023/08] PENTESTGPT: An LLM-empowered Automatic Penetration Testing Tool. Deng et al. arXiv. [paper] [repo]
[2023/08] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors. Chen et al. ICLR. [paper] [repo]
[2023/08] Gentopia: A Collaborative Platform for Tool-Augmented LLMs. Xu et al. EMNLP. [paper] [repo]
[2023/08] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Wu et al. arXiv. [paper] [repo]
[2023/05] ToolCoder: Teach Code Generation Models to use API search tools. Zhang et al. arXiv. [paper]
[2023/03] ART: Automatic multi-step reasoning and tool-use for large language models. Paranjape et al. arXiv. [paper] [repo]

File Operation

[2024/08] SpecRover: Code Intent Extraction via LLMs. Ruan et al. arXiv. [paper] [repo]
[2024/06] MASAI: Modular Architecture for Software-engineering AI Agents. Arora et al. arXiv. [paper]
[2024/05] LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. Li et al. arXiv. [paper]
[2024/05] SWE-AGENT: AGENT-COMPUTER INTERFACES ENABLE AUTOMATED SOFTWARE ENGINEERING. Yang et al. arXiv. [paper] [repo]
[2024/04] LLM Agents can Autonomously Exploit One-day Vulnerabilities. Fang et al. arXiv. [paper]
[2024/03] RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. Bouzenia et al. arXiv. [paper]
[2024/03] AutoDev: Automated AI-Driven Development. Tufano et al. arXiv. [paper]
[2023/04] Explainable Automated Debugging via Large Language Model-driven Scientific Debugging. Kang et al. arXiv. [paper]

GUI Operation

[2024/07] Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model. Liu et al. arXiv. [paper] [repo]
[2024/01] XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model. Wang et al. arXiv. [paper]
[2023/10] Make LLM a Testing Expert: Bringing Human-like Interaction to Mobile GUI Testing via Functionality-aware Decisions. Liu et al. ICSE. [paper]
[2023/10] AXNav: Replaying Accessibility Tests from Natural Language. Taeb et al. CHI. [paper]
[2023/10] Testing the Limits: Unusual Text Inputs Generation for Mobile App Crash Detection with Large Language Model. Liu et al. ICSE. [paper] [repo]
[2023/06] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models. Feng et al. ICSE. [paper] [repo]

Static Program Analysis

[2024/06] Multi-Agent Software Development through Cross-Team Collaboration. Du et al. arXiv. [paper] [repo]
[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/06] MASAI: Modular Architecture for Software-engineering AI Agents. Arora et al. arXiv. [paper]
[2024/05] Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository. Deshpande et al. arXiv. [paper]
[2024/05] LLM-Assisted Static Analysis for Detecting Security Vulnerabilities. Li et al. arXiv. [paper]
[2024/04] AutoCodeRover: Autonomous Program Improvement. Zhang et al. ISSTA. [paper] [repo]
[2024/04] Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis. Yang et al. arXiv. [paper]
[2024/04] 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers. Fakhoury et al. arXiv. [paper]
[2024/03] AutoDev: Automated AI-Driven Development. Tufano et al. arXiv. [paper]
[2024/03] RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. Bouzenia et al. arXiv. [paper]
[2024/03] COVERUP: Coverage-Guided LLM-Based Test Generation. Pizzorno et al. arXiv. [paper] [repo]
[2024/03] ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts. Zhang et al. arXiv. [paper]
[2024/03] AGENTFL: Scaling LLM-based Fault Localization to Project-Level Context. Qin et al. arXiv. [paper]
[2024/02] When Dataflow Analysis Meets Large Language Models. Wang et al. arXiv. [paper]
[2024/02] LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step. Zhong et al. arXiv. [paper] [repo]
[2024/01] Teaching Code LLMs to Use Autocompletion Tools in Repository-Level Code Generation. Wang et al. arXiv. [paper]
[2024/01] CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges. Zhang et al. ACL. [paper]
[2024/01] LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems. Fakih et al. ICSE. [paper] [repo]
[2023/12] E&V: Prompting Large Language Models to Perform Static Analysis by Pseudo-code Execution and Verification. Hao et al. arXiv. [paper]
[2023/09] CodePlan: Repository-level Coding using LLMs and Planning. Bairi et al. FSE. [paper] [repo]
[2023/08] CodeCoT: Tackling Code Syntax Errors in CoT Reasoning for Code Generation. Huang et al. arXiv. [paper]
[2023/07] Isolating Compiler Bugs by Generating Effective Witness Programs with Large Language Models. Tu et al. IEEE Trans. Software Eng. [paper] [repo]
[2023/06] Prompting Is All You Need: Automated Android Bug Replay with Large Language Models. Feng et al. ICSE. [paper] [repo]

Dynamic Analysis

[2024/04] Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis. Yang et al. arXiv. [paper]
[2024/03] COVERUP: Coverage-Guided LLM-Based Test Generation. Pizzorno et al. arXiv. [paper] [repo]
[2024/03] AGENTFL: Scaling LLM-based Fault Localization to Project-Level Context. Qin et al. arXiv. [paper]
[2024/02] LDB: A Large Language Model Debugger via Verifying Runtime Execution Step by Step. Zhong et al. arXiv. [paper] [repo]
[2023/07] Isolating Compiler Bugs by Generating Effective Witness Programs with Large Language Models. Tu et al. IEEE Trans. Software Eng. [paper] [repo]
[2023/04] Explainable Automated Debugging via Large Language Model-driven Scientific Debugging. Kang et al. arXiv. [paper]

Testing Tools

[2024/09] Neurosymbolic Repair of Test Flakiness. Chen et al. ISSTA. [paper]
[2024/08] SpecRover: Code Intent Extraction via LLMs. Ruan et al. arXiv. [paper] [repo]
[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/06] MASAI: Modular Architecture for Software-engineering AI Agents. Arora et al. arXiv. [paper]
[2024/05] AutoCoder: Enhancing Code Large Language Model with AIEV-INSTRUCT. Lei et al. arXiv. [paper] [repo]
[2024/05] MapCoder: Multi-Agent Code Generation for Competitive Problem Solving. Islam et al. ACL. [paper] [repo]
[2024/04] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization. Ishibashi et al. arXiv. [paper] [repo]
[2024/04] A Unified Debugging Approach via LLM-Based Multi-Agent Synergy. Lee et al. arXiv. [paper] [repo]
[2024/04] LLM Agents can Autonomously Exploit One-day Vulnerabilities. Fang et al. arXiv. [paper]
[2024/04] Enhancing LLM-based Test Generation for Hard-to-Cover Branches via Program Analysis. Yang et al. arXiv. [paper]
[2024/04] 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers. Fakhoury et al. arXiv. [paper]
[2024/04] AutoCodeRover: Autonomous Program Improvement. Zhang et al. ISSTA. [paper] [repo]
[2024/03] AutoDev: Automated AI-Driven Development. Tufano et al. arXiv. [paper]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/03] RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. Bouzenia et al. arXiv. [paper]
[2024/03] CoCoST: Automatic Complex Code Generation with Online Searching and Correctness Testing. He et al. arXiv. [paper]
[2024/02] Executable Code Actions Elicit Better LLM Agents. Wang et al. ICML. [paper] [repo]
[2024/02] Test-Driven Development for Code Generation. Mathews et al. arXiv. [paper] [repo]
[2024/01] Code Generation with AlphaCodium: From Prompt Engineering to Flow Engineering. Ridnik et al. arXiv. [paper] [repo]
[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2024/01] CodeAgent: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges. Zhang et al. ACL. [paper]
[2023/12] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. Huang et al. arXiv. [paper]
[2023/12] LLM4TDD: Best Practices for Test Driven Development Using Large Language Models. Piya et al. arXiv. [paper] [repo]
[2023/11] INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair. Wang et al. ACL. [paper] [repo]
[2023/10] ClarifyGPT: Empowering LLM-based Code Generation with Intention Clarification. Mu et al. arXiv. [paper] [repo]
[2023/10] Lemur: Harmonizing Natural Language and Code for Language Agents. Xu et al. ICLR. [paper] [repo]
[2023/10] White-box Compiler Fuzzing Empowered by Large Language Models. Yang et al. arXiv. [paper] [repo]
[2023/09] Test-Case-Driven Programming Understanding in Large Language Models for Better Code Generation. Tian et al. arXiv. [paper]
[2023/09] MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback. Wang et al. ICLR. [paper] [repo]
[2023/08] MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. Hong et al. ICLR. [paper] [repo]
[2023/08] Effective Test Generation Using Pre-trained Large Language Models and Mutation Testing. Dakhel et al. Inf. Softw. Technol. [paper] [repo]
[2023/08] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors. Chen et al. ICLR. [paper] [repo]
[2023/08] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Wu et al. arXiv. [paper] [repo]
[2023/08] Flows: Building Blocks of Reasoning and Collaborating AI. Josifoski et al. arXiv. [paper] [repo]
[2023/06] SELFEVOLVE: A Code Evolution Framework via Large Language Models. Jiang et al. arXiv. [paper]
[2023/06] InterCode: Standardizing and Benchmarking Interactive Coding with Execution Feedback. Yang et al. NeurIPS. [paper] [repo]
[2023/06] Is Self-Repair a Silver Bullet for Code Generation?. Olausson et al. ICLR. [paper] [repo]
[2023/05] No More Manual Tests? Evaluating and Improving ChatGPT for Unit Test Generation. Yuan et al. arXiv. [paper] [repo]
[2023/04] Fully Autonomous Programming with Large Language Models. Liventsev et al. GECCO. [paper]
[2023/04] Explainable Automated Debugging via Large Language Model-driven Scientific Debugging. Kang et al. arXiv. [paper]
[2023/03] ART: Automatic multi-step reasoning and tool-use for large language models. Paranjape et al. arXiv. [paper] [repo]
[2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning. Shinn et al. NeurIPS. [paper] [repo]
[2023/02] An Empirical Evaluation of Using Large Language Models for Automated Unit Test Generation. Schäfer et al. IEEE Trans. Software Eng. [paper] [repo]
[2023/01] Conversational Automated Program Repair. Xia et al. arXiv. [paper]

Fault Localization Tools

[2024/04] AutoCodeRover: Autonomous Program Improvement. Zhang et al. ISSTA. [paper] [repo]
[2024/03] RepairAgent: An Autonomous, LLM-Based Agent for Program Repair. Bouzenia et al. arXiv. [paper]

Multi-agent System

Agent Roles

Manager Roles

[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/05] Iterative Experience Refinement of Software-Developing Agents. Qian et al. arXiv. [paper]
[2024/05] MapCoder: Multi-Agent Code Generation for Competitive Problem Solving. Islam et al. ACL. [paper] [repo]
[2024/04] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization. Ishibashi et al. arXiv. [paper] [repo]
[2024/04] 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers. Fakhoury et al. arXiv. [paper]
[2024/03] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution. Tao et al. arXiv. [paper]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/02] CodeAgent: Collaborative Agents for Software Engineering. Tang et al. arXiv. [paper] [repo]
[2024/02] CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents. Rasheed et al. arXiv. [paper]
[2023/12] Experiential Co-Learning of Software-Developing Agents. Qian et al. ACL. [paper] [repo]
[2023/11] Autonomous Agents in Software Development: A Vision Paper. Rasheed et al. arXiv. [paper]
[2023/11] Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents. Yoon et al. ICST. [paper] [repo]
[2023/10] AXNav: Replaying Accessibility Tests from Natural Language. Taeb et al. CHI. [paper]
[2023/10] RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models. Wang et al. arXiv. [paper]
[2023/09] AutoAgents: A Framework for Automatic Agent Generation. Chen et al. arXiv. [paper] [repo]
[2023/08] MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. Hong et al. ICLR. [paper] [repo]
[2023/04] Low-code LLM: Visual Programming over LLMs. Cai et al. arXiv. [paper] [repo]
[2023/03] CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. Li et al. NeurIPS. [paper] [repo]

Requirement Analyzing Roles

[2024/06] Experimenting with Multi-Agent Software Development: Towards a Unified Platform. Sami et al. arXiv. [paper]
[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/05] MARE: Multi-Agents Collaboration Framework for Requirements Engineering. Jin et al. arXiv. [paper]
[2024/04] Elicitron: An LLM Agent-Based Simulation Framework for Design Requirements Elicitation. Ataei et al. arXiv. [paper]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2023/11] Autonomous Agents in Software Development: A Vision Paper. Rasheed et al. arXiv. [paper]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/08] MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. Hong et al. ICLR. [paper] [repo]
[2023/06] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents. Talebirad et al. arXiv. [paper]
[2023/04] Self-collaboration Code Generation via ChatGPT. Dong et al. arXiv. [paper] [repo]
[2023/03] CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. Li et al. NeurIPS. [paper] [repo]

Designer Roles

[2024/06] Experimenting with Multi-Agent Software Development: Towards a Unified Platform. Sami et al. arXiv. [paper]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2023/11] Autonomous Agents in Software Development: A Vision Paper. Rasheed et al. arXiv. [paper]
[2023/08] MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. Hong et al. ICLR. [paper] [repo]
[2023/08] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors. Chen et al. ICLR. [paper] [repo]
[2023/07] Communicative Agents for Software Development. Qian et al. ACL. [paper] [repo]
[2023/06] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents. Talebirad et al. arXiv. [paper]

Developer Roles

[2024/06] Experimenting with Multi-Agent Software Development: Towards a Unified Platform. Sami et al. arXiv. [paper]
[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/05] AutoCoder: Enhancing Code Large Language Model with AIEV-INSTRUCT. Lei et al. arXiv. [paper] [repo]
[2024/05] MapCoder: Multi-Agent Code Generation for Competitive Problem Solving. Islam et al. ACL. [paper] [repo]
[2024/04] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization. Ishibashi et al. arXiv. [paper] [repo]
[2024/04] 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers. Fakhoury et al. arXiv. [paper]
[2024/03] CodeS: Natural Language to Code Repository via Multi-Layer Sketch. Zan et al. arXiv. [paper] [repo]
[2024/03] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution. Tao et al. arXiv. [paper]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/02] Test-Driven Development for Code Generation. Mathews et al. arXiv. [paper] [repo]
[2024/02] CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents. Rasheed et al. arXiv. [paper]
[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2023/12] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. Huang et al. arXiv. [paper]
[2023/11] Autonomous Agents in Software Development: A Vision Paper. Rasheed et al. arXiv. [paper]
[2023/11] INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair. Wang et al. ACL. [paper] [repo]
[2023/08] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Wu et al. arXiv. [paper] [repo]
[2023/08] MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. Hong et al. ICLR. [paper] [repo]
[2023/08] AgentVerse: Facilitating Multi-Agent Collaboration and Exploring Emergent Behaviors. Chen et al. ICLR. [paper] [repo]
[2023/07] Communicative Agents for Software Development. Qian et al. ACL. [paper] [repo]
[2023/06] Is Self-Repair a Silver Bullet for Code Generation?. Olausson et al. ICLR. [paper] [repo]
[2023/06] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents. Talebirad et al. arXiv. [paper]
[2023/05] Self-Edit: Fault-Aware Code Editor for Code Generation. Zhang et al. ACL. [paper]
[2023/04] Self-collaboration Code Generation via ChatGPT. Dong et al. arXiv. [paper] [repo]
[2023/03] CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. Li et al. NeurIPS. [paper] [repo]

Software Quality Assurance Roles

[2024/08] SpecRover: Code Intent Extraction via LLMs. Ruan et al. arXiv. [paper] [repo]
[2024/07] Vision-driven Automated Mobile GUI Testing via Multimodal Large Language Model. Liu et al. arXiv. [paper] [repo]
[2024/06] Experimenting with Multi-Agent Software Development: Towards a Unified Platform. Sami et al. arXiv. [paper]
[2024/06] Multi-Agent Software Development through Cross-Team Collaboration. Du et al. arXiv. [paper] [repo]
[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/06] MASAI: Modular Architecture for Software-engineering AI Agents. Arora et al. arXiv. [paper]
[2024/05] AutoCoder: Enhancing Code Large Language Model with AIEV-INSTRUCT. Lei et al. arXiv. [paper] [repo]
[2024/05] MapCoder: Multi-Agent Code Generation for Competitive Problem Solving. Islam et al. ACL. [paper] [repo]
[2024/04] AI-powered Code Review with LLMs: Early Results. Rasheed et al. arXiv. [paper]
[2024/04] 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers. Fakhoury et al. arXiv. [paper]
[2024/04] A Unified Debugging Approach via LLM-Based Multi-Agent Synergy. Lee et al. arXiv. [paper] [repo]
[2024/04] How Far Can We Go with Practical Function-Level Program Repair?. Xiang et al. arXiv. [paper] [repo]
[2024/03] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution. Tao et al. arXiv. [paper]
[2024/03] AGENTFL: Scaling LLM-based Fault Localization to Project-Level Context. Qin et al. arXiv. [paper]
[2024/03] Combining Fine-tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications. Ma et al. arXiv. [paper] [repo]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/03] ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts. Zhang et al. arXiv. [paper]
[2024/02] CodeAgent: Collaborative Agents for Software Engineering. Tang et al. arXiv. [paper] [repo]
[2024/02] Test-Driven Development for Code Generation. Mathews et al. arXiv. [paper] [repo]
[2024/02] CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents. Rasheed et al. arXiv. [paper]
[2024/01] XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model. Wang et al. arXiv. [paper]
[2023/12] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. Huang et al. arXiv. [paper]
[2023/11] Autonomous Agents in Software Development: A Vision Paper. Rasheed et al. arXiv. [paper]
[2023/11] Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents. Yoon et al. ICST. [paper] [repo]
[2023/10] Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. Hu et al. TPS-ISA. [paper] [repo]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/10] White-box Compiler Fuzzing Empowered by Large Language Models. Yang et al. arXiv. [paper] [repo]
[2023/10] AXNav: Replaying Accessibility Tests from Natural Language. Taeb et al. CHI. [paper]
[2023/08] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Wu et al. arXiv. [paper] [repo]
[2023/08] MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. Hong et al. ICLR. [paper] [repo]
[2023/07] Communicative Agents for Software Development. Qian et al. ACL. [paper] [repo]
[2023/06] Is Self-Repair a Silver Bullet for Code Generation?. Olausson et al. ICLR. [paper] [repo]
[2023/06] Multi-Agent Collaboration: Harnessing the Power of Intelligent LLM Agents. Talebirad et al. arXiv. [paper]
[2023/05] Self-Edit: Fault-Aware Code Editor for Code Generation. Zhang et al. ACL. [paper]
[2023/03] CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. Li et al. NeurIPS. [paper] [repo]

Assistant Roles

[2024/08] Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents. Zhang et al. arXiv. [paper]
[2024/08] SpecRover: Code Intent Extraction via LLMs. Ruan et al. arXiv. [paper] [repo]
[2024/06] MASAI: Modular Architecture for Software-engineering AI Agents. Arora et al. arXiv. [paper]
[2024/05] MapCoder: Multi-Agent Code Generation for Competitive Problem Solving. Islam et al. ACL. [paper] [repo]
[2024/03] MAGIS: LLM-Based Multi-Agent Framework for GitHub Issue ReSolution. Tao et al. arXiv. [paper]
[2024/03] CodeS: Natural Language to Code Repository via Multi-Layer Sketch. Zan et al. arXiv. [paper] [repo]
[2024/03] Combining Fine-tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications. Ma et al. arXiv. [paper] [repo]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]

Collaboration Mechanism

Layered Structure

[2024/08] Diversity Empowers Intelligence: Integrating Expertise of Software Engineering Agents. Zhang et al. arXiv. [paper]
[2024/08] SpecRover: Code Intent Extraction via LLMs. Ruan et al. arXiv. [paper] [repo]
[2024/06] Experimenting with Multi-Agent Software Development: Towards a Unified Platform. Sami et al. arXiv. [paper]
[2024/06] Scaling Large-Language-Model-based Multi-Agent Collaboration. Qian et al. arXiv. [paper] [repo]
[2024/06] Multi-Agent Software Development through Cross-Team Collaboration. Du et al. arXiv. [paper] [repo]
[2024/06] AgileCoder: Dynamic Collaborative Agents for Software Development based on Agile Methodology. Nguyen et al. arXiv. [paper] [repo]
[2024/05] MapCoder: Multi-Agent Code Generation for Competitive Problem Solving. Islam et al. ACL. [paper] [repo]
[2024/05] MARE: Multi-Agents Collaboration Framework for Requirements Engineering. Jin et al. arXiv. [paper]
[2024/04] AutoCodeRover: Autonomous Program Improvement. Zhang et al. ISSTA. [paper] [repo]
[2024/04] How Far Can We Go with Practical Function-Level Program Repair?. Xiang et al. arXiv. [paper] [repo]
[2024/03] CodeS: Natural Language to Code Repository via Multi-Layer Sketch. Zan et al. arXiv. [paper] [repo]
[2024/03] When LLM-based Code Generation Meets the Software Development Process. Lin et al. arXiv. [paper] [repo]
[2024/03] AGENTFL: Scaling LLM-based Fault Localization to Project-Level Context. Qin et al. arXiv. [paper]
[2024/02] When Dataflow Analysis Meets Large Language Models. Wang et al. arXiv. [paper]
[2024/02] CodeAgent: Collaborative Agents for Software Engineering. Tang et al. arXiv. [paper] [repo]
[2024/02] More Agents Is All You Need. Li et al. arXiv. [paper]
[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2023/11] Autonomous Agents in Software Development: A Vision Paper. Rasheed et al. arXiv. [paper]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/10] Large Language Model-Powered Smart Contract Vulnerability Detection: New Perspectives. Hu et al. TPS-ISA. [paper] [repo]
[2023/10] White-box Compiler Fuzzing Empowered by Large Language Models. Yang et al. arXiv. [paper] [repo]
[2023/10] Dynamic LLM-Agent Network: An LLM-agent Collaboration Framework with Agent Team Optimization. Liu et al. arXiv. [paper] [repo]
[2023/08] MetaGPT: Meta Programming for a Multi-Agent Collaborative Framework. Hong et al. ICLR. [paper] [repo]
[2023/08] Flows: Building Blocks of Reasoning and Collaborating AI. Josifoski et al. arXiv. [paper] [repo]
[2023/07] Communicative Agents for Software Development. Qian et al. ACL. [paper] [repo]
[2023/05] Self-Edit: Fault-Aware Code Editor for Code Generation. Zhang et al. ACL. [paper]
[2023/04] Low-code LLM: Visual Programming over LLMs. Cai et al. arXiv. [paper] [repo]

Circular Structure

[2024/05] AutoCoder: Enhancing Code Large Language Model with AIEV-INSTRUCT. Lei et al. arXiv. [paper] [repo]
[2024/04] A Unified Debugging Approach via LLM-Based Multi-Agent Synergy. Lee et al. arXiv. [paper] [repo]
[2024/03] ACFIX: Guiding LLMs with Mined Common RBAC Practices for Context-Aware Repair of Access Control Vulnerabilities in Smart Contracts. Zhang et al. arXiv. [paper]
[2024/03] Multi-role Consensus through LLMs Discussions for Vulnerability Detection. Mao et al. QRS. [paper]
[2024/03] Combining Fine-tuning and LLM-based Agents for Intuitive Smart Contract Auditing with Justifications. Ma et al. arXiv. [paper] [repo]
[2024/02] Test-Driven Development for Code Generation. Mathews et al. arXiv. [paper] [repo]
[2024/02] CodePori: Large Scale Model for Autonomous Software Development by Using Multi-Agents. Rasheed et al. arXiv. [paper]
[2023/12] Experiential Co-Learning of Software-Developing Agents. Qian et al. ACL. [paper] [repo]
[2023/12] AgentCoder: Multi-Agent-based Code Generation with Iterative Testing and Optimisation. Huang et al. arXiv. [paper]
[2023/11] INTERVENOR: Prompting the Coding Ability of Large Language Models with the Interactive Chain of Repair. Wang et al. ACL. [paper] [repo]
[2023/11] Intent-Driven Mobile GUI Testing with Autonomous Large Language Model Agents. Yoon et al. ICST. [paper] [repo]
[2023/10] AXNav: Replaying Accessibility Tests from Natural Language. Taeb et al. CHI. [paper]
[2023/06] Is Self-Repair a Silver Bullet for Code Generation?. Olausson et al. ICLR. [paper] [repo]
[2023/03] CAMEL: Communicative Agents for “Mind” Exploration of Large Language Model Society. Li et al. NeurIPS. [paper] [repo]
[2023/03] Reflexion: Language Agents with Verbal Reinforcement Learning. Shinn et al. NeurIPS. [paper] [repo]

Tree-like Structure

[2024/06] Scaling Large-Language-Model-based Multi-Agent Collaboration. Qian et al. arXiv. [paper] [repo]
[2024/06] MASAI: Modular Architecture for Software-engineering AI Agents. Arora et al. arXiv. [paper]
[2024/04] Self-Organized Agents: A LLM Multi-Agent Framework toward Ultra Large-Scale Code Generation and Optimization. Ishibashi et al. arXiv. [paper] [repo]

Star-like Structure

[2024/06] Scaling Large-Language-Model-based Multi-Agent Collaboration. Qian et al. arXiv. [paper] [repo]
[2024/03] AutoDev: Automated AI-Driven Development. Tufano et al. arXiv. [paper]
[2024/01] XUAT-Copilot: Multi-Agent Collaborative System for Automated User Acceptance Testing with Large Language Model. Wang et al. arXiv. [paper]
[2023/10] RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models. Wang et al. arXiv. [paper]
[2023/08] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Wu et al. arXiv. [paper] [repo]

Mesh Structure

[2024/06] Scaling Large-Language-Model-based Multi-Agent Collaboration. Qian et al. arXiv. [paper] [repo]
[2024/04] 3DGen: AI-Assisted Generation of Provably Correct Binary Format Parsers. Fakhoury et al. arXiv. [paper]

Human-Agent Collaboration

Planning Phase

[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2024/01] LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems. Fakih et al. ICSE. [paper] [repo]
[2023/10] Static Code Analysis in the AI Era: An In-depth Exploration of the Concept, Function, and Potential of Intelligent Code Analysis. Fan et al. arXiv. [paper]
[2023/04] Low-code LLM: Visual Programming over LLMs. Cai et al. arXiv. [paper] [repo]

Requirements Phase

[2024/05] MARE: Multi-Agents Collaboration Framework for Requirements Engineering. Jin et al. arXiv. [paper]
[2024/02] Executable Code Actions Elicit Better LLM Agents. Wang et al. ICML. [paper] [repo]
[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2023/10] ClarifyGPT: Empowering LLM-based Code Generation with Intention Clarification. Mu et al. arXiv. [paper] [repo]
[2023/06] Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services. Xing et al. arXiv. [paper]

Development Phase

[2024/03] CodeS: Natural Language to Code Repository via Multi-Layer Sketch. Zan et al. arXiv. [paper] [repo]
[2024/01] LLM4PLC: Harnessing Large Language Models for Verifiable Programming of PLCs in Industrial Control Systems. Fakih et al. ICSE. [paper] [repo]
[2023/09] MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback. Wang et al. ICLR. [paper] [repo]
[2023/08] Flows: Building Blocks of Reasoning and Collaborating AI. Josifoski et al. arXiv. [paper] [repo]
[2023/08] AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation. Wu et al. arXiv. [paper] [repo]

Evaluation Phase

[2024/01] Experimenting a New Programming Practice with LLMs. Zhang et al. arXiv. [paper] [repo]
[2023/08] Gentopia: A Collaborative Platform for Tool-Augmented LLMs. Xu et al. EMNLP. [paper] [repo]
[2023/06] Prompt Sapper: LLM-Empowered Software Engineering Infrastructure for AI-Native Services. Xing et al. arXiv. [paper]
[2023/03] ART: Automatic multi-step reasoning and tool-use for large language models. Paranjape et al. arXiv. [paper] [repo]

📝 Citation

@misc{Agent4SE,
      title={Large Language Model-Based Agents for Software Engineering: A Survey}, 
      author={Junwei Liu and Kaixin Wang and Yixuan Chen and Xin Peng and Zhenpeng Chen and Lingming Zhang and Yiling Lou},
      year={2024},
      eprint={2409.02977},
      archivePrefix={arXiv},
      primaryClass={cs.SE},
      url={https://arxiv.org/abs/2409.02977}, 
}

👨🏻‍💻 Maintainers

Junwei Liu @To-D
Kaixin Wang @wkx228
Yixuan Chen @FloridaSpidee

📬 Contact Us

Feel free to ask any questions or provide us with some suggestions via:

Junwei Liu: [email protected]
