A list of awesome academic research and industrial materials on Large Language Models (LLMs) and Artificial Intelligence for IT Operations (AIOps).

Jun-jie-Huang/awesome-LLM-AIOps

Awesome LLM AIOps

Content

Introduction

This is a list of awesome academic research and industrial materials on Large Language Models (LLMs) and Artificial Intelligence for IT Operations (AIOps).

Keywords Convention

The abbreviation of the work.

The LLM techniques used in the work.

The main task explored in the work.

Other important information of the work.

1. LLM for Incident Management

1.0 Survey

  1. [Preprint 2024] A Survey of AIOps for Failure Management in the Era of Large Language Models.
  2. [Preprint 2024] AI Assistants for Incident Lifecycle in a Microservice Environment: A Systematic Literature Review.

1.1 Incident Diagnosis

  1. [SoCC 2024] Building AI Agents for Autonomous Clouds: Challenges and Design Principles.
  2. [VLDB 2024] D-Bot: Database Diagnosis System using Large Language Models [project].
  3. [HotNets 2023] A Holistic View of AI-driven Network Incident Management.
  4. [Preprint 2024] FLASH: A Workflow Automation Agent for Diagnosing Recurring Incidents.
  5. [Preprint 2024] AIOpsLab: A Holistic Framework for Evaluating AI Agents for Enabling Autonomous Cloud.

1.2 Incident Reporting

  1. [ICSE-SEIP 2024] Knowledge-aware Alert Aggregation in Large-scale Cloud Systems: a Hybrid Approach.
  2. [ISSRE 2024] Large Language Models Can Provide Accurate and Interpretable Incident Triage.
  3. [FSE Industry 2024] MonitorAssistant: Simplifying Cloud Service Monitoring via Large Language Models.
  4. [ESEC/FSE Industry 2023] Assess and Summarize: Improve Outage Understanding with Large Language Models.
  5. [Preprint 2024] Large Language Models can Deliver Accurate and Interpretable Time Series Anomaly Detection.

1.3 Root Cause Analysis

  1. [ASE 2024] The Potential of One-Shot Failure Root Cause Analysis: Collaboration of the Large Language Model and Small Classifier.
  2. [FSE Industry 2024] LM-PACE: Confidence Estimation by Large Language Models for Effective Root Causing of Cloud Incidents.
  3. [FSE Industry 2024] Automated Root Causing of Cloud Incidents using In-Context Learning with GPT-4.
  4. [FSE Industry 2024] X-lifecycle Learning for Cloud Incident Management using LLMs.
  5. [FSE Industry 2024] Exploring LLM-based Agents for Root Cause Analysis.
  6. [ICSE 2024] Xpert: Empowering Incident Management with Query Recommendations via Large Language Models.
  7. [ICSE 2023] Recommending Root-Cause and Mitigation Steps for Cloud Incidents using Large Language Models.
  8. [Preprint 2023] RCAgent: Cloud Root Cause Analysis by Autonomous Agents with Tool-Augmented Large Language Models.
  9. [EMNLP 2024 (Findings)] mABC: Multi-Agent Blockchain-inspired Collaboration for Root Cause Analysis in Micro-Services Architecture.
  10. [Preprint 2024] Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight.

1.4 Incident Mitigation

  1. [SIGOPS 2024] LLexus: an AI agent system for incident management.
  2. [FSE Industry 2024] Leveraging Large Language Models for the Auto-remediation of Microservice Applications - An Experimental Study.
  3. [ECAI 2024] Nissist: An Incident Mitigation Copilot based on Troubleshooting Guides.
  4. [Preprint 2024] Retrieval Augmented Generation-Based Incident Resolution Recommendation System for IT Support.

1.5 Incident Postmortem Analysis

  1. [ICSE-SEIP 2024] FaultProfIT: Hierarchical Fault Profiling of Incident Tickets in Large-scale Cloud Systems.
  2. [ASE 2024] FAIL: Analyzing Software Failures from the News Using LLMs.

1.6 AIOps Question Answering

  1. [EMNLP Industry 2023] Empower Large Language Model to Perform Better on Industrial Domain-Specific Question Answering [project].
  2. [ICLR 2024] OWL: A Large Language Model for IT Operations [project].
  3. [SANER 2024] Gloss: Guiding Large Language Models to Answer Questions from System Logs.
  4. [Preprint 2023] An Empirical Study of NetOps Capability of Pre-Trained Large Language Models [project].
  5. [Preprint 2023] OpsEval: A Comprehensive Task-Oriented AIOps Benchmark for Large Language Models [project].

2. LLM for Log Analysis

2.1 Log Parsing

  1. [ISSTA 2024] A Large-Scale Evaluation for Log Parsing Techniques: How Far Are We?
  2. [ASE-NIER 2023] Log Parsing: How Far Can ChatGPT Go? [project].
  3. [ICSE 2024] DivLog: Log Parsing with Prompt Enhanced In-Context Learning.
  4. [ICSE 2024] LLMParser: An Exploratory Study on Using Large Language Models for Log Parsing.
  5. [FSE 2024] LILAC: Log Parsing using LLMs with Adaptive Parsing Cache.
  6. [ICPC 2024] Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies.
  7. [Preprint 2024] LEMUR: Log Parsing with Entropy Sampling and Chain-of-Thought Merging.
  8. [Preprint 2024] Stronger, Cheaper and Demonstration-Free Log Parsing with LLMs.
  9. [Preprint 2024] LUNAR: Unsupervised LLM-based Log Parsing.
  10. [Preprint 2024] Log Parsing with Self-Generated In-Context Learning and Self-Correction.
  11. [Preprint 2024] OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models.

2.2 Log Anomaly Detection

  1. [ICPC 2024] Interpretable Online Log Analysis Using Large Language Models with Prompt Strategies.
  2. [Preprint 2023] Log-based Anomaly Detection based on EVT Theory with feedback.
  3. [Preprint 2023] LogGPT: Exploring ChatGPT for Log-Based Anomaly Detection.
  4. [Preprint 2024] RAGLog: Log Anomaly Detection using Retrieval Augmented Generation.
  5. [Preprint 2024] Anomaly Detection on Unstable Logs with GPT Models.

2.3 Logging Statement Generation

  1. [ICSE 2024] UniLog: Automatic Logging via LLM and In-Context Learning.
  2. [FSE 2024] Go Static: Contextualized Logging Statement Generation.
  3. [TSE 2024] Exploring the Effectiveness of LLMs in Automated Logging Generation: An Empirical Study [project].

Contribution

Contributing to this paper list

  • First, think about which category the work should belong to.
  • Second, use the same format as the other entries to describe the work. Leave an empty line between the title and the author list, and take care with the indentation.
  • Then, add keyword tags and the PDF link of the paper. For an arXiv publication, we prefer the /abs/ format to the /pdf/ format.
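The /abs/ preference above can be applied mechanically before submitting an entry. The following is a minimal sketch, not part of this repository: the helper name `to_abs_link` and the example paper ID are hypothetical, and it assumes links of the form `https://arxiv.org/pdf/<id>` with an optional `.pdf` suffix.

```python
import re

# Hypothetical helper (not part of this repository): rewrite an arXiv PDF
# link into the abstract-page form preferred by this list.
_ARXIV_PDF = re.compile(r"^(https?://arxiv\.org)/pdf/(.+?)(?:\.pdf)?$")


def to_abs_link(url: str) -> str:
    """Return the /abs/ form of an arXiv /pdf/ URL; leave other URLs unchanged."""
    cleaned = url.strip()
    match = _ARXIV_PDF.match(cleaned)
    if match is None:
        # Not an arXiv PDF link (or already in /abs/ form): keep as-is.
        return cleaned
    host, paper_id = match.groups()
    return f"{host}/abs/{paper_id}"


if __name__ == "__main__":
    # The paper ID below is a placeholder, not a real reference.
    print(to_abs_link("https://arxiv.org/pdf/2401.00001.pdf"))
```

Links that are already in /abs/ form, or that point to other hosts, pass through unchanged, so the helper is safe to run over every link in a new entry.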

Don't worry if you get any of these wrong; we will fix them for you. Just contribute and promote your awesome work here!

If you recommend a work that isn't your own, you will be added to the contributor list (be sure to provide your information under other contributors).
