AI-SRE
SRE
Monitoring and Logging
Multi-Agent Systems
Log Analysis
LLM Assisted Anomaly Detection Service for Site Reliability Engineers: Enhancing Cloud Infrastructure Resilience
https://arxiv.org/pdf/2501.16744
AI-Assisted Incident Management in SRE: The Role of LLMs and Anomaly Detection
https://al-kindipublisher.com/index.php/jcsts/article/view/10054
Exploring the World of "LLM for SRE"
https://blog.yuuk.io/entry/2024/the-world-of-llm4sre
A Survey of AIOps for Failure Management in the Era of Large Language Models
https://arxiv.org/abs/2406.11213?utm_source=chatgpt.com
Awesome LLM AIOps
https://github.com/Jun-jie-Huang/awesome-LLM-AIOps
A Comprehensive Survey on Root Cause Analysis in (Micro) Services: Methodologies, Challenges, and Trends
https://arxiv.org/abs/2408.00803
Servicenow Incident Analysis
https://www.kaggle.com/discussions/questions-and-answers/189482
OpenRCA: Can Large Language Models Locate the Root Cause of Software Failures?
https://openreview.net/forum?id=M4qNIzQYpd
Argos: Agentic Time-Series Anomaly Detection with Autonomous Rule Generation via Large Language Models
https://arxiv.org/html/2501.14170v1?utm_source=chatgpt.com
Flow-of-Action: SOP Enhanced LLM-Based Multi-Agent System for Root Cause Analysis
https://arxiv.org/abs/2502.08224?utm_source=chatgpt.com