X-lifecycle Learning for Cloud Incident Management using LLMs
February 15, 2024 · Declared Dead · SIGSOFT FSE Companion
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Drishti Goel, Fiza Husain, Aditya Singh, Supriyo Ghosh, Anjaly Parayil, Chetan Bansal, Xuchao Zhang, Saravan Rajmohan
arXiv ID
2404.03662
Category
cs.NI: Networking & Internet
Cross-listed
cs.AI
Citations
27
Venue
SIGSOFT FSE Companion
Last Checked
3 months ago
Abstract
Incident management for large cloud services is a complex and tedious process that requires a significant amount of manual effort from on-call engineers (OCEs). OCEs typically leverage data from different stages of the software development lifecycle (SDLC), such as code, configuration, monitor data, service properties, service dependencies, and troubleshooting documents, to generate insights for detecting, root-causing, and mitigating incidents. Recent advances in large language models (LLMs) such as ChatGPT, GPT-4, and Gemini have created opportunities to automatically generate contextual recommendations that help OCEs quickly identify and mitigate critical issues. However, existing research typically takes a siloed view, solving a single incident-management task with data from only one stage of the SDLC. In this paper, we demonstrate that adding contextual data from different stages of the SDLC improves performance on two critically important and practically challenging tasks: (1) automatically generating root cause recommendations for incidents related to dependency failures, and (2) identifying the ontology of service monitors used for automatically detecting incidents. Using datasets of 353 incidents and 260 monitors from Microsoft, we show that augmenting contextual information from different stages of the SDLC improves performance over state-of-the-art methods.
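The abstract's core idea is to give the LLM context from several SDLC stages instead of just one. The sketch below illustrates that idea under stated assumptions. It is not the authors' implementation, which this page reports as unpublished. The `Incident` record, the stage labels in `SDLC_STAGES`, the `build_prompt` helper, the per-stage character budget, and the prompt wording are all hypothetical. The sketch only assembles prompt text and makes no model call.

```python
from dataclasses import dataclass

# Hypothetical incident record; field names are illustrative, not from the paper.
@dataclass
class Incident:
    incident_id: str
    title: str
    summary: str

# Hypothetical SDLC stage labels, loosely following the examples in the abstract.
SDLC_STAGES = (
    "code",
    "configuration",
    "monitor_data",
    "service_properties",
    "service_dependencies",
    "troubleshooting_docs",
)

def build_prompt(incident, context_by_stage, stages, max_chars_per_stage=500):
    """Assemble an LLM prompt from an incident plus context from the selected
    SDLC stages. Stages with no available context are skipped, and each stage's
    text is truncated so the prompt stays bounded (an assumed budgeting rule)."""
    unknown = [s for s in stages if s not in SDLC_STAGES]
    if unknown:
        raise ValueError(f"unknown SDLC stage(s): {unknown}")
    parts = [
        "You are assisting an on-call engineer.",
        f"Incident {incident.incident_id}: {incident.title}",
        f"Summary: {incident.summary}",
    ]
    for stage in stages:
        text = context_by_stage.get(stage, "").strip()
        if not text:
            continue  # no data available for this stage
        parts.append(f"[{stage}]\n{text[:max_chars_per_stage]}")
    parts.append("Suggest the most likely root cause and a mitigation.")
    return "\n\n".join(parts)

# Hypothetical example: a "siloed" prompt versus a multi-stage prompt.
incident = Incident(
    "INC-1",
    "Elevated 5xx errors from API gateway",
    "Requests to a downstream storage service are timing out.",
)
context = {
    "service_dependencies": "api-gateway -> storage-svc (synchronous, 2s timeout)",
    "troubleshooting_docs": "If storage-svc latency spikes, check its throttling config.",
}
siloed_prompt = build_prompt(incident, context, ["troubleshooting_docs"])
multi_stage_prompt = build_prompt(
    incident, context, ["service_dependencies", "troubleshooting_docs", "code"]
)
```

In this toy example, only the multi-stage prompt tells the model that the gateway depends synchronously on `storage-svc`. That is the kind of cross-stage signal the abstract argues helps with dependency-failure incidents. The `"code"` stage is requested but silently skipped because no data was supplied for it.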
Similar Papers
In the same crypt: Networking & Internet
Federated Learning in Mobile Edge Networks: A Comprehensive Survey
A Survey of Indoor Localization Systems and Technologies
Survey of Important Issues in UAV Communication Networks
Network Function Virtualization: State-of-the-art and Research Challenges
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
Died the same way: Ghosted
Language Models are Few-Shot Learners
PyTorch: An Imperative Style, High-Performance Deep Learning Library
XGBoost: A Scalable Tree Boosting System