Intelligent Monitoring Framework for Cloud Services: A Data-Driven Approach
February 29, 2024 Β· Declared Dead Β· π 2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
"No code URL or promise found in abstract"
Evidence collected by the PWNC Scanner
Authors
Pooja Srinivas, Fiza Husain, Anjaly Parayil, Ayush Choure, Chetan Bansal, Saravan Rajmohan
arXiv ID
2403.07927
Category
cs.NI: Networking & Internet
Cross-listed
cs.LG
Citations
10
Venue
2024 IEEE/ACM 46th International Conference on Software Engineering: Software Engineering in Practice (ICSE-SEIP)
Last Checked
3 months ago
Abstract
Cloud service owners need to continuously monitor their services to ensure high availability and reliability. Gaps in monitoring can lead to delay in incident detection and significant negative customer impact. Current process of monitor creation is ad-hoc and reactive in nature. Developers create monitors using their tribal knowledge and, primarily, a trial and error based process. As a result, monitors often have incomplete coverage which leads to production issues, or, redundancy which results in noise and wasted effort. In this work, we address this issue by proposing an intelligent monitoring framework that recommends monitors for cloud services based on their service properties. We start by mining the attributes of 30,000+ monitors from 791 production services at Microsoft and derive a structured ontology for monitors. We focus on two crucial dimensions: what to monitor (resources) and which metrics to monitor. We conduct an extensive empirical study and derive key insights on the major classes of monitors employed by cloud services at Microsoft, their associated dimensions, and the interrelationship between service properties and this ontology. Using these insights, we propose a deep learning based framework that recommends monitors based on the service properties. Finally, we conduct a user study with engineers from Microsoft which demonstrates the usefulness of the proposed framework. The proposed framework along with the ontology driven projections, succeeded in creating production quality recommendations for majority of resource classes. This was also validated by the users from the study who rated the framework's usefulness as 4.27 out of 5.
Community Contributions
Found the code? Know the venue? Think something is wrong? Let us know!
π Similar Papers
In the same crypt β Networking & Internet
R.I.P.
π»
Ghosted
π
π
The Cartographer
Federated Learning in Mobile Edge Networks: A Comprehensive Survey
π
π
The Cartographer
A Survey of Indoor Localization Systems and Technologies
R.I.P.
π»
Ghosted
Survey of Important Issues in UAV Communication Networks
π
π
The Cartographer
Network Function Virtualization: State-of-the-art and Research Challenges
π
π
The Cartographer
Applications of Deep Reinforcement Learning in Communications and Networking: A Survey
Died the same way β π» Ghosted
R.I.P.
π»
Ghosted
Federated Learning: Strategies for Improving Communication Efficiency
R.I.P.
π»
Ghosted
In-Datacenter Performance Analysis of a Tensor Processing Unit
R.I.P.
π»
Ghosted
Deep Convolutional Neural Networks for Computer-Aided Detection: CNN Architectures, Dataset Characteristics and Transfer Learning
R.I.P.
π»
Ghosted