MetricSifter: Feature Reduction of Multivariate Time Series Data for Efficient Fault Localization in Cloud Applications

Y Tsubouchi, H Tsuruta - IEEE Access, 2024 - ieeexplore.ieee.org
Automated fault localization in large-scale cloud-based applications is challenging because
it involves mining multivariate time series data from large volumes of operational monitoring …

Outage-Watch: Early Prediction of Outages using Extreme Event Regularizer

S Agarwal, S Chakraborty, S Garg, S Bisht… - Proceedings of the 31st …, 2023 - dl.acm.org
Cloud services are omnipresent and critical cloud service failure is a fact of life. In order to
retain customers and prevent revenue loss, it is important to provide high reliability …

Disentangled Causal Graph Learning for Online Unsupervised Root Cause Analysis

D Wang, Z Chen, Y Fu, Y Liu, H Chen - arXiv preprint arXiv:2305.10638, 2023 - arxiv.org
The task of root cause analysis (RCA) is to identify the root causes of system faults/failures
by analyzing system monitoring data. Efficient RCA can greatly accelerate system failure …

TADL: Fault localization with transformer-based anomaly detection for dynamic microservice systems

Y Li, Y Lu, J Wang, Q Qi, J Wang… - … on Software Analysis …, 2023 - ieeexplore.ieee.org
Due to the complexity of microservice architecture, it is difficult to accomplish efficient
microservice anomaly detection and localization tasks and achieve the target of high system …

The PetShop Dataset--Finding Causes of Performance Issues across Microservices

M Hardt, W Orchard, P Blöbaum… - arXiv preprint arXiv …, 2023 - arxiv.org
Identifying root causes for unexpected or undesirable behavior in complex systems is a
prevalent challenge. This issue becomes especially crucial in modern cloud applications …

Industrial-Grade Smart Troubleshooting through Causal Technical Language Processing: a Proof of Concept

A Trilla, O Yiboe, N Mijatovic, J Vitrià - arXiv preprint arXiv:2407.20700, 2024 - arxiv.org
This paper describes the development of a causal diagnosis approach for troubleshooting
an industrial environment on the basis of the technical language expressed in Return on …

Root Cause Analysis of Outliers with Missing Structural Knowledge

N Okati, SHG Mejia, WR Orchard, P Blöbaum… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent work conceptualized root cause analysis (RCA) of anomalies via quantitative
contribution analysis using causal counterfactuals in structural causal models (SCMs). The …

Hi-RCA: A Hierarchy Anomaly Diagnosis Framework Based on Causality and Correlation Analysis

J Yang, Y Guo, Y Chen, Y Zhao - Applied Sciences, 2023 - mdpi.com
Microservice architecture has been widely adopted by large-scale applications. Due to the
huge amount of data and complex microservice dependency, it also poses new challenges …

Cloud Atlas: Efficient Fault Localization for Cloud Systems using Language Models and Causal Insight

Z Xie, Y Zheng, L Ottens, K Zhang, C Kozyrakis… - arXiv preprint arXiv …, 2024 - arxiv.org
Runtime failure and performance degradation is commonplace in modern cloud systems.
For cloud providers, automatically determining the root cause of incidents is paramount to …

Fault Location Method Based on Dynamic Operation and Maintenance Map and Common Alarm Points Analysis

S Wu, J Guan - Algorithms, 2024 - mdpi.com
Under a distributed information system, the scale of various operational components such as
applications, operating systems, databases, servers, and networks is immense, with intricate …