Effective root cause analysis (RCA) is vital for swiftly restoring services, minimizing losses, and ensuring the smooth operation and management of complex systems. Previous data …
In recent years, microservices have gained widespread adoption in IT operations due to their scalability, maintenance, and flexibility. However, it becomes challenging for site …
Z Yao, C Pei, W Chen, H Wang, L Su, H Jiang… - … Proceedings of the …, 2024 - dl.acm.org
This paper presents Chain-of-Event (CoE), an interpretable model for root cause analysis in microservice systems that analyzes causal relationships of events transformed from multi …
R Ding, C Zhang, L Wang, Y Xu, M Ma, X Wu… - Proceedings of the 31st …, 2023 - dl.acm.org
Root Cause Analysis (RCA) is becoming increasingly crucial for ensuring the reliability of microservice systems. However, performing RCA on modern microservice systems can be …
L Wang, C Zhang, R Ding, Y Xu, Q Chen… - Proceedings of the 29th …, 2023 - dl.acm.org
In microservice systems, the identification of root causes of anomalies is imperative for service reliability and business impact. This process is typically divided into two phases: (i) …
Causal inference (CI) is one of the popular performance diagnosis methods, which infers the anomaly propagation from the observed data for locating the root causes. Although some …
Root cause analysis (RCA) in large-scale microservice systems is a critical and challenging task. To understand and localize root causes of unexpected faults, modern observability …
Most cloud applications use a large number of smaller sub-components (called microservices) that interact with each other in the form of a complex graph to provide the …
H Wang, Z Wu, H Jiang, Y Huang… - 2021 36th IEEE/ACM …, 2021 - ieeexplore.ieee.org
For large-scale distributed systems, it is crucial to efficiently diagnose the root causes of incidents to maintain high system availability. The recent development of microservice …