Anomaly detection and failure root cause analysis in (micro) service-based cloud applications: A survey

J Soldani, A Brogi - ACM Computing Surveys (CSUR), 2022 - dl.acm.org
The proliferation of services and service interactions within microservices and cloud-native
applications makes it harder to detect failures and to identify their possible root causes …

Microservice security: a systematic literature review

D Berardi, S Giallorenzo, J Mauro, A Melis… - PeerJ Computer …, 2022 - peerj.com
Microservices is an emerging paradigm for developing distributed systems. With their
widespread adoption, more and more work investigated the relation between microservices …

Root cause analysis of failures in microservices through causal discovery

A Ikram, S Chakraborty, S Mitra… - Advances in …, 2022 - proceedings.neurips.cc
Most cloud applications use a large number of smaller sub-components (called
microservices) that interact with each other in the form of a complex graph to provide the …

Eadro: An end-to-end troubleshooting framework for microservices on multi-source data

C Lee, T Yang, Z Chen, Y Su… - 2023 IEEE/ACM 45th …, 2023 - ieeexplore.ieee.org
The complexity and dynamism of microservices pose significant challenges to system
reliability, and thereby, automated troubleshooting is crucial. Effective root cause localization …

MicroRank: End-to-end latency issue localization with extended spectrum analysis in microservice environments

G Yu, P Chen, H Chen, Z Guan, Z Huang… - Proceedings of the Web …, 2021 - dl.acm.org
With the advantages of flexible scalability and fast delivery, microservice has become a
popular software architecture in the modern IT industry. However, the explosion in the …

Practical root cause localization for microservice systems via trace analysis

Z Li, J Chen, R Jiao, N Zhao, Z Wang… - 2021 IEEE/ACM 29th …, 2021 - ieeexplore.ieee.org
Microservice architecture is applied by an increasing number of systems because of its
benefits on delivery, scalability, and autonomy. It is essential but challenging to localize root …

Causal inference-based root cause analysis for online service systems with intervention recognition

M Li, Z Li, K Yin, X Nie, W Zhang, K Sui… - Proceedings of the 28th …, 2022 - dl.acm.org
Fault diagnosis is critical in many domains, as faults may lead to safety threats or economic
losses. In the field of online service systems, operators rely on enormous monitoring data to …

Actionable and interpretable fault localization for recurring failures in online service systems

Z Li, N Zhao, M Li, X Lu, L Wang, D Chang… - Proceedings of the 30th …, 2022 - dl.acm.org
Fault localization is challenging in an online service system due to its monitoring data's large
volume and variety and complex dependencies across/within its components (e.g., services …

Automatic root cause analysis via large language models for cloud incidents

Y Chen, H Xie, M Ma, Y Kang, X Gao, L Shi… - Proceedings of the …, 2024 - dl.acm.org
Ensuring the reliability and availability of cloud services necessitates efficient root cause
analysis (RCA) for cloud incidents. Traditional RCA methods, which rely on manual …

CausalRCA: Causal inference based precise fine-grained root cause localization for microservice applications

R Xin, P Chen, Z Zhao - Journal of Systems and Software, 2023 - Elsevier
Effectively localizing root causes of performance anomalies is crucial to enabling the rapid
recovery and loss mitigation of microservice applications in the cloud. Depending on the …