Incremental causal graph learning for online root cause analysis

D Wang, Z Chen, Y Fu, Y Liu, H Chen - Proceedings of the 29th ACM …, 2023 - dl.acm.org
The task of root cause analysis (RCA) is to identify the root causes of system faults/failures
by analyzing system monitoring data. Efficient RCA can greatly accelerate system failure …

Root cause analysis for microservice systems via hierarchical reinforcement learning from human feedback

L Wang, C Zhang, R Ding, Y Xu, Q Chen… - Proceedings of the 29th …, 2023 - dl.acm.org
In microservice systems, the identification of root causes of anomalies is imperative for
service reliability and business impact. This process is typically divided into two phases: (i) …

Multi-task federated learning-based system anomaly detection and multi-classification for microservices architecture

J Hao, P Chen, J Chen, X Li - Future Generation Computer Systems, 2024 - Elsevier
The microservices architecture is extensively utilized in cloud-based application
development, characterized by the construction of applications through a series of …

Learning DAGs from data with few root causes

P Misiakos, C Wendler… - Advances in Neural …, 2024 - proceedings.neurips.cc
We present a novel perspective and algorithm for learning directed acyclic graphs (DAGs)
from data generated by a linear structural equation model (SEM). First, we show that a linear …

Active causal structure learning with advice

D Choo, T Gouleakis… - … Conference on Machine …, 2023 - proceedings.mlr.press
We introduce the problem of active causal structure learning with advice. In the typical
well-studied setting, the learning algorithm is given the essential graph for the observational …

A fine-grained robust performance diagnosis framework for run-time cloud applications

R Xin, P Chen, P Grosso, Z Zhao - Future Generation Computer Systems, 2024 - Elsevier
To maintain the required service quality of time-critical cloud applications, operators must
continuously monitor their runtime status, detect potential performance anomalies, and …

Root Cause Analysis in Microservice Using Neural Granger Causal Discovery

CM Lin, C Chang, WY Wang, KD Wang… - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In recent years, microservices have gained widespread adoption in IT operations due to
their scalability, maintenance, and flexibility. However, it becomes challenging for site …

MULAN: Multi-modal Causal Structure Learning and Root Cause Analysis for Microservice Systems

L Zheng, Z Chen, J He, H Chen - Proceedings of the ACM on Web …, 2024 - dl.acm.org
Effective root cause analysis (RCA) is vital for swiftly restoring services, minimizing losses,
and ensuring the smooth operation and management of complex systems. Previous data …

Multi-modal Causal Structure Learning and Root Cause Analysis

L Zheng, Z Chen, J He, H Chen - arXiv preprint arXiv:2402.02357, 2024 - arxiv.org
Effective root cause analysis (RCA) is vital for swiftly restoring services, minimizing losses,
and ensuring the smooth operation and management of complex systems. Previous data …

iSCAN: identifying causal mechanism shifts among nonlinear additive noise models

T Chen, K Bello, B Aragam… - Advances in Neural …, 2024 - proceedings.neurips.cc
Structural causal models (SCMs) are widely used in various disciplines to represent causal
relationships among variables in complex systems. Unfortunately, the underlying causal …