FRL-MFPG: Propagation-aware fault root cause location for microservice intelligent operation and maintenance

Y Chen, D Xu, N Chen, X Wu - Information and Software Technology, 2023 - Elsevier
Context: Due to the continuous updates and complex dependencies of microservices, the
probability of a fault occurrence and the difficulty of doing a diagnosis have increased …

CausIL: Causal graph for instance level microservice data

S Chakraborty, S Garg, S Agarwal, A Chauhan… - Proceedings of the …, 2023 - dl.acm.org
AI-based monitoring has become crucial for cloud-based services due to its scale. A
common approach to AI-based monitoring is to detect causal relationships among service …

A survey on AI for storage

Y Liu, H Wang, K Zhou, CH Li, R Wu - CCF Transactions on High …, 2022 - Springer
Storage, as a core function and fundamental component of computers, provides services for
saving and reading digital data. The increasing complexity of data operations and storage …

ESRO: Experience Assisted Service Reliability against Outages

S Chakraborty, S Agarwal, S Garg… - 2023 38th IEEE/ACM …, 2023 - ieeexplore.ieee.org
Modern cloud services are prone to failures due to their complex architecture, making
diagnosis a critical process. Site Reliability Engineers (SREs) spend hours leveraging …

Failure Diagnosis in Microservice Systems: A Comprehensive Survey and Analysis

S Zhang, S Xia, W Fan, B Shi, X Xiong, Z Zhong… - arXiv preprint arXiv …, 2024 - arxiv.org
Modern microservice systems have gained widespread adoption due to their high
scalability, flexibility, and extensibility. However, the characteristics of independent …

A causal approach to detecting multivariate time-series anomalies and root causes

W Yang, K Zhang, SCH Hoi - arXiv preprint arXiv:2206.15033, 2022 - arxiv.org
Detecting anomalies and the corresponding root causes in multivariate time series plays an
important role in monitoring the behaviors of various real-world systems, e.g., IT system …

Evolutionary game analysis on cloud providers and enterprises' strategies for migrating to cloud-native under digital transformation

R Zhang, Y Li, H Li, Q Wang - Electronics, 2022 - mdpi.com
Cloud-native is an innovative technology and methodology that is necessary to realize the
digital transformation of enterprises. Promoting the wide adoption of cloud-native in cloud …

A systematic graph-based methodology for cognitive predictive maintenance of complex engineering equipment

L Xia - 2024 - theses.lib.polyu.edu.hk
Maintenance is a vital aspect of ensuring the reliability, availability, and safety of machinery
and systems. Traditional maintenance approaches, such as corrective and preventive …

KGroot: A knowledge graph-enhanced method for root cause analysis

T Wang, G Qi, T Wu - Expert Systems with Applications, 2024 - Elsevier
Fault localization in online microservices is a challenging task due to the vast amount of
monitoring data, diversity of types and events, and complex interdependencies among …

Explainable cyber-physical energy systems based on knowledge graph

PR Aryan, FJ Ekaputra, M Sabou, D Hauer… - Proceedings of the 9th …, 2021 - dl.acm.org
Explainability can help cyber-physical systems alleviate risk in automating decisions that
are affecting our life. Building an explainable cyber-physical system requires deriving …