Holistic Root Cause Analysis for Failures in Cloud-Native Systems Through Observability Data

Y Han, Q Du, Y Huang, P Li, X Shi, J Wu… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Microservices are widely adopted in large IT enterprises, leveraging the scalability,
resiliency, and elasticity of the cloud-native architecture. Effective root cause analysis is …

Landscape and Taxonomy of Online Parser-Supported Log Anomaly Detection Methods

S Lupton, H Washizaki, N Yoshioka… - IEEE Access, 2024 - ieeexplore.ieee.org
As production system estates become larger and more complex, ensuring stability through
traditional monitoring approaches becomes more challenging. Rule-based monitoring is …

Systematic Evaluation of Deep Learning Models for Log-based Failure Prediction

F Hadadi, JH Dawes, D Shin, D Bianculli… - Empirical Software …, 2024 - Springer
With the increasing complexity and scope of software systems, their dependability is crucial.
The analysis of log data recorded during system execution can enable engineers to …

First CE matters: On the importance of long term properties on memory failure prediction

J Bogatinovski, O Kao, Q Yu… - 2022 IEEE International …, 2022 - ieeexplore.ieee.org
Dynamic random access memory failures are a threat to the reliability of data centres as they
lead to data loss and system crashes. Timely predictions of memory failures allow for taking …

The Potential of One-Shot Failure Root Cause Analysis: Collaboration of the Large Language Model and Small Classifier

Y Han, Q Du, Y Huang, J Wu, F Tian, C He - Proceedings of the 39th …, 2024 - dl.acm.org
Failure root cause analysis (RCA), which systematically identifies underlying faults, is
essential for ensuring the reliability of widely adopted microservice-based applications and …

PreLog: A Pre-trained Model for Log Analytics

VH Le, H Zhang - Proceedings of the ACM on Management of Data, 2024 - dl.acm.org
Large-scale software-intensive systems often produce a large volume of logs to record
runtime status and events for troubleshooting purposes. The rich information in log data …

Systematic Evaluation of Deep Learning Models for Failure Prediction

F Hadadi, JH Dawes, D Shin, D Bianculli… - arXiv preprint arXiv …, 2023 - arxiv.org
With the increasing complexity and scope of software systems, their dependability is crucial.
The analysis of log data recorded during system execution can enable engineers to …

[BOOK] AI-enabled log analysis for improving IT system dependability

J Bogatinovski - 2023 - search.proquest.com
Modern IT systems play an indispensable role in industrial infrastructure and affect human
society, as billions of users and devices constantly compute, exchange and store data. Their …

[PDF] An Efficient Failure Predictive and Remediation System for Windows Infrastructure with Analysis of Log-Event Records

DA Bhanage, AV Pawar, A Joshi, RG Pawar - researchgate.net
The demand for IT infrastructures has grown due to their importance in business and
everyday life. Downtime due to the unavailability of any IT infrastructure components is …