Automatic software repair: A survey

L Gazzola, D Micucci, L Mariani - … of the 40th International Conference on …, 2018 - dl.acm.org
Debugging software failures is still a painful, time consuming, and expensive process. For
instance, recent studies showed that debugging activities often account for about 50% of the …

Experience report: System log analysis for anomaly detection

S He, J Zhu, P He, MR Lyu - 2016 IEEE 27th international …, 2016 - ieeexplore.ieee.org
Anomaly detection plays an important role in management of modern large-scale distributed
systems. Logs, which record system runtime information, are widely used for anomaly …

Log clustering based problem identification for online service systems

Q Lin, H Zhang, JG Lou, Y Zhang, X Chen - Proceedings of the 38th …, 2016 - dl.acm.org
Logs play an important role in the maintenance of large-scale online service systems. When
an online service fails, engineers need to examine recorded logs to gain insights into the …

Identifying impactful service system problems via log analysis

S He, Q Lin, JG Lou, H Zhang, MR Lyu… - Proceedings of the 2018 …, 2018 - dl.acm.org
Logs are often used for troubleshooting in large-scale software systems. For a cloud-based
online system that provides 24/7 service, a huge number of logs could be generated every …

Where do developers log? An empirical study on logging practices in industry

Q Fu, J Zhu, W Hu, JG Lou, R Ding, Q Lin… - … Proceedings of the …, 2014 - dl.acm.org
System logs are widely used in various tasks of software system management. It is crucial to
avoid logging too little or too much. To achieve so, developers need to make informed …

AIOps solutions for incident management: Technical guidelines and a comprehensive literature review

Y Remil, A Bendimerad, R Mathonat… - arXiv preprint arXiv …, 2024 - arxiv.org
The management of modern IT systems poses unique challenges, necessitating scalability,
reliability, and efficiency in handling extensive data streams. Traditional methods, reliant on …

Predicting node failures in an ultra-large-scale cloud computing platform: An AIOps solution

Y Li, ZM Jiang, H Li, AE Hassan, C He… - ACM Transactions on …, 2020 - dl.acm.org
Many software services today are hosted on cloud computing platforms, such as Amazon
EC2, due to many benefits like reduced operational costs. However, node failures in these …

Software analytics in practice

D Zhang, S Han, Y Dang, JG Lou, H Zhang… - IEEE …, 2013 - ieeexplore.ieee.org
With software analytics, software practitioners explore and analyze data to obtain insightful,
actionable information for tasks regarding software development, systems, and users. The …

An empirical study of the impact of data splitting decisions on the performance of AIOps solutions

Y Lyu, H Li, M Sayagh, ZM Jiang… - ACM Transactions on …, 2021 - dl.acm.org
AIOps (Artificial Intelligence for IT Operations) leverages machine learning models to help
practitioners handle the massive data produced during the operations of large-scale …

On the model update strategies for supervised learning in AIOps solutions

Y Lyu, H Li, ZM Jiang, AE Hassan - ACM Transactions on Software …, 2024 - dl.acm.org
AIOps (Artificial Intelligence for IT Operations) solutions leverage the massive data produced
during the operation of large-scale systems and machine learning models to assist software …